forked from osmocom/wireshark
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
968 lines
40 KiB
968 lines
40 KiB
This file is a HOWTO for Wireshark developers. It describes general development
|
|
and coding practices for contributing to Wireshark no matter which part of
|
|
Wireshark you want to work on.
|
|
|
|
To learn how to write a dissector, read this first, then read the file
|
|
README.dissector.
|
|
|
|
This file is compiled to give in depth information on Wireshark.
|
|
It is by no means all inclusive and complete. Please feel free to send
|
|
remarks and patches to the developer mailing list.
|
|
|
|
0. Prerequisites.
|
|
|
|
Before starting to develop a new dissector, a "running" Wireshark build
|
|
environment is required - there's no such thing as a standalone "dissector
|
|
build toolkit".
|
|
|
|
How to setup such an environment is platform dependent; detailed
|
|
information about these steps can be found in the "Developer's Guide"
|
|
(available from: https://www.wireshark.org) and in the INSTALL and
|
|
README.md files of the sources root dir.
|
|
|
|
0.1. General README files.
|
|
|
|
You'll find additional information in the following README files:
|
|
|
|
- README.capture - the capture engine internals
|
|
- README.design - Wireshark software design - incomplete
|
|
- README.developer - this file
|
|
- README.dissector - How to dissect a packet
|
|
- README.display_filter - Display Filter Engine
|
|
- README.idl2wrs - CORBA IDL converter
|
|
- README.packaging - how to distribute a software package containing WS
|
|
- README.regression - regression testing of WS and TS
|
|
- README.stats_tree - a tree statistics counting specific packets
|
|
- README.tapping - "tap" a dissector to get protocol specific events
|
|
- README.xml-output - how to work with the PDML exported output
|
|
- wiretap/README.developer - how to add additional capture file types to
|
|
Wiretap
|
|
|
|
0.2. Dissector related README files.
|
|
|
|
You'll find additional dissector related information in the file
|
|
README.dissector as well as the following README files:
|
|
|
|
- README.heuristic - what are heuristic dissectors and how to write them
|
|
- README.plugins - how to "pluginize" a dissector
|
|
- README.request_response_tracking - how to track req./resp. times and such
|
|
- README.wmem - how to obtain "memory leak free" memory
|
|
|
|
0.3 Contributors
|
|
|
|
James Coe <jammer[AT]cin.net>
|
|
Gilbert Ramirez <gram[AT]alumni.rice.edu>
|
|
Jeff Foster <jfoste[AT]woodward.com>
|
|
Olivier Abad <oabad[AT]cybercable.fr>
|
|
Laurent Deniel <laurent.deniel[AT]free.fr>
|
|
Gerald Combs <gerald[AT]wireshark.org>
|
|
Guy Harris <guy[AT]alum.mit.edu>
|
|
Ulf Lamping <ulf.lamping[AT]web.de>
|
|
|
|
1. Portability.
|
|
|
|
Wireshark runs on many platforms, and can be compiled with a number of
|
|
different compilers; here are some rules for writing code that will work
|
|
on multiple platforms.
|
|
|
|
Building Wireshark requires a compiler that supports C11. This includes
|
|
reasonably recent version of GCC and clang. Microsoft Visual Studio supports
|
|
C11 from Visual Studio 2019 version 16.8 and later. Support requires an updated
|
|
Universal C Runtime (UCRT) and Windows SDK version to work properly with the
|
|
conforming preprocessor. The minimum SDK version is 10.0.20348.0 (version 2104).
|
|
|
|
The C11 has some optional parts that are not a requirement to build Wireshark.
|
|
In particular the following optional C11 features must NOT be used:
|
|
- Variable length arrays
|
|
- Bounds-checking interfaces (Annex K)
|
|
|
|
We don't allow them because their value is questionable and requiring them
|
|
would exclude a lot of compilers and runtimes that we wish to support.
|
|
|
|
Don't initialize global or static variables (variables with static
|
|
storage duration) in their declaration with non-constant values. This is not
|
|
permitted in C. E.g., if "i" is a static or global
|
|
variable, don't declare "i" as
|
|
|
|
uint32_t i = somearray[2];
|
|
|
|
outside a function, or as
|
|
|
|
static uint32_t i = somearray[2];
|
|
|
|
inside or outside a function, declare it as just
|
|
|
|
uint32_t i;
|
|
|
|
or
|
|
|
|
static uint32_t i;
|
|
|
|
and later, in code, initialize it with
|
|
|
|
i = somearray[2];
|
|
|
|
instead. Initializations of variables with automatic storage duration -
|
|
i.e., local variables - with non-constant values is permitted, so,
|
|
within a function
|
|
|
|
uint32_t i = somearray[2];
|
|
|
|
is allowed.
|
|
|
|
Don't use zero-length arrays as structure members, use flexible array members
|
|
instead.
|
|
|
|
Don't use "uchar", "u_char", "ushort", "u_short", "uint", "u_int",
|
|
"ulong", "u_long" or "boolean"; they aren't defined on all platforms.
|
|
Use the fixed width integers provided in C since C99. These are defined
|
|
in <stdint.h>.
|
|
|
|
If you want an 8-bit unsigned quantity, use "uint8_t"; if you want an
|
|
8-bit character value with the 8th bit not interpreted as a sign bit,
|
|
use "unsigned char"; if you want a 16-bit unsigned quantity, use "uint16_t";
|
|
if you want a 32-bit unsigned quantity, use "uint32_t"; and if you want
|
|
an "int-sized" unsigned quantity, use "unsigned"; if you want a boolean,
|
|
use "bool" (defined in <stdbool.h>). You don't need to explicitly include
|
|
these headers; they are included in <wireshark.h>. Use that instead.
|
|
|
|
To print fixed width integers you must use the macros provided in <inttypes.h>.
|
|
|
|
uint32_t var;
|
|
printf("var = " PRIu32 "\n", var);
|
|
|
|
Don't use "long" to mean "signed 32-bit integer", and don't use
|
|
"unsigned long" to mean "unsigned 32-bit integer"; "long"s are 64 bits
|
|
long on many platforms. Use "gint32" for signed 32-bit integers and use
|
|
"guint32" for unsigned 32-bit integers.
|
|
|
|
Don't use "long" to mean "signed 64-bit integer" and don't use "unsigned
|
|
long" to mean "unsigned 64-bit integer"; "long"s are 32 bits long on
|
|
many other platforms. Don't use "long long" or "unsigned long long",
|
|
either, as not all platforms support them; use "int64_t" or "uint64_t",
|
|
which will be defined as the appropriate types for 64-bit signed and
|
|
unsigned integers.
|
|
|
|
On LLP64 data model systems (notably 64-bit Windows), "int" and "long"
|
|
are 32 bits while "size_t" and "ptrdiff_t" are 64 bits. This means that
|
|
the following will generate a compiler warning:
|
|
|
|
int i;
|
|
i = strlen("hello, sailor"); /* Compiler warning */
|
|
|
|
Normally, you'd just make "i" a size_t. However, many GLib and Wireshark
|
|
functions won't accept a size_t on LLP64:
|
|
|
|
size_t i;
|
|
char greeting[] = "hello, sailor";
|
|
guint byte_after_greet;
|
|
|
|
i = strlen(greeting);
|
|
byte_after_greet = tvb_get_guint8(tvb, i); /* Compiler warning */
|
|
|
|
Try to use the appropriate data type when you can. When you can't, you
|
|
will have to cast to a compatible data type, e.g.
|
|
|
|
size_t i;
|
|
char greeting[] = "hello, sailor";
|
|
uint8_t byte_after_greet;
|
|
|
|
i = strlen(greeting);
|
|
byte_after_greet = tvb_get_guint8(tvb, (int) i); /* OK */
|
|
|
|
or
|
|
|
|
int i;
|
|
char greeting[] = "hello, sailor";
|
|
uint8_t byte_after_greet;
|
|
|
|
i = (int) strlen(greeting);
|
|
byte_after_greet = tvb_get_guint8(tvb, i); /* OK */
|
|
|
|
See http://www.unix.org/version2/whatsnew/lp64_wp.html for more
|
|
information on the sizes of common types in different data models.
|
|
|
|
A lot of legacy code still uses GLib types and I/O replacement API. These
|
|
should be gradually transitioned to use the standard interfaces provided in
|
|
C11. Sometimes it may be necessary to use an unsavory cast or two or abuse
|
|
a macro to bridge the two codebases during the transition. Such is life,
|
|
use your judgement and do the best possible under the circumstances.
|
|
|
|
Avoid GLib synonyms like gchar and gint and especially don't
|
|
use gpointer and gconstpointer, unless you are writing GLib callbacks
|
|
and trying to match their signature exactly. These just obscure the
|
|
code and gconstpointer in particular is just semantically weird and poor style.
|
|
|
|
When printing or displaying the values of 64-bit integral data types,
|
|
don't use "%lld", "%llu", "%llx", or "%llo" - not all platforms
|
|
support "%ll" for printing 64-bit integral data types. Instead use
|
|
the macros in <inttypes.h>, for example:
|
|
|
|
proto_tree_add_uint64_format_value(tree, hf_uint64, tvb, offset, len,
|
|
val, "%" PRIu64, val);
|
|
|
|
For GLib routines, and only those, you can choose whichever format style
|
|
you prefer:
|
|
|
|
uint64_t val = UINT64_C(1);
|
|
char *str1 = g_string_printf("%" G_GUINT64_FORMAT, val);
|
|
char *str2 = g_string_printf("%" PRIu64, val);
|
|
|
|
These format macros will be the same modulo any GLib bugs.
|
|
|
|
When specifying an integral constant that doesn't fit in 32 bits, don't
|
|
use "LL" at the end of the constant - not all compilers use "LL" for
|
|
that. Instead, put the constant in a call to the "INT64_C()" or "UINT64_C()"
|
|
macro, e.g.
|
|
|
|
INT64_C(-11644473600), UINT64_C(11644473600)
|
|
|
|
rather than
|
|
|
|
-11644473600LL, 11644473600ULL
|
|
|
|
Don't assume that you can scan through a va_list initialized by va_start
|
|
more than once without closing it with va_end and re-initializing it with
|
|
va_start. This applies even if you're not scanning through it yourself,
|
|
but are calling a routine that scans through it, such as vfprintf() or
|
|
one of the routines in Wireshark that takes a format and a va_list as an
|
|
argument. You must do
|
|
|
|
va_start(ap, format);
|
|
call_routine1(xxx, format, ap);
|
|
va_end(ap);
|
|
va_start(ap, format);
|
|
call_routine2(xxx, format, ap);
|
|
va_end(ap);
|
|
|
|
rather than
|
|
|
|
va_start(ap, format);
|
|
call_routine1(xxx, format, ap);
|
|
call_routine2(xxx, format, ap);
|
|
va_end(ap);
|
|
|
|
Don't use a label without a statement following it. For example,
|
|
something such as
|
|
|
|
if (...) {
|
|
|
|
...
|
|
|
|
done:
|
|
}
|
|
|
|
will not work with all compilers - you have to do
|
|
|
|
if (...) {
|
|
|
|
...
|
|
|
|
done:
|
|
;
|
|
}
|
|
|
|
with some statement, even if it's a null statement, after the label.
|
|
Preferably don't do it at all.
|
|
|
|
Don't use "bzero()", "bcopy()", or "bcmp()"; instead, use the ANSI C
|
|
routines
|
|
|
|
"memset()" (with zero as the second argument, so that it sets
|
|
all the bytes to zero);
|
|
|
|
"memcpy()" or "memmove()" (note that the first and second
|
|
arguments to "memcpy()" are in the reverse order to the
|
|
arguments to "bcopy()"; note also that "bcopy()" is typically
|
|
guaranteed to work on overlapping memory regions, while
|
|
"memcpy()" isn't, so if you may be copying from one region to a
|
|
region that overlaps it, use "memmove()", not "memcpy()" - but
|
|
"memcpy()" might be faster as a result of not guaranteeing
|
|
correct operation on overlapping memory regions);
|
|
|
|
and "memcmp()" (note that "memcmp()" returns 0, 1, or -1, doing
|
|
an ordered comparison, rather than just returning 0 for "equal"
|
|
and 1 for "not equal", as "bcmp()" does).
|
|
|
|
Not all platforms necessarily have "bzero()"/"bcopy()"/"bcmp()", and
|
|
those that do might not declare them in the header file on which they're
|
|
declared on your platform.
|
|
|
|
Don't use "index()" or "rindex()"; instead, use the ANSI C equivalents,
|
|
"strchr()" and "strrchr()". Not all platforms necessarily have
|
|
"index()" or "rindex()", and those that do might not declare them in the
|
|
header file on which they're declared on your platform.
|
|
|
|
Don't use "tvb_get_ptr()". If you must use it, keep in mind that the pointer
|
|
returned by a call to "tvb_get_ptr()" is not guaranteed to be aligned on any
|
|
particular byte boundary; this means that you cannot safely cast it to any
|
|
data type other than a pointer to "char", "unsigned char", "guint8", or other
|
|
one-byte data types. Casting a pointer returned by tvb_get_ptr() into any
|
|
multi-byte data type or structure may cause crashes on some platforms (even
|
|
if it does not crash on x86-based PCs). Even if such mis-aligned accesses
|
|
don't crash on your platform they will be slower than properly aligned
|
|
accesses would be. Furthermore, the data in a packet is not necessarily in
|
|
the byte order of the machine on which Wireshark is running. Use the tvbuff
|
|
routines to extract individual items from the packet, or, better yet, use
|
|
"proto_tree_add_item()" and let it extract the items for you.
|
|
|
|
Don't use structures that overlay packet data, or into which you copy
|
|
packet data; the C programming language does not guarantee any
|
|
particular alignment of fields within a structure, and even the
|
|
extensions that try to guarantee that are compiler-specific and not
|
|
necessarily supported by all compilers used to build Wireshark. Using
|
|
bitfields in those structures is even worse; the order of bitfields
|
|
is not guaranteed.
|
|
|
|
Don't use "ntohs()", "ntohl()", "htons()", or "htonl()"; the header
|
|
files required to define or declare them differ between platforms, and
|
|
you might be able to get away with not including the appropriate header
|
|
file on your platform but that might not work on other platforms.
|
|
Instead, use "g_ntohs()", "g_ntohl()", "g_htons()", and "g_htonl()";
|
|
those are declared by <glib.h>, and you'll need to include that anyway,
|
|
as Wireshark header files that all dissectors must include use stuff from
|
|
<glib.h>.
|
|
|
|
Don't fetch a little-endian value using "tvb_get_ntohs() or
|
|
"tvb_get_ntohl()" and then using "g_ntohs()", "g_htons()", "g_ntohl()",
|
|
or "g_htonl()" on the resulting value - the g_ routines in question
|
|
convert between network byte order (big-endian) and *host* byte order,
|
|
not *little-endian* byte order; not all machines on which Wireshark runs
|
|
are little-endian, even though PCs are. Fetch those values using
|
|
"tvb_get_letohs()" and "tvb_get_letohl()".
|
|
|
|
Do not use "open()", "rename()", "mkdir()", "stat()", "unlink()", "remove()",
|
|
"fopen()", "freopen()" directly. Instead use "ws_open()", "ws_rename()",
|
|
"ws_mkdir()", "ws_stat()", "ws_unlink()", "ws_remove()", "ws_fopen()",
|
|
"ws_freopen()": these wrapper functions change the path and file name from
|
|
UTF8 to UTF16 on Windows allowing the functions to work correctly when the
|
|
path or file name contain non-ASCII characters.
|
|
|
|
Also, use ws_read(), ws_write(), ws_lseek(), ws_dup(), ws_fstat(), and
|
|
ws_fdopen(), rather than read(), write(), lseek(), dup(), fstat(), and
|
|
fdopen() on descriptors returned by ws_open().
|
|
|
|
Those functions are declared in <wsutil/file_util.h>; include that
|
|
header in any code that uses any of those routines.
|
|
|
|
When opening a file with "ws_fopen()", "ws_freopen()", or "ws_fdopen()", if
|
|
the file contains ASCII text, use "r", "w", "a", and so on as the open mode
|
|
- but if it contains binary data, use "rb", "wb", and so on. On
|
|
Windows, if a file is opened in a text mode, writing a byte with the
|
|
value of octal 12 (newline) to the file causes two bytes, one with the
|
|
value octal 15 (carriage return) and one with the value octal 12, to be
|
|
written to the file, and causes bytes with the value octal 15 to be
|
|
discarded when reading the file (to translate between C's UNIX-style
|
|
lines that end with newline and Windows' DEC-style lines that end with
|
|
carriage return/line feed).
|
|
|
|
In addition, that also means that when opening or creating a binary
|
|
file, you must use "ws_open()" (with O_CREAT and possibly O_TRUNC if the
|
|
file is to be created if it doesn't exist), and OR in the O_BINARY flag,
|
|
even on UN*X - O_BINARY is defined by <wsutil/file_util.h> as 0 on UN*X.
|
|
|
|
Do not include <unistd.h>, <fcntl.h>, or <io.h> to declare any of the
|
|
routines listed as replaced by routines in <wsutil/file_util.h>;
|
|
instead, just include <wsutil/file_util.h>.
|
|
|
|
If you need the declarations of other functions defined by <unistd.h>,
|
|
don't include it without protecting it with
|
|
|
|
#ifdef HAVE_UNISTD_H
|
|
|
|
...
|
|
|
|
#endif
|
|
|
|
Don't use forward declarations of static arrays without a specified size
|
|
in a fashion such as this:
|
|
|
|
static const value_string foo_vals[];
|
|
|
|
...
|
|
|
|
static const value_string foo_vals[] = {
|
|
{ 0, "Red" },
|
|
{ 1, "Green" },
|
|
{ 2, "Blue" },
|
|
{ 0, NULL }
|
|
};
|
|
|
|
as some compilers will reject the first of those statements. Instead,
|
|
initialize the array at the point at which it's first declared, so that
|
|
the size is known.
|
|
|
|
For #define names and enum member names, prefix the names with a tag so
|
|
as to avoid collisions with other names - this might be more of an issue
|
|
on Windows, as it appears to #define names such as DELETE and
|
|
OPTIONAL.
|
|
|
|
Don't use the "positional parameters" extension that many UNIX printf's
|
|
implement, e.g.:
|
|
|
|
snprintf(add_string, 30, " - (%1$d) (0x%1$04x)", value);
|
|
|
|
as not all UNIX printf's implement it, and Windows printf doesn't appear
|
|
to implement it. Use something like
|
|
|
|
snprintf(add_string, 30, " - (%d) (0x%04x)", value, value);
|
|
|
|
instead.
|
|
|
|
Don't use
|
|
|
|
case N ... M:
|
|
|
|
as that's not supported by all compilers.
|
|
|
|
Prefer the C99 output functions from <stdio.h> instead of their GLib
|
|
replacements (note that positional format parameters are not part of C99).
|
|
In the past we used to recommend using g_snprintf() and g_vsnprintf()
|
|
instead but since Visual Studio 2015 native C99 implementations are
|
|
available on all platforms we support. These are optimized better than
|
|
the gnulib (GLib) implementation and on hot codepaths that can be a
|
|
noticeable difference in execution speed.
|
|
|
|
tmpnam() -> mkstemp()
|
|
tmpnam is insecure and should not be used any more. Wireshark brings its
|
|
own mkstemp implementation for use on platforms that lack mkstemp.
|
|
Note: mkstemp does not accept NULL as a parameter.
|
|
|
|
Wireshark requires minimum versions of each of the libraries it uses, in
|
|
particular GLib 2.50.0 and Qt 5.12.0 or newer. If you require a mechanism
|
|
that is available only in a newer version of a library then use its
|
|
version detection macros, e.g. "#if GLIB_CHECK_VERSION(...)" and "#if
|
|
QT_VERSION_CHECK(...)" to conditionally compile code using that
|
|
mechanism.
|
|
|
|
When different code must be used on UN*X and Win32, use a #if or #ifdef
|
|
that tests _WIN32, not WIN32. Try to write code portably whenever
|
|
possible, however; note that there are some routines in Wireshark with
|
|
platform-dependent implementations and platform-independent APIs, such
|
|
as the routines in epan/filesystem.c, allowing the code that calls it to
|
|
be written portably without #ifdefs.
|
|
|
|
We support building on Windows using MinGW-w64 (experimental) so be mindful
|
|
of the difference between an #ifdef on _WIN32 and _MSC_VER. The first tests
|
|
if the platform is some version of Windows and also applies to MinGW. The
|
|
latter tests if the toolchain is Microsoft Visual Studio. Sometimes you need
|
|
one or the other, depending on whether the condition applies to all Windows
|
|
compilers or only Microsoft's compiler. Use #ifdef __MINGW32__ to test for
|
|
a MinGW toolchain, including MinGW-w64. The same concern applies to CMake
|
|
code. Depending on the particular situation you may need to use if(WIN32) or
|
|
if(MSVC) or if(MINGW).
|
|
|
|
Wireshark uses Libgcrypt as general-purpose crypto library. Some Wireshark
|
|
specific extensions are defined in wsutil/wsgcrypt.h. You might want to
|
|
include that file instead.
|
|
|
|
2. String handling
|
|
|
|
Do not use functions such as strcat() or strcpy().
|
|
A lot of work has been done to remove the existing calls to these functions and
|
|
we do not want any new callers of these functions.
|
|
|
|
Instead use snprintf() since that function will if used correctly prevent
|
|
buffer overflows for large strings.
|
|
|
|
Be sure that all pointers passed to %s specifiers in format strings are non-
|
|
NULL. Some implementations will automatically replace NULL pointers with the
|
|
string "(NULL)", but most will not.
|
|
|
|
When using a buffer to create a string, do not use a buffer stored on the stack.
|
|
I.e. do not use a buffer declared as
|
|
|
|
char buffer[1024];
|
|
|
|
instead allocate a buffer dynamically using the string-specific or plain wmem
|
|
routines (see README.wmem) such as
|
|
|
|
wmem_strbuf_t *strbuf;
|
|
strbuf = wmem_strbuf_new(pinfo->pool, "");
|
|
wmem_strbuf_append_printf(strbuf, ...
|
|
|
|
or
|
|
|
|
char *buffer=NULL;
|
|
...
|
|
#define MAX_BUFFER 1024
|
|
buffer=wmem_alloc(pinfo->pool, MAX_BUFFER);
|
|
buffer[0]='\0';
|
|
...
|
|
snprintf(buffer, MAX_BUFFER, ...
|
|
|
|
This avoids the stack from being corrupted in case there is a bug in your code
|
|
that accidentally writes beyond the end of the buffer.
|
|
|
|
|
|
If you write a routine that will create and return a pointer to a filled in
|
|
string and if that buffer will not be further processed or appended to after
|
|
the routine returns (except being added to the proto tree),
|
|
do not preallocate the buffer to fill in and pass as a parameter instead
|
|
pass a pointer to a pointer to the function and return a pointer to a
|
|
wmem-allocated buffer that will be automatically freed. (see README.wmem)
|
|
|
|
I.e. do not write code such as
|
|
static void
|
|
foo_to_str(char *string, ... ){
|
|
<fill in string>
|
|
}
|
|
...
|
|
char buffer[1024];
|
|
...
|
|
foo_to_str(buffer, ...
|
|
proto_tree_add_string(... buffer ...
|
|
|
|
instead write the code as
|
|
static void
|
|
foo_to_str(char **buffer, ...
|
|
#define MAX_BUFFER x
|
|
*buffer=wmem_alloc(pinfo->pool, MAX_BUFFER);
|
|
<fill in *buffer>
|
|
}
|
|
...
|
|
char *buffer;
|
|
...
|
|
foo_to_str(&buffer, ...
|
|
proto_tree_add_string(... *buffer ...
|
|
|
|
Use wmem_ allocated buffers. They are very fast and nice. These buffers are all
|
|
automatically free()d when the dissection of the current packet ends so you
|
|
don't have to worry about free()ing them explicitly in order to not leak memory.
|
|
Please read README.wmem.
|
|
|
|
Source files can use UTF-8 encoding, but characters outside the ASCII
|
|
range should be used sparingly. It should be safe to use non-ASCII
|
|
characters in comments and strings, but some compilers (such as GCC
|
|
versions prior to 10) may not support extended identifiers very well.
|
|
There is also no guarantee that a developer's text editor will interpret
|
|
the characters the way you intend them to be interpreted.
|
|
|
|
The majority of Wireshark encodes strings as UTF-8. The main exception
|
|
is the code that uses the Qt API, which uses UTF-16. Console output is
|
|
UTF-8, but as with the source code extended characters should be used
|
|
sparingly since some consoles (most notably Windows' cmd.exe) have
|
|
limited support for UTF-8.
|
|
|
|
3. Robustness.
|
|
|
|
Wireshark is not guaranteed to read only network traces that contain correctly-
|
|
formed packets. Wireshark is commonly used to track down networking
|
|
problems, and the problems might be due to a buggy protocol implementation
|
|
sending out bad packets.
|
|
|
|
Therefore, code does not only have to be able to handle
|
|
correctly-formed packets without, for example, crashing or looping
|
|
infinitely, they also have to be able to handle *incorrectly*-formed
|
|
packets without crashing or looping infinitely.
|
|
|
|
Here are some suggestions for making code more robust in the face
|
|
of incorrectly-formed packets:
|
|
|
|
Do *NOT* use "ws_assert()" or "ws_assert_not_reached()" with input data in dissectors.
|
|
*NO* value in a packet's data should be considered "wrong" in the sense
|
|
that it's a problem with the dissector if found; if it cannot do
|
|
anything else with a particular value from a packet's data, the
|
|
dissector should put into the protocol tree an indication that the
|
|
value is invalid, and should return. The "expert" mechanism should be
|
|
used for that purpose.
|
|
|
|
Use assertions to catch logic errors in your program. A failed assertion
|
|
indicates a bug in the code. Use ws_assert() instead of g_assert() to
|
|
test a logic condition. Note that ws_assert() can be removed at compile
|
|
time. Therefore assertions should not have any side-effects,
|
|
otherwise the program may behave inconsistently.
|
|
|
|
Use ws_assert_not_reached() instead of g_assert_not_reached() for
|
|
unreachable error conditions. For example if (and only if) you know
|
|
'myvar' can only have the values 1 and 2 do:
|
|
switch(myvar) {
|
|
case 1:
|
|
(...)
|
|
break;
|
|
case 2:
|
|
(...)
|
|
break;
|
|
default:
|
|
ws_assert_not_reached();
|
|
break;
|
|
}
|
|
|
|
For dissectors use DISSECTOR_ASSERT() and DISSECTOR_ASSERT_NOT_REACHED()
|
|
instead, with the same caveats as above.
|
|
|
|
You should continue to use g_assert_true(), g_assert_cmpstr(), etc for
|
|
"test code", such as unit testing. These assertions are always active.
|
|
See the GLib Testing API documentation for the details on each of those
|
|
functions.
|
|
|
|
If there is a case where you are checking not for an invalid data item
|
|
in the packet, but for a bug in the dissector (for example, an
|
|
assumption being made at a particular point in the code about the
|
|
internal state of the dissector), use the DISSECTOR_ASSERT macro for
|
|
that purpose; this will put into the protocol tree an indication that
|
|
the dissector has a bug in it, and will not crash the application.
|
|
|
|
If you are allocating a chunk of memory to contain data from a packet,
|
|
or to contain information derived from data in a packet, and the size of
|
|
the chunk of memory is derived from a size field in the packet, make
|
|
sure all the data is present in the packet before allocating the buffer.
|
|
Doing so means that:
|
|
|
|
1) Wireshark won't leak that chunk of memory if an attempt to
|
|
fetch data not present in the packet throws an exception.
|
|
|
|
and
|
|
|
|
2) it won't crash trying to allocate an absurdly-large chunk of
|
|
memory if the size field has a bogus large value.
|
|
|
|
If you're fetching into such a chunk of memory a string from the buffer,
|
|
and the string has a specified size, you can use "tvb_get_*_string()",
|
|
which will check whether the entire string is present before allocating
|
|
a buffer for the string, and will also put a trailing '\0' at the end of
|
|
the buffer.
|
|
|
|
If you're fetching into such a chunk of memory a 2-byte Unicode string
|
|
from the buffer, and the string has a specified size, you can use
|
|
"tvb_get_faked_unicode()", which will check whether the entire string
|
|
is present before allocating a buffer for the string, and will also
|
|
put a trailing '\0' at the end of the buffer. The resulting string will be
|
|
a sequence of single-byte characters; the only Unicode characters that
|
|
will be handled correctly are those in the ASCII range. (Wireshark's
|
|
ability to handle non-ASCII strings is limited; it needs to be
|
|
improved.)
|
|
|
|
If you're fetching into such a chunk of memory a sequence of bytes from
|
|
the buffer, and the sequence has a specified size, you can use
|
|
"tvb_memdup()", which will check whether the entire sequence is present
|
|
before allocating a buffer for it.
|
|
|
|
Otherwise, you can check whether the data is present by using
|
|
"tvb_ensure_bytes_exist()" although this frequently is not needed: the
|
|
TVB-accessor routines can handle requests to read data beyond the end of
|
|
the TVB (by throwing an exception which will either mark the frame as
|
|
truncated--not all the data was captured--or as malformed).
|
|
|
|
Note also that you should only fetch string data into a fixed-length
|
|
buffer if the code ensures that no more bytes than will fit into the
|
|
buffer are fetched ("the protocol ensures" isn't good enough, as
|
|
protocol specifications can't ensure only packets that conform to the
|
|
specification will be transmitted or that only packets for the protocol
|
|
in question will be interpreted as packets for that protocol by
|
|
Wireshark). If there's no maximum length of string data to be fetched,
|
|
routines such as "tvb_get_*_string()" are safer, as they allocate a buffer
|
|
large enough to hold the string. (Note that some variants of this call
|
|
require you to free the string once you're finished with it.)
|
|
|
|
If you have gotten a pointer using "tvb_get_ptr()" (which you should not
|
|
have: you should seriously consider a better alternative to this function),
|
|
you must make sure that you do not refer to any data past the length passed
|
|
as the last argument to "tvb_get_ptr()"; while the various "tvb_get"
|
|
routines perform bounds checking and throw an exception if you refer to data
|
|
not available in the tvbuff, direct references through a pointer gotten from
|
|
"tvb_get_ptr()" do not do any bounds checking.
|
|
|
|
If you have a loop that dissects a sequence of items, each of which has
|
|
a length field, with the offset in the tvbuff advanced by the length of
|
|
the item, then, if the length field is the total length of the item, and
|
|
thus can be zero, you *MUST* check for a zero-length item and abort the
|
|
loop if you see one. Otherwise, a zero-length item could cause the
|
|
dissector to loop infinitely. You should also check that the offset,
|
|
after having the length added to it, is greater than the offset before
|
|
the length was added to it, if the length field is greater than 24 bits
|
|
long, so that, if the length value is *very* large and adding it to the
|
|
offset causes an overflow, that overflow is detected.
|
|
|
|
If you have a
|
|
|
|
for (i = {start}; i < {end}; i++)
|
|
|
|
loop, make sure that the type of the loop index variable is large enough
|
|
to hold the maximum {end} value plus 1; otherwise, the loop index
|
|
variable can overflow before it ever reaches its maximum value. In
|
|
particular, be very careful when using gint8, guint8, gint16, or guint16
|
|
variables as loop indices; you almost always want to use an "int"/"gint"
|
|
or "unsigned int"/"guint" as the loop index rather than a shorter type.
|
|
|
|
If you are fetching a length field from the buffer, corresponding to the
|
|
length of a portion of the packet, and subtracting from that length a
|
|
value corresponding to the length of, for example, a header in the
|
|
packet portion in question, *ALWAYS* check that the value of the length
|
|
field is greater than or equal to the length you're subtracting from it,
|
|
and report an error in the packet and stop dissecting the packet if it's
|
|
less than the length you're subtracting from it. Otherwise, the
|
|
resulting length value will be negative, which will either cause errors
|
|
in the dissector or routines called by the dissector, or, if the value
|
|
is interpreted as an unsigned integer, will cause the value to be
|
|
interpreted as a very large positive value.
|
|
|
|
Any tvbuff offset that is added to as processing is done on a packet
|
|
should be stored in a 32-bit variable, such as an "int"; if you store it
|
|
in an 8-bit or 16-bit variable, you run the risk of the variable
|
|
overflowing.
|
|
|
|
sprintf() -> snprintf()
|
|
Prevent yourself from using the sprintf() function, as it does not test the
|
|
length of the given output buffer and might be writing into unintended memory
|
|
areas. This function is one of the main causes of security problems like buffer
|
|
exploits and many other bugs that are very hard to find. It's much better to
|
|
use the snprintf() function declared by <stdio.h> instead.
|
|
|
|
You should test your dissector against incorrectly-formed packets. This
|
|
can be done using the randpkt and editcap utilities that come with the
|
|
Wireshark distribution. Testing using randpkt can be done by generating
|
|
output at the same layer as your protocol, and forcing Wireshark/TShark
|
|
to decode it as your protocol, e.g. if your protocol sits on top of UDP:
|
|
|
|
randpkt -c 50000 -t dns randpkt.pcap
|
|
tshark -nVr randpkt.pcap -d udp.port==53,<myproto>
|
|
|
|
Testing using editcap can be done using preexisting capture files and the
|
|
"-E" flag, which introduces errors in a capture file. E.g.:
|
|
|
|
editcap -E 0.03 infile.pcap outfile.pcap
|
|
tshark -nVr outfile.pcap
|
|
|
|
The script fuzz-test.sh is available to help automate these tests.
|
|
|
|
4. Name convention.
|
|
|
|
Wireshark uses the underscore_convention rather than the InterCapConvention for
|
|
function names, so new code should probably use underscores rather than
|
|
intercaps for functions and variable names. This is especially important if you
|
|
are writing code that will be called from outside your code. We are just
|
|
trying to keep things consistent for other developers.
|
|
|
|
C symbols exported from libraries shipped with Wireshark should start with a
|
|
prefix that helps avoiding name collision with public symbols from other shared
|
|
libraries. The current suggested prefixes for newly added symbols are
|
|
ws_, wslua_, wmem_ and wtap_.
|
|
|
|
5. White space convention.
|
|
|
|
Most of the C and C++ files in Wireshark use 4-space or 2-space indentation.
|
|
When creating new files you are you are strongly encouraged to use 4-space
|
|
indentation for source code in order to ensure consistency between files.
|
|
|
|
Please avoid using tab expansions different from 8 column widths, as not all
|
|
text editors in use by the developers support this. For a detailed discussion
|
|
of tabs, spaces, and indentation, see
|
|
|
|
http://www.jwz.org/doc/tabs-vs-spaces.html
|
|
|
|
We use EditorConfig (http://editorconfig.org) files to provide formatting
|
|
hints. Most editors and IDEs support EditorConfig, either directly or via
|
|
a plugin. If yours requires a plugin we encourage you to install it. Our
|
|
default EditorConfig indentation style for C and C++ files is 4 spaces.
|
|
|
|
Many files also have a short comment (modelines) on the indentation logic at
|
|
the end of the file. This was required in the past but has been superseded by
|
|
EditorConfig. See
|
|
|
|
https://www.wireshark.org/tools/modelines.html
|
|
|
|
for more information.
|
|
|
|
Please do not leave trailing whitespace (spaces/tabs) on lines.
|
|
|
|
Quite a bit of our source code has varying indentation styles. When editing an
|
|
existing file, try following the existing indentation logic. If you wish to
|
|
convert a file to 4 space indentation, please do so in its own commit and be
|
|
sure to remove its .editorconfig entry so that the default setting takes
|
|
effect.
|
|
|
|
6. Compiler warnings
|
|
|
|
You should write code that is free of compiler warnings. Such warnings will
|
|
often indicate questionable code and sometimes even real bugs, so it's best
|
|
to avoid warnings at all.
|
|
|
|
The compiler flags in the Makefiles are set to "treat warnings as errors",
|
|
so your code won't even compile when warnings occur.
|
|
|
|
7. General observations about architecture
|
|
|
|
7.1 The global header "wireshark.h"
|
|
|
|
You should include the global header <wireshark.h> in your code. However
|
|
there are some things to keep in mind when using it and especially
|
|
if you are considering modifying it.
|
|
|
|
** wireshark.h needs to be minimal: for efficiency reasons, to reduce the
|
|
error surface and because every time this header changes everything must be
|
|
rebuilt. Consider carefully if another header/module should be included
|
|
globally with every project file and exported as public header.
|
|
|
|
** No configuration: configuration is specific to the build environment
|
|
and target machine. wireshark.h must not depend on that.
|
|
|
|
** Only wireshark system headers allowed: plugins use this header and
|
|
cannot depend on any header (even indirectly) that is not installed on the
|
|
target system.
|
|
|
|
** Only global definitions allowed: for example it is acceptable to include
|
|
'wsutil' headers in wireshark.h because every component of Wireshark is allowed
|
|
to depend on wsutil. wiretap is not acceptable because we cannot introduce
|
|
dependencies on wiretap globally (and wireshark.h must be usable everywhere).
|
|
|
|
7.2 Best practices using headers
|
|
|
|
C files can be categorized in three types: source files, private headers and
|
|
public headers.
|
|
|
|
A module "foobar" can have only a private header, only a public header, or
|
|
both. If it's only one it is named "foobar.h" in both cases. If it is both they
|
|
are named "foobar-int.h" and "foobar.h" respectively.
|
|
|
|
In general the order of #include's for a C module source files (foobar.c),
|
|
assuming foobar implements any kind of interface should be:
|
|
|
|
#include "config.h"
|
|
#define WS_LOG_DOMAIN "mydomain"
|
|
#include "foobar-int.h"
|
|
|
|
followed by <system headers>
|
|
followed by <wireshark public headers>
|
|
followed by <wireshark private headers>
|
|
|
|
For header files (private and public) config.h must NOT be included. A public
|
|
header file (foobar.h) looks like this:
|
|
|
|
#ifndef __FOOBAR_H__
|
|
#define __FOOBAR_H__
|
|
#include <wireshark.h>
|
|
followed by <system headers>
|
|
followed by <wireshark public headers>
|
|
|
|
#ifdef __cplusplus
|
|
extern "C" {
|
|
#endif
|
|
(declarations)
|
|
#ifdef __cplusplus
|
|
}
|
|
#endif
|
|
#endif /* FOOBAR_H */
|
|
|
|
A private header (foobar-int.h) is the public header plus the declarations
|
|
with private scope:
|
|
|
|
#ifndef __FOOBAR_INT_H__
|
|
#define __FOOBAR_INT_H__
|
|
#include "foobar.h"
|
|
followed by <system headers>
|
|
followed by <wireshark public headers>
|
|
followed by <wireshark private headers>
|
|
(etc.)
|
|
|
|
Again if there are only public or private declarations the name foobar-int.h
|
|
is not used. The macro symbol WS_LOG_DOMAIN can be defined in source files or
|
|
private headers as long as it comes before wireshark.h.
|
|
|
|
7.3 Wireshark internal and external API policy
|
|
|
|
Wireshark has several APIs. We need to distinguish between internal
|
|
Wireshark library APIs and external Wireshark APIs. Wireshark the project is
|
|
composed of many different programs and these executable binaries use a number
|
|
of internal libraries to share code efficiently. These internal shared
|
|
libraries need to be installed on the system to run the programs (wireshark,
|
|
tshark, etc).
|
|
|
|
A library's public API includes the symbols exported by the DSO (wsutil,
|
|
libwireshark, etc). The internal API is made available in the shared libraries
|
|
and exists to support the goals of the project. It is public from the point
|
|
of view of Wireshark programs (client users of the internal API). The
|
|
external API exists to support plugins (client users of the external API)
|
|
and is a loosely defined subset of the internal API plus any infrastructure
|
|
required to support a plugin system. Note that these two uses of shared
|
|
libraries coexist with a lot of overlap, but are nevertheless distinct.
|
|
|
|
The internal (public) API is not considered to be stable and will regularly
|
|
change as a normal part of development to support new features, remove cruft,
|
|
and whatever else is necessary to make the project sustainable and ease the
|
|
burden on developers. There is less freedom to change something that could
|
|
break a lot of plugins but this is also acceptable (with cause).
|
|
|
|
The plugin ABI policy is to be compatible only between micro releases (also
|
|
called patch releases). That means we try to make it unnecessary to recompile
|
|
plugins with each micro release (on a best-effort basis). For major.minor
|
|
releases it is explicitly required to recompile plugins. There is no stable
|
|
ABI contract of any kind in that case.
|
|
|
|
Keep in mind that APIs can exist in different scopes and levels of abstraction.
|
|
Don't get stuck thinking the words public/private have a very specific
|
|
meaning, like being decorated or not with WS_DLL_PUBLIC, although that is a
|
|
big part of it usually.
|
|
|
|
Also the Wireshark developers have historically tried to keep the Lua API
|
|
very stable and provide strong backward-compatibility guarantees. Under this
|
|
policy moving from Lua 5.2 is unlikely to happen in the foreseeable future.
|
|
|
|
7.4 libwireshark is not a single monolithic entity
|
|
|
|
One day we might conceivably wish to load dissectors on demand and do other
|
|
more sophisticated kinds of unit test. Plus other scenarios not immediately
|
|
obvious. For this to be possible it is important that the code in epan/ does
|
|
not depend on code in epan/dissectors, i.e it is possible to compile epan
|
|
without linking with dissector code. It helps to view dissectors as clients
|
|
of an API provided by epan (libwireshark being constituted by two distinct
|
|
components "epan" and "dissectors" bundled together, plus other bits and
|
|
pieces). The reverse is not* true; epan should not be the client of an API
|
|
provided by dissectors.
|
|
|
|
The main way this separation of concerns is achieved is by using runtime
|
|
registration interfaces in epan for dissectors, preferences, etc. that are
|
|
dynamic and do not have any dissector routines hard coded. Naturally this
|
|
is also an essential component of a plugin system (libwireshark has plugins
|
|
for taps, dissectors and an experimental interface to augment dissection with
|
|
new extension languages).
|
|
|
|
7.5 Unicode and string encoding best practices
|
|
|
|
Wireshark strings are always encoded in UTF-8 internally, regardless of the platform
|
|
where it is running. The C datatype used is "pointer to char" and this is assumed
|
|
to point to a valid UTF-8 string. Sometimes older code uses char to point to opaque
|
|
byte strings but this archaic usage should be avoided. A better data type
|
|
for that is uint8_t.
|
|
|
|
Every untrusted string needs to be validated for correct and error-free UTF-8
|
|
encoding, or converted from the source encoding to UTF-8. This should be done
|
|
at the periphery of the code. This means converting input during dissection or
|
|
when reading input generally. To reiterate: all the Wireshark APIs expect to
|
|
receive valid UTF-8 strings. These include proto_tree_add_string(),
|
|
proto_item_append_text() and col_append_fstr() just to name a few.
|
|
|
|
If a dissector uses standard API functions to handle strings, such as
|
|
proto_tree_add_item() with an FT_STRING header field type, the API will
|
|
transparently handle the conversion from the source encoding to UTF-8 and
|
|
nothing else needs to be done to ensure valid string input.
|
|
|
|
If your dissector does text manipulation, token parsing and such and generally
|
|
extracts text strings from the TVBuff or tries to do line oriented input from
|
|
TVBuffs it *must* make sure it passes only valid UTF-8 to libwireshark APIs.
|
|
This should be done using tvb_get_string_enc() to extract a string from a TVbuff
|
|
or get_utf_8_string() to validate a string after it has been constructed.
|
|
|
|
8. Miscellaneous notes
|
|
|
|
Each commit in your branch corresponds to a different VCSVERSION string
|
|
automatically defined in the header 'vcs_version.h' during the build. If you happen
|
|
to find it convenient to disable this feature it can be done using:
|
|
|
|
touch .git/wireshark-disable-versioning
|
|
|
|
i.e., the file 'wireshark-disable-versioning' must exist in the git repo dir.
|
|
|
|
/*
|
|
* Editor modelines - https://www.wireshark.org/tools/modelines.html
|
|
*
|
|
* Local variables:
|
|
* c-basic-offset: 4
|
|
* tab-width: 8
|
|
* indent-tabs-mode: nil
|
|
* End:
|
|
*
|
|
* vi: set shiftwidth=4 tabstop=8 expandtab:
|
|
* :indentSize=4:tabSize=8:noTabs=true:
|
|
*/
|