wireshark/doc/README.wmem

1. Introduction

The 'wmem' memory manager is Wireshark's memory management framework, replacing
the old 'emem' framework which was removed in Wireshark 2.0.

In order to make memory management easier and to reduce the probability of
memory leaks, Wireshark provides its own memory management API. This API is
implemented inside epan/wmem/ and provides memory pools and functions that make
it easy to manage memory even in the face of exceptions (which many dissector
functions can raise).

Correct use of these functions will make your code faster, and greatly reduce
the chances that it will leak memory in exceptional cases.

Wmem was originally conceived in this email to the wireshark-dev mailing list:
https://www.wireshark.org/lists/wireshark-dev/201210/msg00178.html

2. Usage for Consumers

If you're writing a dissector, or other "userspace" code, then using wmem
should be very similar to using emem. All you need to do is include the header
(epan/wmem/wmem.h) and get a handle to a memory pool (if you want to *create*
a memory pool, see the section "3. Usage for Producers" below).

A memory pool is an opaque pointer to an object of type wmem_allocator_t, and
it is the very first parameter passed to almost every call you make to wmem.
Other than that parameter (and the fact that functions are prefixed wmem_
instead of ep_ or se_) usage is exactly like that of emem. For example:

    wmem_alloc(myPool, 20);

allocates 20 bytes in the pool pointed to by myPool.

2.1 Available Pools

2.1.1 (Sort Of) Global Pools

Dissectors that include the wmem header file will have three pools available
to them automatically: wmem_packet_scope(), wmem_file_scope() and
wmem_epan_scope();

The packet pool is scoped to the dissection of each packet, meaning that any
memory allocated in it will be automatically freed at the end of the current
packet. The file pool is similarly scoped to the dissection of each file,
meaning that any memory allocated in it will be automatically freed when the
current capture file is closed.

NB: Using these pools outside of the appropriate scope (eg using the packet
    pool when there isn't a packet being dissected) will throw an assertion.
    See the comment in epan/wmem/wmem_scopes.c for details.

The epan pool is scoped to the library's lifetime - memory allocated in it is
not freed until epan_cleanup() is called, which is typically at the very end of
the program.

2.1.2 Pinfo Pool

Certain allocations (such as AT_STRINGZ address allocations and anything that
might end up being passed to add_new_data_source) need their memory to stick
around a little longer than the usual packet scope - basically until the
next packet is dissected. This is, in fact, the scope of Wireshark's pinfo
structure, so the pinfo struct has a 'pool' member which is a wmem pool scoped
to the lifetime of the pinfo struct.

2.1.3 NULL Pool

Any function that takes a pointer to a wmem_allocator_t can also be passed NULL
instead. In this case the memory is not allocated in a pool at all - it is
manually managed, and *must* eventually be passed to wmem_free() in order to
prevent memory leaks. Note that passing wmem_allocated memory directly to free()
or g_free() is not safe; the backing type of manually managed memory may be
changed without warning.

2.2 API

Full documentation for each function (parameters, return values, behaviours)
lives (or will live) in Doxygen-format in the header files for those functions.
This is just an overview of which header files you should be looking at.

2.2.1 Core API

wmem_core.h
 - Basic memory management functions (wmem_alloc, wmem_realloc, wmem_free).

2.2.2 Strings

wmem_strutl.h
 - Utility functions for manipulating null-terminated C-style strings.
   Functions like strdup and strdup_printf.

wmem_strbuf.h
 - A managed string object implementation, similar to std::string in C++ or
   GString from Glib.

2.2.3 Container Data Structures

wmem_array.h
 - A growable array (AKA vector) implementation.

wmem_list.h
 - A doubly-linked list implementation.

wmem_map.h
 - A hash map (AKA hash table) implementation.

wmem_queue.h
 - A queue implementation (first-in, first-out).

wmem_stack.h
 - A stack implementation (last-in, first-out).

wmem_tree.h
 - A balanced binary tree (red-black tree) implementation.

2.2.4 Miscellaneous Utilities

wmem_miscutl.h
 - Misc. utility functions like memdup.

2.3 Callbacks

WARNING: You probably don't actually need these; use them only when you're
         sure you understand the dangers.

Sometimes (though hopefully rarely) it may be necessary to store data in a wmem
pool that requires additional cleanup before it is freed. For example, perhaps
you have a pointer to a file-handle that needs to be closed. In this case, you
can register a callback with the wmem_register_callback function
declared in wmem_user_cb.h. Every time the memory in a pool is freed, all
registered cleanup functions are called first.

Note that callback calling order is not defined, you cannot rely on a
certain callback being called before or after another.

WARNING: Manually freeing or moving memory (with wmem_free or wmem_realloc)
         will NOT trigger any callbacks. It is an error to call either of
         those functions on memory if you have a callback registered to deal
         with the contents of that memory.

3. Usage for Producers

NB: If you're just writing a dissector, you probably don't need to read
    this section.

One of the problems with the old emem framework was that there were basically
two allocator backends (glib and mmap) that were all mixed together in a mess
of if statements, environment variables and #ifdefs. In wmem the different
allocator backends are cleanly separated out, and it's up to the owner of the
pool to pick one.

3.1 Available Allocator Back-Ends

Each available allocator type has a corresponding entry in the
wmem_allocator_type_t enumeration defined in wmem_core.h. See the doxygen
comments in that header file for details on each type.

3.2 Creating a Pool

To create a pool, include the regular wmem header and call the
wmem_allocator_new() function with the appropriate type value.
For example:

    #include "wmem/wmem.h"

    wmem_allocator_t *myPool;
    myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE);

From here on in, you don't need to remember which type of allocator you used
(although allocator authors are welcome to expose additional allocator-specific
helper functions in their headers). The "myPool" variable can be passed around
and used as normal in allocation requests as described in section 2 of this
document.

3.3 Destroying a Pool

Regardless of which allocator you used to create a pool, it can be destroyed
with a call to the function wmem_destroy_allocator(). For example:

    #include "wmem/wmem.h"

    wmem_allocator_t *myPool;

    myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE);

    /* Allocate some memory in myPool ... */

    wmem_destroy_allocator(myPool);

Destroying a pool will free all the memory allocated in it.

3.4 Reusing a Pool

It is possible to free all the memory in a pool without destroying it,
allowing it to be reused later. Depending on the type of allocator, doing this
(by calling wmem_free_all()) can be significantly cheaper than fully destroying
and recreating the pool. This method is therefore recommended, especially when
the pool would otherwise be scoped to a single iteration of a loop. For example:

    #include "wmem/wmem.h"

    wmem_allocator_t *myPool;

    myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE);
    for (...) {

        /* Allocate some memory in myPool ... */

        /* Free the memory, faster than destroying and recreating
           the pool each time through the loop. */
        wmem_free_all(myPool);
    }
    wmem_destroy_allocator(myPool);

4. Internal Design

Despite being written in Wireshark's standard C90, wmem follows a fairly
object-oriented design pattern. Although efficiency is always a concern, the
primary goals in writing wmem were maintainability and preventing memory
leaks.

4.1 struct _wmem_allocator_t

The heart of wmem is the _wmem_allocator_t structure defined in the
wmem_allocator.h header file. This structure uses C function pointers to
implement a common object-oriented design pattern known as an interface (also
known as an abstract class to those who are more familiar with C++).

Different allocator implementations can provide exactly the same interface by
assigning their own functions to the members of an instance of the structure.
The structure has eight members in three groups.

4.1.1 Implementation Details

 - private_data
 - type

The private_data pointer is a void pointer that the allocator implementation can
use to store whatever internal structures it needs. A pointer to private_data is
passed to almost all of the other functions that the allocator implementation
must define.

The type field is an enumeration of type wmem_allocator_type_t (see
section 3.1). Its value is set by the wmem_allocator_new() function, not
by the implementation-specific constructor. This field should be considered
read-only by the allocator implementation.

4.1.2 Consumer Functions

 - alloc()
 - free()
 - realloc()

These function pointers should be set to functions with semantics obviously
similar to their standard-library namesakes. Each one takes an extra parameter
that is a copy of the allocator's private_data pointer.

Note that realloc() and free() are not expected to be called directly by user
code in most cases - they are primarily optimisations for use by data
structures that wmem might want to implement (it's inefficient, for example, to
implement a dynamically sized array without some form of realloc).

Also note that allocators do not have to handle NULL pointers or 0-length
requests in any way - those checks are done in an allocator-agnostic way
higher up in wmem. Allocator authors can assume that all incoming pointers
(to realloc and free) are non-NULL, and that all incoming lengths (to malloc
and realloc) are non-0.

4.1.3 Producer/Manager Functions

 - free_all()
 - gc()
 - cleanup()

All of these functions take only one parameter, which is the allocator's
private_data pointer.

The free_all() function should free all the memory currently allocated in the
pool. Note that this is not necessarily exactly the same as calling free()
on all the allocated blocks - free_all() is allowed to do additional cleanup
or to make use of optimizations not available when freeing one block at a time.

The gc() function should do whatever it can to reduce excess memory usage in
the dissector by returning unused blocks to the OS, optimizing internal data
structures, etc.

The cleanup() function should do any final cleanup and free any and all memory.
It is basically the equivalent of a destructor function. For simplicity, wmem
is guaranteed to call free_all() immediately before calling this function. There
is no such guarantee that gc() has (ever) been called.

4.2 Pool-Agnostic API

One of the issues with emem was that the API (including the public data
structures) required wrapper functions for each scope implemented. Even
if there was a stack implementation in emem, it wasn't necessarily available
for use with file-scope memory unless someone took the time to write se_stack_
wrapper functions for the interface.

In wmem, all public APIs take the pool as the first argument, so that they can
be written once and used with any available memory pool. Data structures like
wmem's stack implementation only take the pool when created - the provided
pointer is stored internally with the data structure, and subsequent calls
(like push and pop) will take the stack itself instead of the pool.

4.3 Debugging

The primary debugging control for wmem is the WIRESHARK_DEBUG_WMEM_OVERRIDE
environment variable. If set, this value forces all calls to
wmem_allocator_new() to return the same type of allocator, regardless of which
type is requested normally by the code. It currently has three valid values:

 - The value "simple" forces the use of WMEM_ALLOCATOR_SIMPLE. The valgrind
   script currently sets this value, since the simple allocator is the only
   one whose memory allocations are trackable properly by valgrind.

 - The value "strict" forces the use of WMEM_ALLOCATOR_STRICT. The fuzz-test
   script currently sets this value, since the goal when fuzz-testing is to find
   as many errors as possible.

 - The value "block" forces the use of WMEM_ALLOCATOR_BLOCK. This is not
   currently used by any scripts, but is useful for stress-testing the block
   allocator.

 - The value "block_fast" forces the use of WMEM_ALLOCATOR_BLOCK_FAST. This is
   not currently used by any scripts, but is useful for stress-testing the fast
   block allocator.

Note that regardless of the value of this variable, it will always be safe to
call allocator-specific helpers functions. They are required to be safe no-ops
if the allocator argument is of the wrong type.

4.4 Testing

There is a simple test suite for wmem that lives in the file wmem_test.c and
should get automatically built into the binary 'wmem_test' when building
Wireshark. It contains at least basic tests for all existing functionality.
The suite is run automatically by the build-bots via the shell script
test/test.sh which calls out to test/suite-unittests.sh.

New features added to wmem (allocators, data structures, utility
functions, etc.) MUST also have tests added to this suite.

The test suite could potentially use a clean-up by someone more
intimately familiar with Glib's testing framework, but it does the job.

/*
 * Editor modelines  -  https://www.wireshark.org/tools/modelines.html
 *
 * Local variables:
 * c-basic-offset: 4
 * tab-width: 8
 * indent-tabs-mode: nil
 * End:
 *
 * vi: set shiftwidth=4 tabstop=8 expandtab:
 * :indentSize=4:tabSize=8:noTabs=true:
 */