Commit Graph

49 Commits

Author SHA1 Message Date
João Valverde bb426c7a85 CMake: Remove unnecessary wmem object library
The cmake wmem sub-library code is superfluous and adds complexity
if trying to build parallel different configurations of wsutil.
2023-02-12 13:25:44 +00:00
João Valverde eda38f5f2d Replace g_utf8_make_valid() with own function
The function ws_utf8_make_valid() is all-around better and
also does maximal substitution of subparts.
2023-02-08 11:21:19 +00:00
João Valverde a66b5080c3 Make wmem and wsutil a single logical library
We want to do more sophisticated processing of UTF-8 in wmem and
for that we want to use the unicode utility functions in wsutil.

We also want to use wmem scoped memory in wsutil unicode utility
functions.

This introduces a circular dependency. Fix that by making both
the same library and removing the sanitary cordon separating
them.

We still need to be mindful of public header  depencies of wmem on
wsutil because wmem.h is included in wireshark.h and we want to
be parsimonious with the use of global includes.
2023-02-08 11:21:19 +00:00
João Valverde fe7bfdf6ca CMake: Require explicit installation of development headers
Develpment headers are a sizeable part of the binary installation
and most users won't ever require them. It's recommended to package
them separately in a devel package or SDK.

Create a CMake installation component for development headers
and add the EXCLUDE_FROM_ALL property.

Headers can be installed using the invocation:

    cmake --install <dir> --component Development
2023-01-18 03:35:13 +00:00
João Valverde 4c9b0d846c CMake: Reverse debug macros
Originally WS_DISABLE_DEBUG was chosen to be
similar to G_DISABLE_ASSERT and NDEBUG.

However generator expressions are essential for modern CMake
but the syntax is weird and having to use negations makes it
ten-fold worse.

Remove the negation. Instead of changing the CMake variable
reverse the macro definition for WS_DISABLE_DEBUG.

The $<CONFIG:cgs> generator expression with multiple config arguments
requires CMake >= 3.19 so we can't use that yet for a further
syntactical simplification.
2023-01-12 00:59:15 +00:00
João Valverde 25d4a099f7 Remove WS_DISABLE_ASSERT
Assertions can be enabled/disabled using WS_DISABLE_DEBUG. The extra
granularity afforded by WS_DISABLE_ASSERT seems unnecessary.
2023-01-12 00:59:15 +00:00
João Valverde ea8b5fb024 wmem: Add wmem_strbuf_append_c_count() 2022-12-15 11:08:41 +00:00
João Valverde 1f34529839 wmem: Optimize some strbuf functions
The changes in commit 32f88ad22c allow removing some checks that
could speed up the code.
2022-12-15 11:08:41 +00:00
João Valverde 32f88ad22c wmem: Remove strbuf max size parameter
This parameter was introduced as a safeguard for bugs
that generate an unbounded string but its utility for
that purpose is doubtful and the way it is being used
creates problems with invalid truncation of UTF-8
strings.

Rename wmem_strbuf_sized_new() with a better name.
2022-12-03 01:54:52 +00:00
João Valverde 729ea56b46 wmem: Remove wmem_strbuf_new_label()
Only dissectors are using this function and there is no use case,
as far as I know, that requires its use. Any limitation of length
is imposed transparently by the UI backend.

This function is problematic because it is not Unicode aware and
will truncate a string on an arbitrary byte boundary for multibyte
strings.

Replace its use with a normal strbuf without a length limite and
remove the function because it is not useful and the ITEM_LABEL_LENGTH
parameter does not belong in wmem anyway.
2022-11-30 15:55:54 +00:00
John Thacker f4965d5dec wmem: Make wmem_strbuf_utf8_validate endpptr param optional
Often we don't care about the last valid character, just if
the buffer is valid.
2022-11-06 21:11:36 +00:00
João Valverde 6aa33f0fc9 wmem: Make strbuf_utf8_validate() accept embedded NUL bytes 2022-10-21 10:21:21 +00:00
João Valverde 4eb78424d2 CMake: Add -Werror to test binaries 2022-10-20 18:26:49 +01:00
John Thacker 7a4d05d63a charsets: Don't add illegal Unicode codepoints for UTF-16, UTF-32
If a character is not a valid Unicode codepoint, i.e. one of
the code points reserved for surrogate pairs or a code point
above 0x10FFFF, don't add it to a wmem_strbuf when converting
from other encodings but add a replacement character instead, by
using a new wmem_strbuf_append_unichar_validated() function.

Now we produce valid UTF-8 in various situations where UCS-2 or UTF-32
can encode unpaired surrogate codepoints. Consolidate some related
checks that are now redundant.

Also add a replacement character to the end of invalid UCS-2 strings
with an odd number of bytes, as done with UTF-16 and UTF-32.

Fix #18508
2022-10-19 07:53:02 -04:00
Guy Harris 1c9c1b5100 Add a #define for REPLACEMENT CHARACTER and use it.
Add UNICODE_REPLACEMENT_CHARACTER as a #define for the Unicode
REPLACEMENT CHARACTER code point (0x00FFFD), and use that instead of
0xfffd/0xFFFD/0x00FFFD in cases where that value refers to REPLACEMENT
CHARACTER.
2022-10-16 23:36:12 +00:00
João Valverde ab7b71605c TDS: Reject invalid ASCII
Fixes #18448.
2022-10-15 20:17:56 +00:00
João Valverde d2a488f5d5 wslog: Do not print control characters 2022-10-15 11:08:53 +01:00
João Valverde 05a32852a0 wmem: Avoid header dependency on wsutil
Including wireshark.h also pulls some wsutil headers. Avoid that.
2022-10-08 11:18:08 +00:00
João Valverde 51320ae59b wsutil: Improve UTF-8 APIs for debugging
In particular add an UTF-8 specific wslog API that should
make it easier to interpret invalid encodings.
2022-10-05 19:34:47 +01:00
João Valverde 79d02af2b5 wmem: Remove a redundant ternary operator
wmem_strbuf_grow should set the correct size with regard to max_size,
if set. In any case passing the actual free "raw" size to g_strlcpy is
always the correct thing to do.
2022-09-27 19:01:18 +01:00
João Valverde 9c4a42c07c wmem: Rename some variables
Use length and size consistently. strbuf->len does not
include the terminating nul. strbuf->alloc_len includes
the terminating nul.

Use consistent language and use "length" to mean size without
nul byte and "size" to mean size with all bytes, including nul.
2022-09-27 18:59:00 +01:00
João Valverde 6d06d4e46b Add some UTF-8 debug checks with a compile time flag
Some older dissectors that predate Unicode and parse text protocols
are prone to generate invalid UTF-8 strings. This is a bug and can have
safety implications.

For example passing invalid UTF-8 to proto_tree_add_string() is a
common bug. There are safeguards in format_text() but this should
not be relied on as a general solution to the problem.

For one, as the name implies, it is only used with representation of a
field value, which is not the same as the value itself of an FT_STRING field.
Issue #18317 shows another reason why.

For now this compile flag only enables extra checks for string ftypes,
which covers a subset of proto.h APIs including
proto_tree_append_string(). Later is should be extended to other
interfaces.

This is also not expected to be disabled for release builds because
there are still many dissectors that do not correctly handle strings.
More work is needed to 1) identify them and 2) fix them.

Ping #18317
2022-09-27 17:04:44 +00:00
John Thacker 819d392aff wmem: Add a wmem_map_foreach_remove function
Like wmem_map_remove(), this frees the key/value pair item
in the map but not the key or the value itself (which may
in fact be the same object.) Not generally a problem, as
they'll get freed by the pool. (If someone wants to manage
memory themselves, they should probably be using a GHashTable.)
2022-09-16 07:39:26 -04:00
Martin Mathieson e3ce838a3e UDPCP: seq-num analysis, and match data and ACKs 2022-09-15 08:19:51 +00:00
João Valverde 47348ae598 dfilter: Add support for literal strings with null bytes
Before:
    Filter: frame matches "abc\x00def"
    dftest: \x00 (NUL byte) cannot be used with a regular string.
    	frame matches "abc\x00def"
    	                  ^~~~
    Filter: _ws.ftypes.string == "a string with a \0 byte"
    dftest: \0 (NUL byte) cannot be used with a regular string.
    	_ws.ftypes.string == "a string with a \0 byte"
    	                                      ^~

After:
    Filter: frame matches "abc\x00def"

    Syntax tree:
     0 TEST_MATCHES:
       1 FIELD(frame)
       1 PCRE(abc\0def)

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_MATCHES	reg#0 matches abc\0def
    00003 RETURN

    Filter: _ws.ftypes.string == "a string with a \0 byte"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string)
       1 FVALUE("a string with a \0 byte" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "a string with a \0 byte" <FT_STRING>
    00003 RETURN

Fixes issue #16156.
2022-06-21 15:10:08 +00:00
João Valverde 947c617812 Remove some circular dependencies on wireshark.h 2022-01-05 13:31:52 +00:00
João Valverde fe5248717f Replace g_snprintf() with snprintf()
Use macros from inttypes.h with format strings.
2021-12-19 20:06:13 +00:00
João Valverde 7160b4b177 wsutil: Use snprintf() and ws_strdup_printf()
Replace GLib I/O with C library I/O.
2021-12-19 12:23:14 +00:00
João Valverde 612c0cff60 wmem: Add ws_strdup_printf() convenience macros
The convention (for wmem) is that functions with ws_ use
malloc'ed memory. This is just a convenience to avoid having
to pass a NULL allocator.
2021-12-19 10:48:15 +00:00
João Valverde f75b79a59d Move wmem string utility functions to wsutil 2021-12-19 10:47:50 +00:00
João Valverde 8cc527cce3 wmem: Use vasprintf()
Use vasprintf(3) if available to optimize wmem_stdup_printf().
2021-12-18 23:16:38 +00:00
João Valverde 58c297ca81 wmem_test: Add more string performance test
Add some C99 stdio.h numbers to compare with GLib on platforms
(such as Windows) where they use different implementations.

Add a wmem string test with NULL allocator, to compare wmem and GLib
performance with roughly the same memory allocation.

Use the block allocator as being more representative of normal
wmem performance, instead of using strict, that is normally
used for wmem debugging.
2021-12-18 20:13:41 +00:00
João Valverde 9465c5c28d wmem_test: Disable performance tests by default
These are not pass/fail tests, so the automation cannot
validate them. They just slow down the CI builds. To
enable pass -m perf.

I think the --verbose comment is wrong, I did not detect
any difference in output with or without --verbose.
2021-12-18 20:13:41 +00:00
João Valverde f837dae4c4 Fix wmem_test.c indentation 2021-12-18 19:39:21 +00:00
João Valverde fa41e2244c wmem: Optimize wmem_strdup_vprintf()
Because we already have the length of the output string after
calling vsnprintf(), we should avoid calling wmem_strdup(), which
will ignore that and recompute the length.

Increase the buffer size to a value that seems reasonable to
minimize the chance of a second call to vsnprintf().
2021-12-15 06:48:24 +00:00
João Valverde 77b6bca387 Convert wmem I/O to use stdio.h 2021-12-14 11:23:05 +00:00
João Valverde cace66d45d The macro 'va_copy' is C99, use that 2021-12-12 11:56:17 +00:00
AndersBroman 3e0506dbe9 Make wmem_print_tree public. 2021-12-06 16:06:13 +00:00
Moshe Kaplan e45ad9dcef wsutil: Add header files to Doxygen
Add @file markers for wsutil
headers so that Doxygen will
generate documentation for them.
2021-11-30 07:30:34 +00:00
Moshe Kaplan 40016daeb3 Add files with WS_DLL_PUBLIC to Doxygen part2
Add @file markers for remaining non-dissector
files that contain functions exported with
WS_DLL_PUBLIC so that Doxygen will
generate documentation for them.
2021-11-30 06:47:35 +00:00
Moshe Kaplan 1c3a9af869 Add files with WS_DLL_PUBLIC to Doxygen
Add @file markers for most files that
contain functions exported with
WS_DLL_PUBLIC so that Doxygen will
generate documentation for them.
2021-11-29 21:27:45 +00:00
João Valverde f5d8d9e306 wmem: Use better names in the API 2021-11-27 19:39:27 +00:00
John Thacker a839ee1c2b wmem: Fix filename in header comment
If the name is going to be in the header, might as well spell it
correctly
2021-11-26 07:46:11 -05:00
John Thacker b5917d0182 wmem: Add a multimap
A number of protocols have IDs that can be reused that are used as
lookup keys. In most cases the frame number should be used as well
to differentiate repeat appearances of an ID. For response/request
matching, it is frequently useful to find the most recent frame number
(greatest value less than or equal to the current one) that contained
an ID.

We can achieve that by using a multimap that stores values with a given
ID in a tree keyed with the frame number. This works better than using
a map or a tree alone:

1) A map isn't ordered, so doesn't allow for less than or equal comparison.
2) Using a tree requires an ordering on all the ID components, and then
   having to test all the components other than the frame number separately
   for equality after retrieval.

Currently the multimap does not support inserting items without specifying
the tree key (and having the multimap generate a key), because the total
capacity of trees (including deleted nodes) is not tracked. If other use
cases are needed, this could be added later along with more generic
multimap support.

Use a multimap in ANSI MAP, ANSI TCAP, and GSM SMS, all of which need to
match lookup IDs that can be reused. Fix #7653.
2021-11-21 07:16:55 -05:00
Gerald Combs e2703507c2 Update a bunch of GLib documentation links.
Change our developer.gnome.org/glib URLs to
developer-old.gnome.org/glib. The official documentation for GLib
appears to be at https://docs.gtk.org/glib/, but it has a different
layout than the gnome.org content (and is surprisingly resistant to
exploration IMHO). We can switch to developer-old.gnome.org using a
simple substitution and it still seems to be updated, so do that for
now.
2021-11-20 21:33:17 +00:00
Nardi Ivan 763247c2b3 QUIC: fix compilation on Raspberry 2021-10-19 20:04:17 +00:00
Guy Harris e490f93072 wmem: don't check whether sizeof(type) is <= 0.
This should fix the cppcheck warning "The unsigned expression
'sizeof(struct _PKT_INFO)' will never be negative so it is either
pointless or an error to check if it is."

wmem_safe_mult() was only used to do an overflow-safe multiplication of
a type size and a count of elements of that type; replace it with
wmem_safe_mult_type_size(), which takes the type as the first argument,
and checks only whether the count of elements is <= 0.
2021-08-24 20:58:00 -07:00
João Valverde 925e01b23f Remove duplicate format_size() function
We have two format_size()s, with and without wmem scoped memory.
Move the wmem version to wsutil and add a convenience macro to
use g_malloc()ed memory.
2021-07-26 14:56:11 +00:00
João Valverde 7f9c1f5f92 Move wmem to wsutil
This allows wmem to be used from other libraries, namely wsutil.
It is often the case that a funtion exists in wsutil and cannot
be used with a wmem scope, requiring some code duplication or
extra memory allocations, or vice-versa, code in epan cannot be
moved to wsutil because it has a wmem dependency.

To this end wmem is moved to wsutil. Scope management remains part
of epan because those scope semantics are specific to dissection.
2021-07-26 14:56:11 +00:00