Commit Graph

16 Commits

Author SHA1 Message Date
João Valverde ea8b5fb024 wmem: Add wmem_strbuf_append_c_count() 2022-12-15 11:08:41 +00:00
João Valverde 32f88ad22c wmem: Remove strbuf max size parameter
This parameter was introduced as a safeguard for bugs
that generate an unbounded string but its utility for
that purpose is doubtful and the way it is being used
creates problems with invalid truncation of UTF-8
strings.

Rename wmem_strbuf_sized_new() with a better name.
2022-12-03 01:54:52 +00:00
João Valverde 729ea56b46 wmem: Remove wmem_strbuf_new_label()
Only dissectors are using this function and there is no use case,
as far as I know, that requires its use. Any limitation of length
is imposed transparently by the UI backend.

This function is problematic because it is not Unicode aware and
will truncate a string on an arbitrary byte boundary for multibyte
strings.

Replace its use with a normal strbuf without a length limite and
remove the function because it is not useful and the ITEM_LABEL_LENGTH
parameter does not belong in wmem anyway.
2022-11-30 15:55:54 +00:00
John Thacker f4965d5dec wmem: Make wmem_strbuf_utf8_validate endpptr param optional
Often we don't care about the last valid character, just if
the buffer is valid.
2022-11-06 21:11:36 +00:00
John Thacker 7a4d05d63a charsets: Don't add illegal Unicode codepoints for UTF-16, UTF-32
If a character is not a valid Unicode codepoint, i.e. one of
the code points reserved for surrogate pairs or a code point
above 0x10FFFF, don't add it to a wmem_strbuf when converting
from other encodings but add a replacement character instead, by
using a new wmem_strbuf_append_unichar_validated() function.

Now we produce valid UTF-8 in various situations where UCS-2 or UTF-32
can encode unpaired surrogate codepoints. Consolidate some related
checks that are now redundant.

Also add a replacement character to the end of invalid UCS-2 strings
with an odd number of bytes, as done with UTF-16 and UTF-32.

Fix #18508
2022-10-19 07:53:02 -04:00
Guy Harris 1c9c1b5100 Add a #define for REPLACEMENT CHARACTER and use it.
Add UNICODE_REPLACEMENT_CHARACTER as a #define for the Unicode
REPLACEMENT CHARACTER code point (0x00FFFD), and use that instead of
0xfffd/0xFFFD/0x00FFFD in cases where that value refers to REPLACEMENT
CHARACTER.
2022-10-16 23:36:12 +00:00
João Valverde ab7b71605c TDS: Reject invalid ASCII
Fixes #18448.
2022-10-15 20:17:56 +00:00
João Valverde d2a488f5d5 wslog: Do not print control characters 2022-10-15 11:08:53 +01:00
João Valverde 05a32852a0 wmem: Avoid header dependency on wsutil
Including wireshark.h also pulls some wsutil headers. Avoid that.
2022-10-08 11:18:08 +00:00
João Valverde 51320ae59b wsutil: Improve UTF-8 APIs for debugging
In particular add an UTF-8 specific wslog API that should
make it easier to interpret invalid encodings.
2022-10-05 19:34:47 +01:00
João Valverde 9c4a42c07c wmem: Rename some variables
Use length and size consistently. strbuf->len does not
include the terminating nul. strbuf->alloc_len includes
the terminating nul.

Use consistent language and use "length" to mean size without
nul byte and "size" to mean size with all bytes, including nul.
2022-09-27 18:59:00 +01:00
João Valverde 6d06d4e46b Add some UTF-8 debug checks with a compile time flag
Some older dissectors that predate Unicode and parse text protocols
are prone to generate invalid UTF-8 strings. This is a bug and can have
safety implications.

For example passing invalid UTF-8 to proto_tree_add_string() is a
common bug. There are safeguards in format_text() but this should
not be relied on as a general solution to the problem.

For one, as the name implies, it is only used with representation of a
field value, which is not the same as the value itself of an FT_STRING field.
Issue #18317 shows another reason why.

For now this compile flag only enables extra checks for string ftypes,
which covers a subset of proto.h APIs including
proto_tree_append_string(). Later is should be extended to other
interfaces.

This is also not expected to be disabled for release builds because
there are still many dissectors that do not correctly handle strings.
More work is needed to 1) identify them and 2) fix them.

Ping #18317
2022-09-27 17:04:44 +00:00
João Valverde 47348ae598 dfilter: Add support for literal strings with null bytes
Before:
    Filter: frame matches "abc\x00def"
    dftest: \x00 (NUL byte) cannot be used with a regular string.
    	frame matches "abc\x00def"
    	                  ^~~~
    Filter: _ws.ftypes.string == "a string with a \0 byte"
    dftest: \0 (NUL byte) cannot be used with a regular string.
    	_ws.ftypes.string == "a string with a \0 byte"
    	                                      ^~

After:
    Filter: frame matches "abc\x00def"

    Syntax tree:
     0 TEST_MATCHES:
       1 FIELD(frame)
       1 PCRE(abc\0def)

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_MATCHES	reg#0 matches abc\0def
    00003 RETURN

    Filter: _ws.ftypes.string == "a string with a \0 byte"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string)
       1 FVALUE("a string with a \0 byte" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "a string with a \0 byte" <FT_STRING>
    00003 RETURN

Fixes issue #16156.
2022-06-21 15:10:08 +00:00
Moshe Kaplan 1c3a9af869 Add files with WS_DLL_PUBLIC to Doxygen
Add @file markers for most files that
contain functions exported with
WS_DLL_PUBLIC so that Doxygen will
generate documentation for them.
2021-11-29 21:27:45 +00:00
João Valverde 925e01b23f Remove duplicate format_size() function
We have two format_size()s, with and without wmem scoped memory.
Move the wmem version to wsutil and add a convenience macro to
use g_malloc()ed memory.
2021-07-26 14:56:11 +00:00
João Valverde 7f9c1f5f92 Move wmem to wsutil
This allows wmem to be used from other libraries, namely wsutil.
It is often the case that a funtion exists in wsutil and cannot
be used with a wmem scope, requiring some code duplication or
extra memory allocations, or vice-versa, code in epan cannot be
moved to wsutil because it has a wmem dependency.

To this end wmem is moved to wsutil. Scope management remains part
of epan because those scope semantics are specific to dissection.
2021-07-26 14:56:11 +00:00