Cleanup flex code. Optimize some patterns to avoid lookups
for field matches for values that are not legal field names.
Improve warning and add some comments.
We use a common idiom ( https://stackoverflow.com/a/39606065https://gist.github.com/hynekcer/1b0a260ef72dae05fe9611904d7b9675 )
for getting the results of our unittest.TestCases in the tearDown
method. This method accesses a private property, and that private
property was removed in Python 3.11
The StackOverflow answer has been updated with a new approach for
Python 3.11, which also uses a private property. This fixes things
for CTest (and the test target when building), but it still doesn't
work when using pytest's unittest support.
Ping #18740
No one is using this so I'd like to explore other
options first to handle constants in arithmetic
expressions that lack type information.
Reverts 3ddb017a88.
Make dfilter byte representation always use ':' for consistency.
Make 1 byte be represented as "XX:" with the colon suffix to
make it nonambiguous that is is a byte and not other type,
like a protocol.
The difference is can be seen in the following programs. In the
before representation it is not obvious at all that the second
"fc" value is a literal bytes value and not the value of the
protocol "fc", although it can be inferred from the lack of
a READ_TREE instruction. In the After we know that "fc:" must
be bytes and not a protocol.
Note that a leading colon is a syntactical expedient to say
"this value with any type is a literal value and not a protocol
field." A terminating colon is just a part of the dfilter
literal bytes syntax.
Before:
Filter: fc == :fc
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(fc <FT_PROTOCOL>)
1 FVALUE(fc <FT_PROTOCOL>)
Instructions:
00000 READ_TREE fc <FT_PROTOCOL> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == fc <FT_PROTOCOL>
After:
Filter: fc == :fc
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(fc <FT_PROTOCOL>)
1 FVALUE(fc: <FT_PROTOCOL>)
Instructions:
00000 READ_TREE fc <FT_PROTOCOL> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == fc: <FT_PROTOCOL>
Expressions that start with hyphen clash with command-line options.
In that case we need to pass "--" to dftest to stop processing
options.
Fix the test suite to do this. Fixes failures with dftest and
expressions like:
-2 == tcp.port
Replace the GLib option parser with getopt while at it. The GLib API
is nice but isn't a good fit for this utility and the code appears to
be inconsistent on whether "--" is left in the argv or not.
Remove unparsed lexical type and replace it with identifier
and constant. This separation is still necessary to differentiate
names (fields and function) from literals that look like names
but it has some advantages to do it at the lexical level.
The main advantage is a much cleaner and simplified grammar,
because we only have a single token type for field names, without
any loss of generality (the same name is valid for fields and
function names for example).
The CONSTANT token type is necessary to be different from literal
to provide errors for function rules.
The strategy here is to delay resolving literals to values until
we have looked at the entire argument list.
Also we will try to commute the relation in a comparison if
we do not have a type for the return value of the function,
like any other constant.
Before:
Filter: max(1,_ws.ftypes.int8) == 1
dftest: Argument '1' is not valid for max()
max(1,_ws.ftypes.int8) == 1
^
After:
Filter: max(1,_ws.ftypes.int8) == 1
Syntax tree:
0 TEST_ANY_EQ:
1 FUNCTION(max#2):
2 FVALUE(1 <FT_INT8>)
2 FIELD(_ws.ftypes.int8 <FT_INT8>)
1 FVALUE(1 <FT_INT8>)
Instructions:
00000 STACK_PUSH 1 <FT_INT8>
00001 READ_TREE _ws.ftypes.int8 <FT_INT8> -> reg#1
00002 IF_FALSE_GOTO 3
00003 STACK_PUSH reg#1
00004 CALL_FUNCTION max(reg#1, 1 <FT_INT8>) -> reg#0
00005 STACK_POP 2
00006 IF_FALSE_GOTO 8
00007 ANY_EQ reg#0 == 1 <FT_INT8>
00008 RETURN
Filter: max(1,_ws.ftypes.int8) == 1
** (dftest:64938) 01:43:25.950180 [DFilter ERROR] epan/dfilter/sttype-field.c:117 -- sttype_field_ftenum(): Magic num is 0x5cf30031, but should be 0xfc2002cf
Comparison relations should be allowed to commute but they can not
because we need type information to resolve literals to fvalues. For
that reason an expression like "1 == some.field" is invalid. Solve
that by commuting the relation if the first try did not succeed in
assigning a type to the LHS.
After the second try give up, that means we have a relation with
constants on both sides and that is not semantically valid.
Other relations like "matches" and "contains" are not symmetric and
should not commute anyway.
Before:
Filter: _ws.ftypes.int32 == 10
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(_ws.ftypes.int32 <FT_INT32>)
1 FVALUE(10 <FT_INT32>)
Instructions:
00000 READ_TREE _ws.ftypes.int32 <FT_INT32> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == 10 <FT_INT32>
00003 RETURN
Filter: 10 == _ws.ftypes.int32
dftest: Left side of "==" expression must be a field or function, not 10.
10 == _ws.ftypes.int32
^~
After:
Filter: _ws.ftypes.int32 == 10
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(_ws.ftypes.int32 <FT_INT32>)
1 FVALUE(10 <FT_INT32>)
Instructions:
00000 READ_TREE _ws.ftypes.int32 <FT_INT32> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == 10 <FT_INT32>
00003 RETURN
Filter: 10 == _ws.ftypes.int32
Syntax tree:
0 TEST_ANY_EQ:
1 FVALUE(10 <FT_INT32>)
1 FIELD(_ws.ftypes.int32 <FT_INT32>)
Instructions:
00000 READ_TREE _ws.ftypes.int32 <FT_INT32> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ 10 <FT_INT32> == reg#0
00003 RETURN
The expected test output is with the headers decompressed, which
we can't do without Nghttp2. (It outputs the compressed headers
if we don't have it, so we could test for that instead.)
Fix#18707
Extends raw adressing syntax to wok with references. The syntax
is
@field1 == ${@field2}
This requires replicating the logic to load field references, but
using raw values instead. We use separate hash tables for that,
namely "references" vs "raw_references".
This adds new syntax to read a field from the tree as bytes, instead
of the actual type. This is a useful extension for example to match
matformed strings that contain unicode replacement characters. In
this case it is not possible to match the raw value of the malformed
string field. This extension fills this need and is generic enough
that it should be useful in many other situations.
The syntax used is to prefix the field name with "@". The following
artificial example tests if the HTTP user agent contains a particular
invalid UTF-8 sequence:
@http.user_agent == "Mozill\xAA"
Where simply using "http.user_agent" won't work because the invalid byte
sequence will have been replaced with U+FFFD.
Considering the following programs:
$ dftest '_ws.ftypes.string == "ABC"'
Filter: _ws.ftypes.string == "ABC"
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(_ws.ftypes.string <FT_STRING>)
1 FVALUE("ABC" <FT_STRING>)
Instructions:
00000 READ_TREE _ws.ftypes.string <FT_STRING> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == "ABC" <FT_STRING>
00003 RETURN
$ dftest '@_ws.ftypes.string == "ABC"'
Filter: @_ws.ftypes.string == "ABC"
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(_ws.ftypes.string <RAW>)
1 FVALUE(41:42:43 <FT_BYTES>)
Instructions:
00000 READ_TREE @_ws.ftypes.string <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == 41:42:43 <FT_BYTES>
00003 RETURN
In the second case the field has a "raw" type, that equates directly to
FT_BYTES, and the field value is read from the protocol raw data.
The poll and precision fields in timing NTP messages are signed
integers.
Different NTP implementations have different minimum and maximum polling
intervals. Some can be configured even with negative values for
sub-second intervals (e.g. down to -7 for 1/128th of a second).
NTP clocks on modern systems and hardware typically have
a sub-microsecond precision.
Print all poll values. Add the raw precision and change the resolution
of the printed value to nanoseconds.
The proto.h APIs expect valid UTF-8 so replace uses of format_text()
with a label copy function that just does formatting and does not
check for encoding errors. Avoid multiple levels of temporary
string allocations.
Make sure the copy does not truncate a multibyte character and
produce invalid strings. Add debug checks for UTF-8 encoding errors
instead.
We escape C0 and C1 control codes (because control codes)
and ASCII whitespace (and bell).
Overall the goal is to be more efficient and optimized and help
detect misuse of APIs by passing invalid UTF-8.
Add a unit test for ws_label_strcat.
This removes the last dependency of the logging subsystem on the
preferences module. The latter is started much later than the former
and this is an issue.
The Windows-only preference "gui.console_open" is stored in the
registry as HKEY_LOCAL_USER\Software\Wireshark\ConsoleOpen. The semantics
are exactly the same. The preference is read by the logging subsystem
for initialization and then again by the preferences (read/write) so
the user can configure it as before.
The code to store the preference also in the preferences file was
kept, for backward compatibility and because it is not incompatible
with using the Registry concurrently.
The elimination of the prefs dependency also allows moving the Windows
console logic to wsutil and add the functionality to wslog directly,
thereby eliminating the superfluous Wireshark/Logray custom log handler.
To be able to read the ws_log_console_open global variable from
libwireshark it becomes necessary to add a new export macro
symbol called WSUTIL_EXPORT.
We amend the :<numeric> pattern to not eat the leading
colon. Because the colon can be part of the value (with IPv6 addresses
for example) we want to avoid doing that.
IPv6 addresses are covered by their own rules but this removes the
requirement in the future to handle any special cases and avoids
surprises.
For this reason the colon-prefix syntax is already explicitly defined to
work only for byte arrays and there is currently no universal
syntax for all literal values or even all numbers.
Other numbers can keep using the lexical type "unparsed".
```
run/dftest "_ws.ftypes.uint8 == :fd"
Filter: _ws.ftypes.uint8 == :fd
dftest: ":fd" is not a valid number.
_ws.ftypes.uint8 == :fd
^~~
run/dftest "_ws.ftypes.uint8 == fd"
Filter: _ws.ftypes.uint8 == fd
dftest: "fd" is not a valid number.
_ws.ftypes.uint8 == fd
^~
run/dftest "_ws.ftypes.uint8 == 0xfd"
Filter: _ws.ftypes.uint8 == 0xfd
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(_ws.ftypes.uint8 <FT_UINT8>)
1 FVALUE(253 <FT_UINT8>)
Instructions:
00000 READ_TREE _ws.ftypes.uint8 <FT_UINT8> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == 253 <FT_UINT8>
00003 RETURN
run/dftest "_ws.ftypes.bytes == fd"
Filter: _ws.ftypes.bytes == fd
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(_ws.ftypes.bytes <FT_BYTES>)
1 FVALUE(fd <FT_BYTES>)
Instructions:
00000 READ_TREE _ws.ftypes.bytes <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == fd <FT_BYTES>
00003 RETURN
run/dftest "_ws.ftypes.bytes == :fd"
Filter: _ws.ftypes.bytes == :fd
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(_ws.ftypes.bytes <FT_BYTES>)
1 FVALUE(fd <FT_BYTES>)
Instructions:
00000 READ_TREE _ws.ftypes.bytes <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == fd <FT_BYTES>
00003 RETURN
```
The <...> syntax for literals, intended to be as generic as
possible, unintentionally introduced an ambiguity with the
relational expression "a < b or a > c".
Literals are values like numbers, bytes, IPv6 addresses or, one
could imagine, UNC paths for example, if an FT_UNC type were to
be added in the future.
We could use a new unique symbol like @...@ but the <...>
syntax is very recent and may not be necessary with ":xxx" so
just remove it.
A byte array can be explicitly declared by prefixing with a colon. It
is not as generic but the main ambiguity that this new syntax attempted
to solve is bytes vs protocol names. We don't want to introduce a new
reserved symbol for now, until other requirements if any are more clear.
Fixes#18418.
When reassemble_ooo is set, we can do a better job determining
if a segment is retransmitted or not from the reassembler perspective,
which is different than what TCP sequence analysis determines, which
is retransmission from the sender's perspective.
This also allows us to have a good way of dealing with retransmission
but with additional data.
This only works when reassembling out of order is set. Without it,
we fall back to the old method of detecting retransmissions, which
has a harder time with the edge cases.
Fix#17406, fix#15993, fix#13388, fix#13523.
Modernize the handling of experimental TCP options based on
RFC 6994. In particular use ExID instead of magic (which
in the context of RFC 6994 are the last two bytes of a
32-bit ExID) and add a desciption of ExID based on the
current state of the IANA registry.
Field blocks (carried in HEADERS, PUSH_PROMISE, and CONTINUATION
frames) are compressed by HPACK. Send them to the follow tap only
after decompression. Update the tests to match the new output.
Ping #18239 (There's still the case of gzip and brotli compressed
DATA frames to handle).
Implement out of order buffering and desegmentation for QUIC
CRYPTO frames. Particularly useful for Chrome's "Chaos Protection"
that intentionally introduces them, but handles out of order
CRYPTO frames in different UDP payloads as well. (Buffering
packets at a higher encryption level until the out of order
lower level frames have arrived is a different issue.)
Adds a preference, which defaults to on since if there is
out of order, it's not very useful to turn it off.
Fix#17732. Fix#18215.
Allow checking if a slice exists. The result is true if the
slice has length greater than zero.
The len() function is implemented as a DFVM instruction instead.
The semantics are the same.
This removes unparsed name resolution during the semantic
check because it feels like a hack to work around limitations
in the language syntax, that should be solved at the lexical
level instead.
We were interpreting unparsed differently on the LHS and RHS.
Now an unparsed value is always a field if it matches a
registered field name (this matches the implementation in 3.6
and before).
This requires tightening a bit the allowed filter names for
protocols to avoid some common and potentially weird conflicting
cases.
Incidentally this extends set grammar to accept all entities.
That is experimental and may be reverted in the future.
This adds support for using the layers filter
with field references.
Before:
$ dftest 'ip.src != ${ip.src#2}'
dftest: invalid character in macro name
After:
$ dftest 'ip.src != ${ip.src#2}'
Filter: ip.src != ${ip.src#2}
Syntax tree:
0 TEST_ALL_NE:
1 FIELD(ip.src <FT_IPv4>)
1 REFERENCE(ip.src#[2:1] <FT_IPv4>)
Instructions:
00000 READ_TREE ip.src <FT_IPv4> -> reg#0
00001 IF_FALSE_GOTO 5
00002 READ_REFERENCE_R ${ip.src <FT_IPv4>} #[2:1] -> reg#1
00003 IF_FALSE_GOTO 5
00004 ALL_NE reg#0 != reg#1
00005 RETURN
This requires adding another level of complexity to references.
When loading references we need to copy the 'proto_layer_num'
and add the logic to filter on that.
The "layer" sttype is removed and replace by a new
field sttype with support for a range. This is a nice
cleanup for the semantic check and general simplification.
The grammar is better too with this design.
Range sttype is renamed to slice for clarity.
Use "True" or "TRUE" instead of "true" and remove case insensivity.
Same for false. This should serve to differentiate booleans a bit
more from protocol names, which should be using lower-case.
Add the de facto standard Lua regex API to Wireshark. Upstream
code is copied verbatim and the module opened in the "rex" table.
This is just a user convenience and developer quality of life improvement
over the GRegex Lua API because it has always been possible to
load lrexlib-pcre2 as a Lua module from Wireshark.
This code has been unmaintained and does not pass the lrexlib test
suite. GRegex itself has been obsolescent for some time, although GNOME
has recently restarted trying to move it to PCRE2.
Remove it in preparation for a move to lrexlib-pcre2.