In many grammatical contexts fields are only tested for existence
instead of loading the values into a register, because that's all
that is needed to determine if a filter passes or not. Add a
dfilter option to load the field values from the tree and return
them when a field (including field at a certain protocol layer) is
the root of the filter syntax tree.
This is useful for columns, especially for parsing columns defined
with the layer operator, but it can't completely replace the current
custom column handling because we don't yet return exactly which
hfinfo was present, if more than one has the same abbreviation, and
it's possible for fields with the same abbreviation to have different
strings, and hence different "resolved" values.
$ ./run/dftest -s "@ip.proto#1"
Filter:
@ip.proto#1
Syntax tree:
0 FIELD(@ip.proto#[1:1] <FT_BYTES>)
Instructions:
0000 CHECK_EXISTS_R ip.proto#[1:1]
0001 RETURN
$ ./run/dftest -s "@ip.proto#1" --return-vals
Filter:
@ip.proto#1
Syntax tree:
0 FIELD(@ip.proto#[1:1] <FT_BYTES>)
Instructions:
0000 READ_TREE_R @ip.proto#[1:1] -> R0
0001 NO_OP
0002 RETURN R0
Related to #18588
When generating DFVM code, tell the return function what
register has the final set of fvalues for filters that are
functions, arithmetic, or slices (that is, that compare one
or more fvalues to see if they are all zero.) Make sure
that these functions return an empty ptr array, unlike
tests that return a null ptr array.
For fields, we could return the fvalues, but currently we
don't bother loading the fvalues into registers since display
filters that just have a field test existence, so the generated
code would have to change. It's also a little more complicated
because there can be multiple fields that have different types
(sometimes not commensurable, which is an error noted by some of
the checks.) The logic in custom columns handles the field cases
currently.
If a branch instruction does not branch, i.e. it jumps
to the next instruction, replace it with a no-op for
a slight performance optimization and decluttering
of the bytecode.
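
A minimal Python sketch of this peephole pass, not the actual
gencode.c logic. The tuple encoding of instructions and the name
remove_trivial_jumps are hypothetical; only the opcode names are
borrowed from the dftest dumps.

```python
def remove_trivial_jumps(program):
    """Replace branches that target the next instruction with NO_OP.

    `program` is a list of (opcode, target) tuples; target is None for
    instructions that do not branch. This encoding is hypothetical.
    """
    optimized = []
    for index, (opcode, target) in enumerate(program):
        if opcode == "IF_FALSE_GOTO" and target == index + 1:
            # Falling through and jumping land on the same instruction.
            optimized.append(("NO_OP", None))
        else:
            optimized.append((opcode, target))
    return optimized


program = [
    ("READ_TREE", None),
    ("IF_FALSE_GOTO", 2),  # jumps to the very next instruction
    ("RETURN", None),
]
optimized = remove_trivial_jumps(program)
```

Instruction indexes are preserved, so other jump targets stay valid.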
When the jumps_ptr is NULL a nested function call
results in a NULL pointer dereference. We could add
a NULL check but removing the jump in commit e85f8d4cf1
was a mistake, because the jump is not always a no-op,
so add it back.
Fixes e85f8d4cf1.
Remove name static storage and use a switch to map enums to
names. This allows mapping all names, even those that
are not instantiated as objects.
Rewrite some assertions using ws_error() for detailed error
messages.
/builds/wireshark/wireshark/epan/dfilter/gencode.c: In function 'gen_arithmetic':
/builds/wireshark/wireshark/epan/dfilter/gencode.c:538:17: error: 'op' may be used uninitialized [-Werror=maybe-uninitialized]
538 | gen_relation_insn(dfw, op, val1, reg_val, NULL);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/builds/wireshark/wireshark/epan/dfilter/gencode.c:501:25: note: 'op' was declared here
501 | dfvm_opcode_t op;
| ^~
cc1: all warnings being treated as errors
/builds/wireshark/wireshark/epan/dfilter/semcheck.c: In function 'dfilter_fvalue_from_number':
/builds/wireshark/wireshark/epan/dfilter/semcheck.c:340:17: error: 'fv' may be used uninitialized [-Werror=maybe-uninitialized]
340 | stnode_replace(st, STTYPE_FVALUE, fv);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/builds/wireshark/wireshark/epan/dfilter/semcheck.c:297:19: note: 'fv' was declared here
297 | fvalue_t *fv;
| ^~
cc1: all warnings being treated as errors
BASE_CUSTOM is not a value string so it does not have a catalog
of values but for implementation purposes we can treat it as a
value string that cannot be uniquely mapped to an integer.
This way fields using BASE_CUSTOM can also be matched using the
custom string representation.
This is probably less useful than normal value strings but still
it is a use case we can support with the current compiler.
Not all value strings or other maps that can be treated like
a value string can be inverted to a single number. Sometimes
they might represent a range, such as with range_strings,
or more than one number maps to the same string, so the
function is not a bijection.
To be able to handle those cases we add an instruction to
transform the field value to the string at runtime and use
string comparisons to match the value string.
This way all string maps can be handled by the display filter
engine.
TFS() strings are always bijections so that case is unchanged;
we map the string to a boolean at compile time.
Fixes #19351.
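
To illustrate why a run-time conversion is needed, here is a small
Python sketch. The table and the function names are made up for the
example and are not Wireshark APIs. Several numbers share one label,
so the label cannot be inverted to a single integer at compile time,
but each field value can still be turned into its label and compared
as a string.

```python
# Hypothetical range_string-like table: (low, high, label), inclusive.
RANGE_STRINGS = [
    (0, 0, "Reserved"),
    (1, 9, "Low"),
    (10, 255, "High"),
]


def value_to_string(value):
    """Map a field value to its label at run time (None if unmapped)."""
    for low, high, label in RANGE_STRINGS:
        if low <= value <= high:
            return label
    return None


def field_matches_label(field_values, label):
    """True if any value of the field resolves to the given label."""
    return any(value_to_string(v) == label for v in field_values)
```

Both 3 and 7 resolve to "Low", which is exactly the non-bijective
case described above.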
"A not in S" is now implemented as "A and A not_in S"
instead of "not (A in S)".
"not A in S" is implemented as "not A or A not_in S".
This is to be consistent with the way inequality has historically
worked, where "A != B" is not the same as "not A == B".
Maybe we should change both propositions to have inequality
be the same as not equality instead.
Fixes #19187.
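
The difference between the two forms shows up when the field is
absent. A rough Python model follows, where a field is a list of
values and an empty list means "not present". The assumption that
not_in requires every value to fall outside the set is mine and is
not stated in this commit.

```python
def not_in(values, members):
    # Assumption: every value of the field must be outside the set.
    return all(v not in members for v in values)


def a_not_in_s(values, members):
    """'A not in S' implemented as 'A and A not_in S'."""
    return bool(values) and not_in(values, members)


def not_a_in_s(values, members):
    """'not A in S' implemented as 'not A or A not_in S'."""
    return not values or not_in(values, members)
```

With an absent field the first form is false and the second is true,
mirroring "A != B" versus "not A == B".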
Fix the "all X in S" expression to be implemented as
(x1 in S) AND (x2 in S) AND ... AND (xn in S)
Previously it was implemented as
(X all_eq s1) OR (X all_eq s2) OR ... OR (X all_eq sn)
which does not implement set membership semantics correctly.
The implementation uses a list to build the set and the
set membership test is done with a SET_*_IN instruction
that tests if a register belongs to the set (list contents).
Example:
Filter:
all tcp.port in {10..15,20,30}
Instructions:
0000 READ_TREE tcp.port -> R0
0001 IF_FALSE_GOTO 7
0002 SET_ADD_RANGE 10 .. 15
0003 SET_ADD 20
0004 SET_ADD 30
0005 SET_ALL_IN R0
0006 SET_CLEAR
0007 RETURN
Fixes #19188.
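
A Python sketch contrasting the two implementations. The helper names
are hypothetical. Set members are either single values or inclusive
(low, high) tuples, loosely mirroring SET_ADD and SET_ADD_RANGE.

```python
def in_set(value, members):
    """Test one value against single members and inclusive ranges."""
    for member in members:
        if isinstance(member, tuple):
            low, high = member
            if low <= value <= high:
                return True
        elif value == member:
            return True
    return False


def all_in(values, members):
    """(x1 in S) AND (x2 in S) AND ... ; an absent field fails."""
    return bool(values) and all(in_set(v, members) for v in values)


def old_all_in(values, scalars):
    """Previous form: (X all_eq s1) OR (X all_eq s2) OR ..."""
    return any(all(v == s for v in values) for s in scalars)
```

For two ports 20 and 30 and the set {20, 30}, all_in() passes while
the old form fails, because no single member equals every value.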
Extends raw addressing syntax to work with references. The syntax
is
@field1 == ${@field2}
This requires replicating the logic to load field references, but
using raw values instead. We use separate hash tables for that,
namely "references" vs "raw_references".
This adds new syntax to read a field from the tree as bytes, instead
of the actual type. This is a useful extension, for example to match
malformed strings that contain Unicode replacement characters. In
this case it is otherwise not possible to match the raw value of the
malformed string field. This extension fills this need and is generic
enough that it should be useful in many other situations.
The syntax used is to prefix the field name with "@". The following
artificial example tests if the HTTP user agent contains a particular
invalid UTF-8 sequence:
@http.user_agent == "Mozill\xAA"
Where simply using "http.user_agent" won't work because the invalid byte
sequence will have been replaced with U+FFFD.
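
The effect can be reproduced outside Wireshark. This Python snippet is
only an analogy for what happens to the string value during
dissection: decoding with replacement loses the original byte, so only
the raw bytes can still be compared against 0xAA.

```python
raw = b"Mozill\xaa"                               # bytes as captured
decoded = raw.decode("utf-8", errors="replace")   # sanitized string

# The invalid byte became U+FFFD and cannot be recovered from the string.
lossy = decoded.encode("utf-8") != raw
```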
Consider the following programs:
$ dftest '_ws.ftypes.string == "ABC"'
Filter: _ws.ftypes.string == "ABC"
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(_ws.ftypes.string <FT_STRING>)
1 FVALUE("ABC" <FT_STRING>)
Instructions:
00000 READ_TREE _ws.ftypes.string <FT_STRING> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == "ABC" <FT_STRING>
00003 RETURN
$ dftest '@_ws.ftypes.string == "ABC"'
Filter: @_ws.ftypes.string == "ABC"
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(_ws.ftypes.string <RAW>)
1 FVALUE(41:42:43 <FT_BYTES>)
Instructions:
00000 READ_TREE @_ws.ftypes.string <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == 41:42:43 <FT_BYTES>
00003 RETURN
In the second case the field has a "raw" type, that equates directly to
FT_BYTES, and the field value is read from the protocol raw data.
Allow checking if a slice exists. The result is true if the
slice has length greater than zero.
The len() function is implemented as a DFVM instruction instead.
The semantics are the same.
This adds support for using the layers filter
with field references.
Before:
$ dftest 'ip.src != ${ip.src#2}'
dftest: invalid character in macro name
After:
$ dftest 'ip.src != ${ip.src#2}'
Filter: ip.src != ${ip.src#2}
Syntax tree:
0 TEST_ALL_NE:
1 FIELD(ip.src <FT_IPv4>)
1 REFERENCE(ip.src#[2:1] <FT_IPv4>)
Instructions:
00000 READ_TREE ip.src <FT_IPv4> -> reg#0
00001 IF_FALSE_GOTO 5
00002 READ_REFERENCE_R ${ip.src <FT_IPv4>} #[2:1] -> reg#1
00003 IF_FALSE_GOTO 5
00004 ALL_NE reg#0 != reg#1
00005 RETURN
This requires adding another level of complexity to references.
When loading references we need to copy the 'proto_layer_num'
and add the logic to filter on that.
The "layer" sttype is removed and replaced by a new
field sttype with support for a range. This is a nice
cleanup for the semantic check and general simplification.
The grammar is better too with this design.
Range sttype is renamed to slice for clarity.
Add support to display filters for matching a specific layer within a frame.
Layers are counted sequentially up the protocol stack. Each protocol
(dissector) that appears in the stack is one layer.
LINK-LAYER#1 <-> IP#1 <-> TCP#1 <-> IP#2 <-> TCP#2 <-> etc.
The syntax allows for negative indexes and ranges with the usual semantics
for slices (but note that counting starts at one):
tcp.port#[2-4] == 1024
Matches layers 2 to 4 inclusive.
Fixes #3791.
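
A small Python sketch of one-based, inclusive layer selection. The
function name is hypothetical, and the handling of negative indexes
(-1 being the last layer) is an assumption based on "the usual
semantics for slices"; the list holds one value per occurrence of the
protocol in the frame.

```python
def select_layers(values_by_layer, first, last):
    """Return the values from layers first..last inclusive (1-based).

    Assumption: a negative index counts from the end, -1 is the last.
    """
    count = len(values_by_layer)

    def normalize(index):
        return count + index + 1 if index < 0 else index

    low, high = normalize(first), normalize(last)
    return [
        value
        for layer, value in enumerate(values_by_layer, start=1)
        if low <= layer <= high
    ]


ports = [80, 1024, 443, 8080, 22]          # tcp.port at layers 1..5
selected = select_layers(ports, 2, 4)      # like tcp.port#[2-4]
```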
The word range is used for different things with different
meanings and that is confusing. Avoid using "range" in code to
mean "slice".
A range is one or more intervals with a lower and upper bound.
A slice is a range applied to a bytes field.
Replace range with slice wherever appropriate. This usage of
"slice" instead of range is generally correct and consistent in
the documentation.
This allows writing moderately complex expressions, for example
a float epsilon test (#16483):
Filter: {abs(_ws.ftypes.double - 1) / max(abs(_ws.ftypes.double), abs(1))} < 0.01
Syntax tree:
0 TEST_LT:
1 OP_DIVIDE:
2 FUNCTION(abs#1):
3 OP_SUBTRACT:
4 FIELD(_ws.ftypes.double)
4 FVALUE(1 <FT_DOUBLE>)
2 FUNCTION(max#2):
3 FUNCTION(abs#1):
4 FIELD(_ws.ftypes.double)
3 FUNCTION(abs#1):
4 FVALUE(1 <FT_DOUBLE>)
1 FVALUE(0.01 <FT_DOUBLE>)
Instructions:
00000 READ_TREE _ws.ftypes.double -> reg#1
00001 IF_FALSE_GOTO 3
00002 SUBRACT reg#1 - 1 <FT_DOUBLE> -> reg#2
00003 STACK_PUSH reg#2
00004 CALL_FUNCTION abs(reg#2) -> reg#0
00005 STACK_POP 1
00006 IF_FALSE_GOTO 24
00007 READ_TREE _ws.ftypes.double -> reg#1
00008 IF_FALSE_GOTO 9
00009 STACK_PUSH reg#1
00010 CALL_FUNCTION abs(reg#1) -> reg#4
00011 STACK_POP 1
00012 IF_FALSE_GOTO 13
00013 STACK_PUSH reg#4
00014 STACK_PUSH 1 <FT_DOUBLE>
00015 CALL_FUNCTION abs(1 <FT_DOUBLE>) -> reg#5
00016 STACK_POP 1
00017 IF_FALSE_GOTO 18
00018 STACK_PUSH reg#5
00019 CALL_FUNCTION max(reg#5, reg#4) -> reg#3
00020 STACK_POP 2
00021 IF_FALSE_GOTO 24
00022 DIVIDE reg#0 / reg#3 -> reg#6
00023 ANY_LT reg#6 < 0.01 <FT_DOUBLE>
00024 RETURN
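
For comparison, the same relative epsilon test written as plain
Python. The function names are illustrative only; the arithmetic
mirrors the filter expression above.

```python
def relative_difference(value, reference):
    """abs(value - reference) / max(abs(value), abs(reference))"""
    return abs(value - reference) / max(abs(value), abs(reference))


def nearly_equal(value, reference=1.0, epsilon=0.01):
    """True when value is within a relative epsilon of reference."""
    return relative_difference(value, reference) < epsilon
```

As in the filter, the expression divides by zero when both operands
are zero, so a caller has to handle that case separately.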
We now use a stack to pass arguments to the function. The
stack is implemented as a list of lists (list of registers).
Arguments may still be non-existent to functions (this is
a feature). Functions must check for nil arguments (NULL lists)
and handle that case.
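
A toy Python model of this calling convention, assuming the stack is a
list whose entries are lists of values, with None standing in for a
NULL list. The names df_abs and call_function are hypothetical. In the
DFVM dump CALL_FUNCTION and STACK_POP are separate instructions; they
are folded together here for brevity.

```python
def df_abs(args):
    """One-argument function; the argument is a list of values or None."""
    (values,) = args
    if values is None:        # argument does not exist
        return None
    return [abs(v) for v in values]


def call_function(func, stack, arg_count):
    """Pass the top arg_count entries to func, then pop them.

    arg_count must be at least 1 in this sketch.
    """
    args = stack[-arg_count:]
    result = func(args)
    del stack[-arg_count:]
    return result


stack = []
stack.append([-3, 4])                       # STACK_PUSH
result = call_function(df_abs, stack, 1)    # CALL_FUNCTION + STACK_POP 1
```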
It's somewhat complicated to allow literal values and test compatibility
for different types, both because of lack of type information with
unparsed/literal and also because it is an underdeveloped area in the
code. In my limited testing it was good enough and useful; further
enhancements are left for future work.
Changes the function calling convention to pass the first register
number plus the number of registers after that sequentially. This
allows functions with any number of arguments. Functions can still
only return one value.
Adds max() and min() functions to select the maximum/minimum value
from any number of arguments, all of the same type. The functions
accept literals too. The return type is the same as the first argument
(cannot be a literal).
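
A rough Python sketch of a variadic max() and min() over lists of
values. Skipping non-existent arguments (None) is an assumption made
for the example, and type checking is left out.

```python
def df_max(*args):
    """Largest value across all arguments; None if nothing exists."""
    values = [v for arg in args if arg is not None for v in arg]
    return [max(values)] if values else None


def df_min(*args):
    """Smallest value across all arguments; None if nothing exists."""
    values = [v for arg in args if arg is not None for v in arg]
    return [min(values)] if values else None
```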
After some experimentation I don't think these two existence tests
belong in the grammar. They are an implementation detail, and removing
them might avoid some artificial constraints.
Add support for display filter binary addition and subtraction.
The grammar is intentionally kept simple for now. The use case
is to add a constant to a protocol field, or (maybe) add two
fields in an expression.
We use signed arithmetic with unsigned numbers, checking for
overflow and casting where necessary to do the conversion.
We could legitimately opt to use traditional modular arithmetic
instead (like C) and if it turns out that that is more useful for
some reason we may want to in the future.
Fixes #15504.
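
The two options can be contrasted in a few lines of Python. The 32-bit
width and the function names are assumptions chosen for illustration
and do not describe the actual integer types used by the engine.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
UINT32_MASK = 2**32 - 1


def checked_add(a, b):
    """Signed addition that rejects results outside 32 bits."""
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} + {b} overflows 32 bits")
    return result


def modular_add(a, b):
    """C-style unsigned addition: wraps around modulo 2**32."""
    return (a + b) & UINT32_MASK
```

With checked signed arithmetic 5 + (-10) yields -5, whereas the
modular version wraps around to 4294967291.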
This replaces the current macro reference system with
a completely different implementation. Instead of a macro a reference
is a syntax element. A reference is a constant that can be filled
in the dfilter code after compilation from an existing protocol tree.
It is best understood as a field value that can be read from a fixed
tree that is not the frame being filtered. Usually this fixed tree
is the currently selected frame when the filter is applied. This
allows comparing fields in the filtered frame with fields in the
selected frame.
Because the field reference syntax uses the same sigil notation
as a macro we have to use a heuristic to distinguish them:
if the name has a dot it is a field reference, otherwise
it is a macro name.
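
The heuristic amounts to a one-line test. In this Python sketch the
function name and the returned labels are illustrative only.

```python
def classify_sigil_name(name):
    """Classify the text inside ${...} as a field reference or a macro."""
    return "field_reference" if "." in name else "macro"
```

For example, ${ip.src} is treated as a field reference while a name
without a dot is treated as a macro.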
The reference is syntactically validated at compile time.
There are two main advantages to this implementation (and a couple of
minor ones):
The protocol tree for each selected frame is only walked if we have a
display filter and if the display filter uses references. Also only the
actual reference values are copied, instead of loading the entire tree
into a hash table (in textual form even).
The other advantage is that the reference is tested like a protocol
field against all the values in the selected frame (if there is more
than one).
Currently the reference fields are not "primed" during dissection, so
the entire tree is walked to find a particular reference (this is
similar to the previous implementation).
If the display filter contains a valid reference and the reference is
not loaded at the time the filter is run the result is the same as a
non existing field for a regular READ_TREE instruction.
Fixes #17599.
This change implements a unary minus operator.
Filter: tcp.window_size_scalefactor == -tcp.dstport
Instructions:
00000 READ_TREE tcp.window_size_scalefactor -> reg#0
00001 IF_FALSE_GOTO 6
00002 READ_TREE tcp.dstport -> reg#1
00003 IF_FALSE_GOTO 6
00004 MK_MINUS -reg#1 -> reg#2
00005 ANY_EQ reg#0 == reg#2
00006 RETURN
It is supported for integer types, floats and relative time values.
The unsigned integer types are promoted to a 32 bit signed integer.
Unary plus is implemented as a no-op. The plus sign is simply ignored.
Constant arithmetic expressions are computed during compilation.
Overflow with constants is a compile time error. Overflow with
variables is a run time error and silently ignored. Only a debug
message will be printed to the console.
Related to #15504.
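
A Python sketch of the two overflow paths. The error class and the
function names are hypothetical, and dropping an overflowing value at
run time is an assumption about what "silently ignored" means.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


class FilterCompileError(Exception):
    """Hypothetical error type for compile-time failures."""


def negate_int32(value):
    """Negate a value promoted to the signed 32-bit range, or raise."""
    result = -value
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"-({value}) does not fit in 32 bits")
    return result


def fold_constant(value):
    """Compile time: overflow in a constant aborts compilation."""
    try:
        return negate_int32(value)
    except OverflowError as err:
        raise FilterCompileError(str(err)) from err


def negate_field_values(values):
    """Run time: overflowing values are skipped (assumed behaviour)."""
    negated = []
    for value in values:
        try:
            negated.append(negate_int32(value))
        except OverflowError:
            continue  # the real engine only prints a debug message
    return negated
```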
Add support for masking of bits. Previously the bitwise operator
could only test bits; it did not support clearing bits.
This allows testing if any combination of bits are set/unset
more naturally with a single test. Previously this was only
possible by combining several bitwise predicates.
Bitwise is implemented as a test node, even though it is not.
Maybe the test node should be renamed to something else.
Fixes #17246.
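
The idea in Python, using TCP flag bits as the example (FIN is 0x01
and SYN is 0x02). The helper name is hypothetical. Masking first and
then comparing tests "SYN set and FIN clear" in one step.

```python
FIN, SYN = 0x01, 0x02


def masked_equals(value, mask, expected):
    """Clear the bits outside mask, then compare the remainder."""
    return (value & mask) == expected


syn_ack = 0x12      # SYN + ACK
syn_fin = 0x03      # SYN + FIN

syn_without_fin = masked_equals(syn_ack, SYN | FIN, SYN)
```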
The DFVM instruction arguments are generic boxed types, but instead
of using FVALUE and PCRE types the code passes around REGISTER types.
Change that to pass constants in the instruction.
Use a list to allow a variable number of jumps, instead of a fixed
count. The flexibility in the number of jumps a given syntax tree
node might need to handle is useful to add new kinds of
operations.
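
A Python sketch of the backpatching idea: jumps whose target is not
yet known are collected in a list and patched once the target
instruction has been emitted. The emit() helper and the list-based
instruction encoding are hypothetical.

```python
def emit(program, opcode, operand=None):
    """Append an instruction and return its index."""
    program.append([opcode, operand])
    return len(program) - 1


program, pending_jumps = [], []

emit(program, "READ_TREE", "tcp.window_size_scalefactor")
pending_jumps.append(emit(program, "IF_FALSE_GOTO"))   # target unknown
emit(program, "READ_TREE", "tcp.dstport")
pending_jumps.append(emit(program, "IF_FALSE_GOTO"))   # target unknown
emit(program, "MK_MINUS")
emit(program, "ANY_EQ")
target = emit(program, "RETURN")

# Patch every pending jump, however many this node produced.
for index in pending_jumps:
    program[index][1] = target
```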