Add support to display filters for matching a specific layer within a frame.
Layers are counted sequentially up the protocol stack. Each protocol
(dissector) that appears in the stack is one layer.
LINK-LAYER#1 <-> IP#1 <-> TCP#1 <-> IP#2 <-> TCP#2 <-> etc.
The syntax allows for negative indexes and ranges with the usual semantics
for slices (but note that counting starts at one):
tcp.port#[2-4] == 1024
Matches layers 2 to 4 inclusive.
Fixes#3791.
The word range is used for different things with different
meanings and that is confusing. Avoid using "range" in code to
mean "slice".
A range is one or more intervals with a lower and upper bound.
A slice is a range applied to a bytes field.
Replace range with slice wherever appropriate. This usage of
"slice" instead of range is generally correct and consistent in
the documentation.
This allows writing moderately complex expressions, for example
a float epsilon test (#16483):
Filter: {abs(_ws.ftypes.double - 1) / max(abs(_ws.ftypes.double), abs(1))} < 0.01
Syntax tree:
0 TEST_LT:
1 OP_DIVIDE:
2 FUNCTION(abs#1):
3 OP_SUBTRACT:
4 FIELD(_ws.ftypes.double)
4 FVALUE(1 <FT_DOUBLE>)
2 FUNCTION(max#2):
3 FUNCTION(abs#1):
4 FIELD(_ws.ftypes.double)
3 FUNCTION(abs#1):
4 FVALUE(1 <FT_DOUBLE>)
1 FVALUE(0.01 <FT_DOUBLE>)
Instructions:
00000 READ_TREE _ws.ftypes.double -> reg#1
00001 IF_FALSE_GOTO 3
00002 SUBRACT reg#1 - 1 <FT_DOUBLE> -> reg#2
00003 STACK_PUSH reg#2
00004 CALL_FUNCTION abs(reg#2) -> reg#0
00005 STACK_POP 1
00006 IF_FALSE_GOTO 24
00007 READ_TREE _ws.ftypes.double -> reg#1
00008 IF_FALSE_GOTO 9
00009 STACK_PUSH reg#1
00010 CALL_FUNCTION abs(reg#1) -> reg#4
00011 STACK_POP 1
00012 IF_FALSE_GOTO 13
00013 STACK_PUSH reg#4
00014 STACK_PUSH 1 <FT_DOUBLE>
00015 CALL_FUNCTION abs(1 <FT_DOUBLE>) -> reg#5
00016 STACK_POP 1
00017 IF_FALSE_GOTO 18
00018 STACK_PUSH reg#5
00019 CALL_FUNCTION max(reg#5, reg#4) -> reg#3
00020 STACK_POP 2
00021 IF_FALSE_GOTO 24
00022 DIVIDE reg#0 / reg#3 -> reg#6
00023 ANY_LT reg#6 < 0.01 <FT_DOUBLE>
00024 RETURN
We now use a stack to pass arguments to the function. The
stack is implemented as a list of lists (list of registers).
Arguments may still be non-existent to functions (this is
a feature). Functions must check for nil arguments (NULL lists)
and handle that case.
It's somewhat complicated to allow literal values and test compatibility
for different types, both because of lack of type information with
unparsed/literal and also because it is an underdeveloped area in the
code. In my limited testing it was good enough and useful, further
enhancements are left for future work.
Changes the function calling convention to pass the first register
number plus the number of registers after that sequentially. This
allows function with any number of arguments. Functions can still
only return one value.
Adds max() and min() function to select the maximum/minimum value
from any number of arguments, all of the same type. The functions
accept literals too. The return type is the same as the first argument
(cannot be a literal).
Add location tracking as a column offset and length from offset
to the scanner. Our input is a single line only so we don't need
to track line offset.
Record that information in the syntax tree. Return the error location
in dfilter_compile(). Use it in dftest to mark the location of the
error in the filter string. Later it would be nice to use the location
in the GUI as well.
$ dftest "ip.proto == aaaaaa and tcp.port == 123"
Filter: ip.proto == aaaaaa and tcp.port == 123
dftest: "aaaaaa" cannot be found among the possible values for ip.proto.
ip.proto == aaaaaa and tcp.port == 123
^~~~~~
Revert to passing a syntax node from the lexical scanner to the grammar
parser. Using a union is not having a discernible advantage and requires
duplicating a lot of properties of syntax nodes.
After some experimentation I don't think these two existence tests
belong in the grammar, it's an implementation detail and removing it
might avoid some artificial constraints.
Add argument to dfilter_compile_real() to save syntax tree text
representation.
Use it with dftest to print syntax tree.
Misc debug output format improvements.
Add support for display filter binary addition and subtraction.
The grammar is intentionally kept simple for now. The use case
is to add a constant to a protocol field, or (maybe) add two
fields in an expression.
We use signed arithmetic with unsigned numbers, checking for
overflow and casting where necessary to do the conversion.
We could legitimately opt to use traditional modular arithmetic
instead (like C) and if it turns out that that is more useful for
some reason we may want to in the future.
Fixes#15504.
This replaces the current macro reference system with
a completely different implementation. Instead of a macro a reference
is a syntax element. A reference is a constant that can be filled
in the dfilter code after compilation from an existing protocol tree.
It is best understood as a field value that can be read from a fixed
tree that is not the frame being filtered. Usually this fixed tree
is the currently selected frame when the filter is applied. This
allows comparing fields in the filtered frame with fields in the
selected frame.
Because the field reference syntax uses the same sigil notation
as a macro we have to use a heuristic to distinguish them:
if the name has a dot it is a field reference, otherwise
it is a macro name.
The reference is synctatically validated at compile time.
There are two main advantages to this implementation (and a couple of
minor ones):
The protocol tree for each selected frame is only walked if we have a
display filter and if the display filter uses references. Also only the
actual reference values are copied, intead of loading the entire tree
into a hash table (in textual form even).
The other advantage is that the reference is tested like a protocol
field against all the values in the selected frame (if there is more
than one).
Currently the reference fields are not "primed" during dissection, so
the entire tree is walked to find a particular reference (this is
similar to the previous implementation).
If the display filter contains a valid reference and the reference is
not loaded at the time the filter is run the result is the same as a
non existing field for a regular READ_TREE instruction.
Fixes#17599.
This usage devalues a mechanism for warning users that deserves more
attention than this minor suggestion.
The warning is inconvenient for intermediate and advanced users.
This change implements a unary minus operator.
Filter: tcp.window_size_scalefactor == -tcp.dstport
Instructions:
00000 READ_TREE tcp.window_size_scalefactor -> reg#0
00001 IF_FALSE_GOTO 6
00002 READ_TREE tcp.dstport -> reg#1
00003 IF_FALSE_GOTO 6
00004 MK_MINUS -reg#1 -> reg#2
00005 ANY_EQ reg#0 == reg#2
00006 RETURN
It is supported for integer types, floats and relative time values.
The unsigned integer types are promoted to a 32 bit signed integer.
Unary plus is implemented as a no-op. The plus sign is simply ignored.
Constant arithmetic expressions are computed during compilation.
Overflow with constants is a compile time error. Overflow with
variables is a run time error and silently ignored. Only a debug
message will be printed to the console.
Related to #15504.
Add support for masking of bits. Before the bitwise operator
could only test bits, it did not support clearing bits.
This allows testing if any combination of bits are set/unset
more naturally with a single test. Previously this was only
possible by combining several bitwise predicates.
Bitwise is implemented as a test node, even though it is not.
Maybe the test node should be renamed to something else.
Fixes#17246.
For unparsed values on the RHS of a comparison try
to parse them first as a literal and only then as
a protocol. This is more complicated in code but
should be a use case a lot more common and useful in
practice.
It removes some annoying special cases and applies this
rule consistently to any expression. Consistency is
important otherwise the special cases and exceptions
make the language confusing and difficult to learn.
For values on the LHS the rule remains to first try a
protocol value, then a literal.
Related with issue #17731.
A literal value is a value that cannot be interpreted as a
registered protocol. An unparsed value can be a literal or
an identifier (protocol/field) according to context and the
current disambiguation rules.
Strictly literal here is to be understood to mean "numeric
literal, including numeric arrays, but not strings or character
constants".
To complete the set of equality operators add an "all equal"
operator that matches a frame if all fields match the condition.
The symbol chosen for "all_eq" is "===".
Use that for error messages, including any using test operators.
This allows to always use the same name as the user. It avoids
cases where the user write "a && b" and the message is "a and b"
is syntactically invalid.
It should also allow us to be more consistent with the use of
double quotes.
Invalid character constants should be handled in the lexical scanner.
Todo: See if some code could be shared to parse double quoted strings.
It also fixes some unintuitive type coercions to string. Character
constants should be treated as characters, or maybe integers, or
maybe even throw an invalid comparison error, but coverting to a
literal string or byte array is surprising and not particularly
useful:
'\xFF' -> "'\xFF'" (equals)
'\xFF' -> "FF" (contains)
Before:
Filter: http.request.method contains "\x63"
Constants:
00000 PUT_FVALUE "c" <FT_STRING> -> reg#1
(...)
Filter: http.request.method contains '\x63'
Constants:
00000 PUT_FVALUE "63" <FT_STRING> -> reg#1
(...)
Filter: http.request.method == "\x63"
Constants:
00000 PUT_FVALUE "c" <FT_STRING> -> reg#1
(...)
Filter: http.request.method == '\x63'
Constants:
00000 PUT_FVALUE "'\\x63'" <FT_STRING> -> reg#1
(...)
After:
Filter: http.request.method contains '\x63'
Constants:
00000 PUT_FVALUE "c" <FT_STRING> -> reg#1
(...)
Filter: http.request.method == '\x63'
Constants:
00000 PUT_FVALUE "c" <FT_STRING> -> reg#1
(...)
This reverts commit d635ff4933.
A charconst cannot be a value string, for that reason it is not
redundant with unparsed.
Maybe character constants should be parsed in the lexical scanner
instead.
Before:
Filter: ip.proto == '\g'
dftest: "'\g'" cannot be found among the possible values for ip.proto.
After:
Filter: ip.proto == '\g'
dftest: "'\g'" isn't a valid character constant.
Add just a console entry for check_test(), in a more compact
form.
Remove logging of the call chain. This was partially replaced by the
printout of the syntax tree.
A charconst uses the same semantic rules as unparsed so just
use the latter to avoid redundancies.
We keep the use of TOKEN_CHARCONST as an optimization to avoid
an unnecessary name resolution (lookup for a registered field with
the same name as the charconst).
Currently unused. This might still be useful to differentiate
different spelling of the same token in user messages, like
"==" and "eq", but currently we are not storing test tokens
anyway, so just remove it, it makes everything simpler.
If it's ever necessary it can be added back.
Do the integer conversion for ranges in the parser. This is more
conventional, I think, and allows removing the unnecessary integer
syntax tree node type.
Try to minimize the number and complexity of lexical rules for
ranges. But it seems we need to keep different states for integer
and punctuation because of the need to disambiguate the ranges
[-n-n] and [-n--n].
Instead of using 3 operations (new + free + reassign_to_parent) to transform
the tree use a simpler single replace operation instead.
This also avoids having to manually copy token values.
The set search and replace method is now obsolete.
Clean up syntax error code. TEST and SET are never returned by
the tokenizer.
Remove unnecessary range_body() grammar element. Fix a comment.
Move the stnode_token_value() function to its proper place.
When parsing we save the token value to the syntax tree. This is
useful for better error reporting. Use it to report an invalid
entity for the slice operation. Before only the memory location
was reported, which is not a good error message.
Before:
% dftest '"01:02:03:04"[0:3] == foo'
Filter: ""01:02:03:04"[0:3] == foo"
dftest: Range is not supported for entity <0x7f6c84017740> of type STRING
After:
% dftest '"01:02:03:04"[0:3] == foo'
Filter: ""01:02:03:04"[0:3] == foo"
dftest: Range is not supported for entity 01:02:03:04 of type STRING
When creating a new node from an old one we need to copy the token
value. Simple tokens such as RBRACKET, COMMA and COLON are
not part of the AST and don't have an associated semantic value.
Pass the deprecated data struture to the scanner and insert the deprecated
tokens there. This avoids having to keep a dedicated syntax node field
for this.
Pass the deprecated argument in dfwork_t instead of in a separate
argument. This is less cumbersome than adding an extra argument
to every level of the semantic checker.
Use wslog to output debug information. Being able to control
it at runtime is a big advantage.
We extend the syntax tree nodes with a method to return a
canonical string representation.
Add a routine to walk the tree and return an textual representation
for debugging purposes.