Commit Graph

190 Commits

Author SHA1 Message Date
João Valverde b10db887ce dfilter: Remove unparsed syntax type and RHS literal bias
This removes unparsed name resolution during the semantic
check because it feels like a hack to work around limitations
in the language syntax, that should be solved at the lexical
level instead.

We were interpreting unparsed differently on the LHS and RHS.
Now an unparsed value is always a field if it matches a
registered field name (this matches the implementation in 3.6
and before).

This requires slightly tightening the allowed filter names for
protocols to avoid some common and potentially confusing
conflicts.

Incidentally this extends set grammar to accept all entities.
That is experimental and may be reverted in the future.
2022-07-02 11:18:20 +01:00
João Valverde efbe699756 dfilter: Remove STTYPE_RANGE_NODE
STTYPE_RANGE_NODE is just a lexical token. It is
not used within the syntax tree, so remove it.
2022-06-25 16:06:48 +01:00
João Valverde aaff0d21ae dfilter: Add layer support for references
This adds support for using the layers filter
with field references.

Before:
    $ dftest 'ip.src != ${ip.src#2}'
    dftest: invalid character in macro name

After:
    $ dftest 'ip.src != ${ip.src#2}'
    Filter: ip.src != ${ip.src#2}

    Syntax tree:
     0 TEST_ALL_NE:
       1 FIELD(ip.src <FT_IPv4>)
       1 REFERENCE(ip.src#[2:1] <FT_IPv4>)

    Instructions:
    00000 READ_TREE		ip.src <FT_IPv4> -> reg#0
    00001 IF_FALSE_GOTO	5
    00002 READ_REFERENCE_R	${ip.src <FT_IPv4>} #[2:1] -> reg#1
    00003 IF_FALSE_GOTO	5
    00004 ALL_NE		reg#0 != reg#1
    00005 RETURN

This requires adding another level of complexity to references.
When loading references we need to copy the 'proto_layer_num'
and add the logic to filter on that.

The "layer" sttype is removed and replaced by a new
field sttype with support for a range. This is a nice
cleanup for the semantic check and general simplification.
The grammar is better too with this design.

Range sttype is renamed to slice for clarity.
2022-06-25 14:57:40 +01:00
João Valverde 47348ae598 dfilter: Add support for literal strings with null bytes
Before:
    Filter: frame matches "abc\x00def"
    dftest: \x00 (NUL byte) cannot be used with a regular string.
    	frame matches "abc\x00def"
    	                  ^~~~
    Filter: _ws.ftypes.string == "a string with a \0 byte"
    dftest: \0 (NUL byte) cannot be used with a regular string.
    	_ws.ftypes.string == "a string with a \0 byte"
    	                                      ^~

After:
    Filter: frame matches "abc\x00def"

    Syntax tree:
     0 TEST_MATCHES:
       1 FIELD(frame)
       1 PCRE(abc\0def)

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_MATCHES	reg#0 matches abc\0def
    00003 RETURN

    Filter: _ws.ftypes.string == "a string with a \0 byte"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string)
       1 FVALUE("a string with a \0 byte" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "a string with a \0 byte" <FT_STRING>
    00003 RETURN

Fixes issue #16156.
2022-06-21 15:10:08 +00:00
João Valverde de103394fe dfilter: Make regex matches case insensitive by default 2022-06-08 12:17:22 +01:00
João Valverde 4f3f507eee dfilter: Add syntax to match specific layers in the protocol stack
Add support to display filters for matching a specific layer within a frame.
Layers are counted sequentially up the protocol stack. Each protocol
(dissector) that appears in the stack is one layer.

LINK-LAYER#1 <-> IP#1 <-> TCP#1 <-> IP#2 <-> TCP#2 <-> etc.

The syntax allows for negative indexes and ranges with the usual semantics
for slices (but note that counting starts at one):

    tcp.port#[2-4] == 1024

Matches layers 2 to 4 inclusive.
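
The selection rule can be sketched in Python. This is only an
illustration, not the dfilter implementation: select_layers() and the
sample data are hypothetical, and the handling of negative indexes
(-1 meaning the last layer) is an assumption based on the "usual
semantics for slices" mentioned above.

```python
def select_layers(values, lower, upper):
    """Return the values whose layer number falls in [lower, upper].

    `values` is a list of (layer_number, value) pairs, with layers
    counted from 1. Negative bounds count back from the last layer
    (assumption: -1 is the last layer).
    """
    if not values:
        return []
    last = max(layer for layer, _ in values)

    def resolve(index):
        # Map a negative index onto the 1-based layer numbering.
        return last + 1 + index if index < 0 else index

    lo, hi = resolve(lower), resolve(upper)
    return [value for layer, value in values if lo <= layer <= hi]


# Made-up frame with four stacked TCP layers.
ports = [(1, 443), (2, 1024), (3, 8080), (4, 1024)]

# Rough equivalent of: tcp.port#[2-4] == 1024
matched = select_layers(ports, 2, 4)
print(any(port == 1024 for port in matched))
```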

Fixes #3791.
2022-04-26 16:50:59 +00:00
João Valverde c0170dad42 dfilter: Rename "range" to "slice"
The word range is used for different things with different
meanings and that is confusing. Avoid using "range" in code to
mean "slice".

A range is one or more intervals with a lower and upper bound.

A slice is a range applied to a bytes field.

Replace range with slice wherever appropriate. This usage of
"slice" instead of range is generally correct and consistent in
the documentation.
2022-04-26 16:50:59 +00:00
João Valverde fab32ea0cb dfilter: Allow arithmetic expressions as function arguments
This allows writing moderately complex expressions, for example
a float epsilon test (#16483):

Filter: {abs(_ws.ftypes.double - 1) / max(abs(_ws.ftypes.double), abs(1))} < 0.01

Syntax tree:
 0 TEST_LT:
   1 OP_DIVIDE:
     2 FUNCTION(abs#1):
       3 OP_SUBTRACT:
         4 FIELD(_ws.ftypes.double)
         4 FVALUE(1 <FT_DOUBLE>)
     2 FUNCTION(max#2):
       3 FUNCTION(abs#1):
         4 FIELD(_ws.ftypes.double)
       3 FUNCTION(abs#1):
         4 FVALUE(1 <FT_DOUBLE>)
   1 FVALUE(0.01 <FT_DOUBLE>)

Instructions:
00000 READ_TREE		_ws.ftypes.double -> reg#1
00001 IF_FALSE_GOTO	3
00002 SUBRACT		reg#1 - 1 <FT_DOUBLE> -> reg#2
00003 STACK_PUSH	reg#2
00004 CALL_FUNCTION	abs(reg#2) -> reg#0
00005 STACK_POP	1
00006 IF_FALSE_GOTO	24
00007 READ_TREE		_ws.ftypes.double -> reg#1
00008 IF_FALSE_GOTO	9
00009 STACK_PUSH	reg#1
00010 CALL_FUNCTION	abs(reg#1) -> reg#4
00011 STACK_POP	1
00012 IF_FALSE_GOTO	13
00013 STACK_PUSH	reg#4
00014 STACK_PUSH	1 <FT_DOUBLE>
00015 CALL_FUNCTION	abs(1 <FT_DOUBLE>) -> reg#5
00016 STACK_POP	1
00017 IF_FALSE_GOTO	18
00018 STACK_PUSH	reg#5
00019 CALL_FUNCTION	max(reg#5, reg#4) -> reg#3
00020 STACK_POP	2
00021 IF_FALSE_GOTO	24
00022 DIVIDE		reg#0 / reg#3 -> reg#6
00023 ANY_LT		reg#6 < 0.01 <FT_DOUBLE>
00024 RETURN

We now use a stack to pass arguments to the function. The
stack is implemented as a list of lists (list of registers).
Arguments may still be non-existent to functions (this is
a feature). Functions must check for nil arguments (NULL lists)
and handle that case.

It's somewhat complicated to allow literal values and test compatibility
for different types, both because of lack of type information with
unparsed/literal and also because it is an underdeveloped area in the
code. In my limited testing it was good enough and useful, further
enhancements are left for future work.
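
For readers unfamiliar with the epsilon test used in the example filter,
the same relative-difference check looks roughly like this in Python.
The function name and the guard for a zero scale are assumptions added
for the sketch; they are not part of dfilter.

```python
def nearly_equal(value, target, tolerance):
    """Relative difference test: |value - target| / max(|value|, |target|)."""
    scale = max(abs(value), abs(target))
    if scale == 0:
        # Both operands are zero. The filter above never hits this case
        # because its target is the constant 1.
        return True
    return abs(value - target) / scale < tolerance


# Mirrors the filter's comparison against 1 with a 0.01 tolerance.
print(nearly_equal(1.005, 1, 0.01))
print(nearly_equal(1.5, 1, 0.01))
```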
2022-04-18 17:10:31 +01:00
João Valverde cb2f085f14 dfilter: Add max() and min() functions
Changes the function calling convention to pass the first register
number plus the number of registers after that sequentially. This
allows functions with any number of arguments. Functions can still
only return one value.

Adds max() and min() functions to select the maximum/minimum value
from any number of arguments, all of the same type. The functions
accept literals too. The return type is the same as the first argument
(cannot be a literal).
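
A minimal Python sketch of the calling convention described above: the
caller passes the first register number and a count, and the function
reads that many consecutive registers. All names here are hypothetical,
and modelling each register as a list of values (empty when the field
is absent) is an assumption, not a description of the C code.

```python
def call_function(func, registers, first, count):
    """Call `func` with `count` consecutive registers starting at `first`."""
    args = registers[first:first + count]
    return func(args)


def df_max(args):
    # Flatten all argument registers and keep the largest value.
    candidates = [value for register in args for value in register]
    # A function returns a single register; empty means "no result".
    return [max(candidates)] if candidates else []


def df_min(args):
    candidates = [value for register in args for value in register]
    return [min(candidates)] if candidates else []


# reg#0 is empty (field not present), reg#1..reg#3 hold the arguments.
registers = [[], [3, 9], [7], [1]]
print(call_function(df_max, registers, 1, 3))
print(call_function(df_min, registers, 1, 3))
```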
2022-04-14 13:07:41 +00:00
João Valverde 4d9470e7dd dfilter: Add location tracking to scanner and use it to report errors
Add location tracking as a column offset and length from offset
to the scanner. Our input is a single line only so we don't need
to track line offset.

Record that information in the syntax tree. Return the error location
in dfilter_compile(). Use it in dftest to mark the location of the
error in the filter string. Later it would be nice to use the location
in the GUI as well.

$ dftest "ip.proto == aaaaaa and tcp.port == 123"
Filter: ip.proto == aaaaaa and tcp.port == 123
dftest: "aaaaaa" cannot be found among the possible values for ip.proto.
	ip.proto == aaaaaa and tcp.port == 123
	            ^~~~~~
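
The marker line above can be derived from just the column offset and
length. A small Python sketch, assuming a hypothetical mark_location()
helper (dftest itself is written in C and may build the line
differently):

```python
def mark_location(offset, length):
    """Build a marker line: '^' at `offset`, then '~' for the rest of the token."""
    length = max(length, 1)  # always mark at least one column
    return " " * offset + "^" + "~" * (length - 1)


filter_text = "ip.proto == aaaaaa and tcp.port == 123"
token = "aaaaaa"
offset = filter_text.index(token)

print(filter_text)
print(mark_location(offset, len(token)))
```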
2022-04-10 10:09:51 +01:00
João Valverde da19379eb5 dfilter: Create the syntax node in the scanner and pass that
Revert to passing a syntax node from the lexical scanner to the grammar
parser. Using a union has no discernible advantage and requires
duplicating a lot of properties of syntax nodes.
2022-04-10 09:54:03 +01:00
João Valverde fb9a176587 dfilter: Allow grouping arithmetical expressions with { }
This removes the limitation of having only two terms in an
arithmetic expression and allows setting the precedence using
curly braces (like any basic calculator).

Our grammar currently does not allow grouping arithmetic expressions
using parentheses, because boolean expressions and arithmetic
expressions are different and parentheses are used with the former.
2022-04-08 23:12:04 +01:00
João Valverde 0313cd02bc dfilter: Fix RHS bias for literal values
Fixes a3b76138f0.
2022-04-06 23:46:22 +01:00
João Valverde 20afbd46ec dfilter: Remove existence test syntax tree nodes
After some experimentation I don't think these two existence tests
belong in the grammar, it's an implementation detail and removing it
might avoid some artificial constraints.
2022-04-05 12:04:37 +01:00
João Valverde fb08c4b4a8 dfilter: Replace bitwise sttype with arithmetic
Most of the bitwise codepaths are just duplicating code for
the arithmetic type. Parse bitwise expressions as arithmetic
instead.
2022-04-05 12:04:37 +01:00
João Valverde f0ca30b60b dfilter: More arithmetic fixes
Fix a failed assertion with constant arithmetic expressions.

Because we do not parse constants on the lexical level it is
more complicated to handle constant expressions with unparsed
values.

We need to handle missing type information gracefully for any
kind of arithmetic expression, not just unary minus.
2022-04-02 18:10:33 +00:00
João Valverde 67e5e5c3ab dfilter: Fix arithmetic expressions on the LHS
Filter: _ws.ftypes.framenum % 3 == 0

Instructions:
00000 READ_TREE		_ws.ftypes.framenum -> reg#0
00001 IF_FALSE_GOTO	4
00002 MODULO		reg#0 % 3 <FT_FRAMENUM> -> reg#1
00003 ANY_EQ		reg#1 == 0 <FT_FRAMENUM>
00004 RETURN
2022-04-01 14:33:38 +01:00
João Valverde 8bc214b5bb dfilter: Add remaining arithmetic integer ops 2022-03-31 16:49:42 +01:00
João Valverde 2a9cb588aa dfilter: Add binary arithmetic (add/subtract)
Add support for display filter binary addition and subtraction.

The grammar is intentionally kept simple for now. The use case
is to add a constant to a protocol field, or (maybe) add two
fields in an expression.

We use signed arithmetic with unsigned numbers, checking for
overflow and casting where necessary to do the conversion.
We could legitimately opt to use traditional modular arithmetic
instead (like C) and if it turns out that that is more useful for
some reason we may want to in the future.
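
To make the difference between the two options concrete, here is a
Python sketch contrasting checked signed arithmetic with C-style
modular arithmetic. The 32-bit widths and the function names are
assumptions chosen for illustration; the actual C code may use
different widths and report overflow differently.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def checked_subtract(a, b):
    """Subtract as signed values and fail if the result does not fit."""
    result = a - b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError("result does not fit in a signed 32-bit integer")
    return result


def modular_subtract(a, b):
    """Subtract with unsigned 32-bit wrap-around, like unsigned types in C."""
    return (a - b) % 2**32


print(checked_subtract(1, 2))   # signed result
print(modular_subtract(1, 2))   # wraps around to a large unsigned value
```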

Fixes #15504.
2022-03-31 11:27:34 +01:00
João Valverde 260942e170 dfilter: Refactor macro tree references
This replaces the current macro reference system with
a completely different implementation. Instead of a macro a reference
is a syntax element. A reference is a constant that can be filled
in the dfilter code after compilation from an existing protocol tree.
It is best understood as a field value that can be read from a fixed
tree that is not the frame being filtered. Usually this fixed tree
is the currently selected frame when the filter is applied. This
allows comparing fields in the filtered frame with fields in the
selected frame.

Because the field reference syntax uses the same sigil notation
as a macro we have to use a heuristic to distinguish them:
if the name has a dot it is a field reference, otherwise
it is a macro name.
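
The heuristic amounts to a single test on the name. A Python sketch
(the function name and the sample names are made up for illustration):

```python
def classify_sigil_name(name):
    """Classify the name inside ${...}: a dot means a field reference."""
    return "field reference" if "." in name else "macro"


print(classify_sigil_name("ip.src"))
print(classify_sigil_name("my_macro"))
```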

The reference is syntactically validated at compile time.

There are two main advantages to this implementation (and a couple of
minor ones):

The protocol tree for each selected frame is only walked if we have a
display filter and if the display filter uses references. Also only the
actual reference values are copied, instead of loading the entire tree
into a hash table (in textual form even).

The other advantage is that the reference is tested like a protocol
field against all the values in the selected frame (if there is more
than one).

Currently the reference fields are not "primed" during dissection, so
the entire tree is walked to find a particular reference (this is
similar to the previous implementation).

If the display filter contains a valid reference and the reference is
not loaded at the time the filter is run the result is the same as a
non existing field for a regular READ_TREE instruction.

Fixes #17599.
2022-03-29 12:36:31 +00:00
João Valverde 431cb43b81 dfilter: Remove parenthesis deprecation warning
This usage devalues a mechanism for warning users that deserves more
attention than this minor suggestion.

The warning is inconvenient for intermediate and advanced users.
2022-03-29 12:19:26 +00:00
João Valverde a1299d63d9 dfilter: Lower level of two debug messages 2022-03-28 17:20:00 +01:00
João Valverde ac0a69636b dfilter: Add support for unary arithmetic
This change implements a unary minus operator.

Filter: tcp.window_size_scalefactor == -tcp.dstport

Instructions:
00000 READ_TREE		tcp.window_size_scalefactor -> reg#0
00001 IF_FALSE_GOTO	6
00002 READ_TREE		tcp.dstport -> reg#1
00003 IF_FALSE_GOTO	6
00004 MK_MINUS		-reg#1 -> reg#2
00005 ANY_EQ		reg#0 == reg#2
00006 RETURN

It is supported for integer types, floats and relative time values.
The unsigned integer types are promoted to a 32 bit signed integer.

Unary plus is implemented as a no-op. The plus sign is simply ignored.

Constant arithmetic expressions are computed during compilation.

Overflow with constants is a compile time error. Overflow with
variables is a run time error and silently ignored. Only a debug
message will be printed to the console.

Related to #15504.
2022-03-28 11:20:41 +00:00
João Valverde a3b76138f0 dfilter: Fix memory leak
Filter: tcp.srcport == udp.port

Instructions:
00000 READ_TREE		tcp.srcport -> reg#0
00001 IF_FALSE_GOTO	5
00002 READ_TREE		udp.port -> reg#1
00003 IF_FALSE_GOTO	5
00004 ANY_EQ		reg#0 == reg#1
00005 RETURN

=================================================================
==180444==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 34 byte(s) in 1 object(s) allocated from:
    #0 0x55f21e4a9ff9  (/home/jpv/projects/wireshark/wireshark/build-asan/run/dftest+0xcdff9)
    #1 0x7f95ea661338  (/usr/lib/libc.so.6+0x82338)

SUMMARY: AddressSanitizer: 34 byte(s) leaked in 1 allocation(s).

Fixes a68b408a9f.
2022-03-25 18:38:11 +00:00
João Valverde 2fc8c0e36b dfilter: Handle a bitwise expr on the RHS 2022-03-23 11:04:41 +00:00
João Valverde 0335ebdc3a dfilter: ftype_is_true -> ftype_is_zero 2022-03-23 11:04:41 +00:00
João Valverde 16729be2c1 dfilter: Add bitwise masking of bits
Add support for masking of bits. Before the bitwise operator
could only test bits, it did not support clearing bits.

This allows testing if any combination of bits are set/unset
more naturally with a single test. Previously this was only
possible by combining several bitwise predicates.
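
In general terms, masking lets one comparison replace several bit
tests. A Python sketch of the idea using the TCP SYN and ACK flag bits;
the helper names are hypothetical and this does not show display
filter syntax.

```python
SYN = 0x02
ACK = 0x10


def masked_equals(flags, mask, expected):
    """Single test: keep only the masked bits, then compare."""
    return (flags & mask) == expected


def combined_predicates(flags):
    """The older style: one bit test per flag, combined with boolean logic."""
    return bool(flags & SYN) and not (flags & ACK)


# "SYN set and ACK clear" expressed both ways.
for flags in (0x02, 0x12, 0x10):
    print(hex(flags), masked_equals(flags, SYN | ACK, SYN), combined_predicates(flags))
```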

Bitwise is implemented as a test node, even though it is not.
Maybe the test node should be renamed to something else.

Fixes #17246.
2022-03-22 12:58:04 +00:00
João Valverde bd48f947b0 dfilter: Require a field-like value on the LHS
Comparisons require a field-like value on one of the sides,
or both. Change this to require it on the LHS, or on both sides. There
is really no reason that I can see to allow the relation to commute,
and it allows removing a lot of unnecessary code and extra tests.
2022-03-05 11:10:54 +00:00
João Valverde a68b408a9f dfilter: Add RHS bias for literal values
For unparsed values on the RHS of a comparison try
to parse them first as a literal and only then as
a protocol. This is more complicated in code but
should be a use case a lot more common and useful in
practice.

It removes some annoying special cases and applies this
rule consistently to any expression. Consistency is
important otherwise the special cases and exceptions
make the language confusing and difficult to learn.

For values on the LHS the rule remains to first try a
protocol value, then a literal.

Related to issue #17731.
2022-03-05 11:10:54 +00:00
João Valverde c4f9d8abda dfilter: Rename "unparsed" to "literal"
A literal value is a value that cannot be interpreted as a
registered protocol. An unparsed value can be a literal or
an identifier (protocol/field) according to context and the
current disambiguation rules.

Strictly speaking, "literal" here is to be understood to mean "numeric
literal, including numeric arrays, but not strings or character
constants".
2022-03-05 11:10:54 +00:00
João Valverde 1278e36152 dfilter: Add more debug code 2022-02-27 23:35:57 +00:00
João Valverde 8b23dd3a3c dfilter: Add an "all equal" operator
To complete the set of equality operators add an "all equal"
operator that matches a frame if all fields match the condition.

The symbol chosen for "all_eq" is "===".
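
The difference between the existing "any equal" test and the new "all
equal" test, as a Python sketch. The function names are made up, and
treating a missing field as a non-match for "all equal" is an
assumption.

```python
def any_eq(values, target):
    """True if at least one occurrence of the field equals the target."""
    return any(value == target for value in values)


def all_eq(values, target):
    """True if the field is present and every occurrence equals the target."""
    return bool(values) and all(value == target for value in values)


ports = [80, 443]  # a field that occurs twice in one frame
print(any_eq(ports, 80))
print(all_eq(ports, 80))
```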
2021-12-22 14:32:32 +00:00
João Valverde 557cee31fc dfilter: Save lexical token value to syntax tree
Use that for error messages, including any using test operators.

This allows us to always use the same name as the user. It avoids
cases where the user writes "a && b" and the message says that
"a and b" is syntactically invalid.

It should also allow us to be more consistent with the use of
double quotes.
2021-12-01 13:34:01 +00:00
João Valverde a6f978b4d3 dfilter: Remove two stnode replacement functions
One is unused and the other is only used with a corner
case. They are probably not necessary otherwise.
2021-11-30 19:48:47 +00:00
João Valverde fbfb4959ae dfilter: Better representation for charconst 2021-11-27 18:38:22 +00:00
João Valverde 352390aa97 dfilter: Need to handle a charconst on the LHS 2021-11-27 17:19:11 +00:00
João Valverde 943c282009 dfilter: Parse character constants in lexer
Invalid character constants should be handled in the lexical scanner.

Todo: See if some code could be shared to parse double quoted strings.

It also fixes some unintuitive type coercions to string. Character
constants should be treated as characters, or maybe integers, or
maybe even throw an invalid comparison error, but converting to a
literal string or byte array is surprising and not particularly
useful:
  '\xFF' -> "'\xFF'" (equals)
  '\xFF' -> "FF"     (contains)

Before:

    Filter: http.request.method contains "\x63"

    Constants:
    00000 PUT_FVALUE	"c" <FT_STRING> -> reg#1
    (...)

    Filter: http.request.method contains '\x63'

    Constants:
    00000 PUT_FVALUE	"63" <FT_STRING> -> reg#1
    (...)

    Filter: http.request.method == "\x63"

    Constants:
    00000 PUT_FVALUE	"c" <FT_STRING> -> reg#1
    (...)

    Filter: http.request.method == '\x63'

    Constants:
    00000 PUT_FVALUE	"'\\x63'" <FT_STRING> -> reg#1
    (...)

After:

    Filter: http.request.method contains '\x63'

    Constants:
    00000 PUT_FVALUE	"c" <FT_STRING> -> reg#1
    (...)

    Filter: http.request.method == '\x63'

    Constants:
    00000 PUT_FVALUE	"c" <FT_STRING> -> reg#1
    (...)
2021-11-24 08:40:20 +00:00
João Valverde 7028646f9e dfilter: Fix invalid character constant error message
This reverts commit d635ff4933.

A charconst cannot be a value string, for that reason it is not
redundant with unparsed.

Maybe character constants should be parsed in the lexical scanner
instead.

Before:
  Filter: ip.proto == '\g'
  dftest: "'\g'" cannot be found among the possible values for ip.proto.

After:
  Filter: ip.proto == '\g'
  dftest: "'\g'" isn't a valid character constant.
2021-11-23 17:35:40 +00:00
João Valverde 75bb51eef9 dfilter: Clean up some debug statements, second try
Add just a console entry for check_test(), in a more compact
form.

Remove logging of the call chain. This was partially replaced by the
printout of the syntax tree.
2021-11-16 11:27:04 +00:00
João Valverde c4337d0dc5 dfilter: Give more context for regex error messages 2021-11-16 11:18:09 +00:00
João Valverde 848f4f8e97 dfilter: Cleanup some debug statements
Reduce the verbosity a bit, even with "noisy" level,
and remove some extraneous new lines.
2021-11-14 23:22:42 +00:00
João Valverde 274531820a Move regex code to wsutil 2021-11-14 21:00:59 +00:00
João Valverde 01d1cc492e dfilter: Add default case to switch 2021-11-13 11:39:32 +00:00
João Valverde 1a32a75a62 ftypes: Internal headers need to be internal
The header ftypes-int.h should not be used outside of epan/ftypes
because it is a private header.

The functions fvalue_free() and fvalue_cleanup() need not and should
not be macros either.
2021-11-11 03:15:31 +00:00
João Valverde b62d4b8eca dfilter: Change string node display representation again
Adding double quotes to the display output format was probably a mistake.
2021-11-10 03:19:24 +00:00
João Valverde e7ecc9b9e5 dfilter: Clean up error format and exception code
Misc code cleanups. Add some extra stnode functions for increased type
safety. Fix a constness issue with df_lval_value().
2021-11-10 03:18:50 +00:00
João Valverde 63adcf7fb5 dfilter: Clean up function parameters semantic check 2021-11-10 02:12:06 +00:00
João Valverde d0a07881f4 dfilter: Remove unnecessary node conversion
This has never worked AFAICT because functions have never accepted
fvalues, only fields, so converting from unparsed to an fvalue
string is moot, they are both invalid argument types to functions.
2021-11-10 01:23:49 +00:00
João Valverde ac431ec855 dfilter: Remove some debug statements
Normalize the amount of debug code in the module.
2021-11-10 01:23:49 +00:00
João Valverde e965fa32a1 dfilter: Refactor some semantic check code
Try to reorganize the check_relation sub-functions for better
clarity and consistency and less duplication.
2021-11-10 01:23:29 +00:00