Commit Graph

78 Commits

Author SHA1 Message Date
João Valverde 47348ae598 dfilter: Add support for literal strings with null bytes
Before:
    Filter: frame matches "abc\x00def"
    dftest: \x00 (NUL byte) cannot be used with a regular string.
    	frame matches "abc\x00def"
    	                  ^~~~
    Filter: _ws.ftypes.string == "a string with a \0 byte"
    dftest: \0 (NUL byte) cannot be used with a regular string.
    	_ws.ftypes.string == "a string with a \0 byte"
    	                                      ^~

After:
    Filter: frame matches "abc\x00def"

    Syntax tree:
     0 TEST_MATCHES:
       1 FIELD(frame)
       1 PCRE(abc\0def)

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_MATCHES	reg#0 matches abc\0def
    00003 RETURN

    Filter: _ws.ftypes.string == "a string with a \0 byte"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string)
       1 FVALUE("a string with a \0 byte" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "a string with a \0 byte" <FT_STRING>
    00003 RETURN

Fixes issue #16156.
2022-06-21 15:10:08 +00:00
João Valverde b602911b31 dfilter: Add support for universal quantifiers
Adds the keywords "any" and "all" to implement the quantification
to any existing relational operator.

Filter: all tcp.port in {100, 2000..3000}

Syntax tree:
 0 ALL TEST_IN:
   1 FIELD(tcp.port)
   1 SET(#2):
     2 FVALUE(100 <FT_UINT16>)
     2 FVALUE(2000 <FT_UINT16>) .. FVALUE(3000 <FT_UINT16>)

Instructions:
00000 READ_TREE		tcp.port -> reg#0
00001 IF_FALSE_GOTO	5
00002 ALL_EQ		reg#0 === 100 <FT_UINT16>
00003 IF_TRUE_GOTO	5
00004 ALL_IN_RANGE	reg#0 in { 2000 <FT_UINT16> .. 3000 <FT_UINT16> }
00005 RETURN
2022-05-12 14:26:54 +01:00
João Valverde 4f3f507eee dfilter: Add syntax to match specific layers in the protocol stack
Add support to display filters for matching a specific layer within a frame.
Layers are counted sequentially up the protocol stack. Each protocol
(dissector) that appears in the stack is one layer.

LINK-LAYER#1 <-> IP#1 <-> TCP#1 <-> IP#2 <-> TCP#2 <-> etc.

The syntax allows for negative indexes and ranges with the usual semantics
for slices (but note that counting starts at one):

    tcp.port#[2-4] == 1024

Matches layers 2 to 4 inclusive.

Fixes #3791.
2022-04-26 16:50:59 +00:00
João Valverde c0170dad42 dfilter: Rename "range" to "slice"
The word range is used for different things with different
meanings and that is confusing. Avoid using "range" in code to
mean "slice".

A range is one or more intervals with a lower and upper bound.

A slice is a range applied to a bytes field.

Replace range with slice wherever appropriate. This usage of
"slice" instead of range is generally correct and consistent in
the documentation.
2022-04-26 16:50:59 +00:00
João Valverde fab32ea0cb dfilter: Allow arithmetic expressions as function arguments
This allows writing moderately complex expressions, for example
a float epsilon test (#16483):

Filter: {abs(_ws.ftypes.double - 1) / max(abs(_ws.ftypes.double), abs(1))} < 0.01

Syntax tree:
 0 TEST_LT:
   1 OP_DIVIDE:
     2 FUNCTION(abs#1):
       3 OP_SUBTRACT:
         4 FIELD(_ws.ftypes.double)
         4 FVALUE(1 <FT_DOUBLE>)
     2 FUNCTION(max#2):
       3 FUNCTION(abs#1):
         4 FIELD(_ws.ftypes.double)
       3 FUNCTION(abs#1):
         4 FVALUE(1 <FT_DOUBLE>)
   1 FVALUE(0.01 <FT_DOUBLE>)

Instructions:
00000 READ_TREE		_ws.ftypes.double -> reg#1
00001 IF_FALSE_GOTO	3
00002 SUBRACT		reg#1 - 1 <FT_DOUBLE> -> reg#2
00003 STACK_PUSH	reg#2
00004 CALL_FUNCTION	abs(reg#2) -> reg#0
00005 STACK_POP	1
00006 IF_FALSE_GOTO	24
00007 READ_TREE		_ws.ftypes.double -> reg#1
00008 IF_FALSE_GOTO	9
00009 STACK_PUSH	reg#1
00010 CALL_FUNCTION	abs(reg#1) -> reg#4
00011 STACK_POP	1
00012 IF_FALSE_GOTO	13
00013 STACK_PUSH	reg#4
00014 STACK_PUSH	1 <FT_DOUBLE>
00015 CALL_FUNCTION	abs(1 <FT_DOUBLE>) -> reg#5
00016 STACK_POP	1
00017 IF_FALSE_GOTO	18
00018 STACK_PUSH	reg#5
00019 CALL_FUNCTION	max(reg#5, reg#4) -> reg#3
00020 STACK_POP	2
00021 IF_FALSE_GOTO	24
00022 DIVIDE		reg#0 / reg#3 -> reg#6
00023 ANY_LT		reg#6 < 0.01 <FT_DOUBLE>
00024 RETURN

We now use a stack to pass arguments to the function. The
stack is implemented as a list of lists (list of registers).
Arguments may still be non-existent to functions (this is
a feature). Functions must check for nil arguments (NULL lists)
and handle that case.

It's somewhat complicated to allow literal values and test compatibility
for different types, both because of lack of type information with
unparsed/literal and also because it is an underdeveloped area in the
code. In my limited testing it was good enough and useful, further
enhancements are left for future work.
2022-04-18 17:10:31 +01:00
João Valverde cb2f085f14 dfilter: Add max() and min() functions
Changes the function calling convention to pass the first register
number plus the number of registers after that sequentially. This
allows function with any number of arguments. Functions can still
only return one value.

Adds max() and min() function to select the maximum/minimum value
from any number of arguments, all of the same type. The functions
accept literals too. The return type is the same as the first argument
(cannot be a literal).
2022-04-14 13:07:41 +00:00
João Valverde 09696f1762 Try to fix a narrowing warning
"C:\Development\wsbuild64\Wireshark.sln" (default target) (1) ->
"C:\Development\wsbuild64\epan\dfilter\dfilter.vcxproj.metaproj" (default target) (18) ->
"C:\Development\wsbuild64\epan\dfilter\dfilter.vcxproj" (default target) (108) ->
       (ClCompile target) ->
C:/Development/wireshark/epan/dfilter/scanner.l(463,54): warning C4267: '+=': conversion from 'size_t' to 'int
       ', possible loss of data [C:\Development\wsbuild64\epan\dfilter\dfilter.vcxproj]
C:/Development/wireshark/epan/dfilter/scanner.l(463,54): warning C4267:         state->location.col_start += sta
       te->location.col_len; [C:\Development\wsbuild64\epan\dfilter\dfilter.vcxproj]
C:/Development/wireshark/epan/dfilter/scanner.l(463,54): warning C4267:
                           ^ (compiling source file C:\Development\wsbuild64\epan\dfilter\scanner.c) [C:\Development\ws
       build64\epan\dfilter\dfilter.vcxproj]
2022-04-11 22:23:13 +01:00
João Valverde 4d9470e7dd dfilter: Add location tracking to scanner and use it to report errors
Add location tracking as a column offset and length from offset
to the scanner. Our input is a single line only so we don't need
to track line offset.

Record that information in the syntax tree. Return the error location
in dfilter_compile(). Use it in dftest to mark the location of the
error in the filter string. Later it would be nice to use the location
in the GUI as well.

$ dftest "ip.proto == aaaaaa and tcp.port == 123"
Filter: ip.proto == aaaaaa and tcp.port == 123
dftest: "aaaaaa" cannot be found among the possible values for ip.proto.
	ip.proto == aaaaaa and tcp.port == 123
	            ^~~~~~
2022-04-10 10:09:51 +01:00
João Valverde da19379eb5 dfilter: Create the syntax node in the scanner and pass that
Revert to passing a syntax node from the lexical scanner to the grammar
parser. Using a union is not having a discernible advantage and requires
duplicating a lot of properties of syntax nodes.
2022-04-10 09:54:03 +01:00
João Valverde 8fb28f5161 dfilter: Minor grammar cleanup
Remove duplication for arithmetic expressions.
2022-04-05 12:04:37 +01:00
João Valverde 20afbd46ec dfilter: Remove existence test syntax tree nodes
After some experimentation I don't think these two existence tests
belong in the grammar, it's an implementation detail and removing it
might avoid some artificial constraints.
2022-04-05 12:04:37 +01:00
João Valverde fb08c4b4a8 dfilter: Replace bitwise sttype with arithmetic
Most of the bitwise codepaths are just duplicating code for
the arithmetic type. Parse bitwise expressions as arithmetic
instead.
2022-04-05 12:04:37 +01:00
João Valverde c98df5eef5 dfilter: Print syntax tree using dftest + format enhancements
Add argument to dfilter_compile_real() to save syntax tree text
representation.

Use it with dftest to print syntax tree.

Misc debug output format improvements.
2022-04-05 12:04:37 +01:00
João Valverde 8bc214b5bb dfilter: Add remaining arithmetic integer ops 2022-03-31 16:49:42 +01:00
João Valverde 2a9cb588aa dfilter: Add binary arithmetic (add/subtract)
Add support for display filter binary addition and subtraction.

The grammar is intentionally kept simple for now. The use case
is to add a constant to a protocol field, or (maybe) add two
fields in an expression.

We use signed arithmetic with unsigned numbers, checking for
overflow and casting where necessary to do the conversion.
We could legitimately opt to use traditional modular arithmetic
instead (like C) and if it turns out that that is more useful for
some reason we may want to in the future.

Fixes #15504.
2022-03-31 11:27:34 +01:00
João Valverde 260942e170 dfilter: Refactor macro tree references
This replaces the current macro reference system with
a completely different implementation. Instead of a macro a reference
is a syntax element. A reference is a constant that can be filled
in the dfilter code after compilation from an existing protocol tree.
It is best understood as a field value that can be read from a fixed
tree that is not the frame being filtered. Usually this fixed tree
is the currently selected frame when the filter is applied. This
allows comparing fields in the filtered frame with fields in the
selected frame.

Because the field reference syntax uses the same sigil notation
as a macro we have to use a heuristic to distinguish them:
if the name has a dot it is a field reference, otherwise
it is a macro name.

The reference is synctatically validated at compile time.

There are two main advantages to this implementation (and a couple of
minor ones):

The protocol tree for each selected frame is only walked if we have a
display filter and if the display filter uses references. Also only the
actual reference values are copied, intead of loading the entire tree
into a hash table (in textual form even).

The other advantage is that the reference is tested like a protocol
field against all the values in the selected frame (if there is more
than one).

Currently the reference fields are not "primed" during dissection, so
the entire tree is walked to find a particular reference (this is
similar to the previous implementation).

If the display filter contains a valid reference and the reference is
not loaded at the time the filter is run the result is the same as a
non existing field for a regular READ_TREE instruction.

Fixes #17599.
2022-03-29 12:36:31 +00:00
João Valverde 431cb43b81 dfilter: Remove parenthesis deprecation warning
This usage devalues a mechanism for warning users that deserves more
attention than this minor suggestion.

The warning is inconvenient for intermediate and advanced users.
2022-03-29 12:19:26 +00:00
João Valverde ac0a69636b dfilter: Add support for unary arithmetic
This change implements a unary minus operator.

Filter: tcp.window_size_scalefactor == -tcp.dstport

Instructions:
00000 READ_TREE		tcp.window_size_scalefactor -> reg#0
00001 IF_FALSE_GOTO	6
00002 READ_TREE		tcp.dstport -> reg#1
00003 IF_FALSE_GOTO	6
00004 MK_MINUS		-reg#1 -> reg#2
00005 ANY_EQ		reg#0 == reg#2
00006 RETURN

It is supported for integer types, floats and relative time values.
The unsigned integer types are promoted to a 32 bit signed integer.

Unary plus is implemented as a no-op. The plus sign is simply ignored.

Constant arithmetic expressions are computed during compilation.

Overflow with constants is a compile time error. Overflow with
variables is a run time error and silently ignored. Only a debug
message will be printed to the console.

Related to #15504.
2022-03-28 11:20:41 +00:00
João Valverde 16729be2c1 dfilter: Add bitwise masking of bits
Add support for masking of bits. Before the bitwise operator
could only test bits, it did not support clearing bits.

This allows testing if any combination of bits are set/unset
more naturally with a single test. Previously this was only
possible by combining several bitwise predicates.

Bitwise is implemented as a test node, even though it is not.
Maybe the test node should be renamed to something else.

Fixes #17246.
2022-03-22 12:58:04 +00:00
João Valverde 7e07f373f5 dfilter: Remove unused function
Clean-up for a68b408a9f.
2022-03-09 11:51:47 +00:00
João Valverde a68b408a9f dfilter: Add RHS bias for literal values
For unparsed values on the RHS of a comparison try
to parse them first as a literal and only then as
a protocol. This is more complicated in code but
should be a use case a lot more common and useful in
practice.

It removes some annoying special cases and applies this
rule consistently to any expression. Consistency is
important otherwise the special cases and exceptions
make the language confusing and difficult to learn.

For values on the LHS the rule remains to first try a
protocol value, then a literal.

Related with issue #17731.
2022-03-05 11:10:54 +00:00
João Valverde c4f9d8abda dfilter: Rename "unparsed" to "literal"
A literal value is a value that cannot be interpreted as a
registered protocol. An unparsed value can be a literal or
an identifier (protocol/field) according to context and the
current disambiguation rules.

Strictly literal here is to be understood to  mean "numeric
literal, including numeric arrays, but not strings or character
constants".
2022-03-05 11:10:54 +00:00
João Valverde 1278e36152 dfilter: Add more debug code 2022-02-27 23:35:57 +00:00
João Valverde 8b23dd3a3c dfilter: Add an "all equal" operator
To complete the set of equality operators add an "all equal"
operator that matches a frame if all fields match the condition.

The symbol chosen for "all_eq" is "===".
2021-12-22 14:32:32 +00:00
João Valverde 647decd509 dfilter: Avoid double strdup to save token value
Store the lval token value instead.
2021-12-01 19:42:51 +00:00
João Valverde 557cee31fc dfilter: Save lexical token value to syntax tree
Use that for error messages, including any using test operators.

This allows to always use the same name as the user. It avoids
cases where the user write "a && b" and the message is "a and b"
is syntactically invalid.

It should also allow us to be more consistent with the use of
double quotes.
2021-12-01 13:34:01 +00:00
João Valverde a6f978b4d3 dfilter: Remove two stnode replacement functions
One is unused and the other is only used with a corner
case. They are probably not necessary otherwise.
2021-11-30 19:48:47 +00:00
João Valverde 943c282009 dfilter: Parse character constants in lexer
Invalid character constants should be handled in the lexical scanner.

Todo: See if some code could be shared to parse double quoted strings.

It also fixes some unintuitive type coercions to string. Character
constants should be treated as characters, or maybe integers, or
maybe even throw an invalid comparison error, but coverting to a
literal string or byte array is surprising and not particularly
useful:
  '\xFF' -> "'\xFF'" (equals)
  '\xFF' -> "FF"     (contains)

Before:

    Filter: http.request.method contains "\x63"

    Constants:
    00000 PUT_FVALUE	"c" <FT_STRING> -> reg#1
    (...)

    Filter: http.request.method contains '\x63'

    Constants:
    00000 PUT_FVALUE	"63" <FT_STRING> -> reg#1
    (...)

    Filter: http.request.method == "\x63"

    Constants:
    00000 PUT_FVALUE	"c" <FT_STRING> -> reg#1
    (...)

    Filter: http.request.method == '\x63'

    Constants:
    00000 PUT_FVALUE	"'\\x63'" <FT_STRING> -> reg#1
    (...)

After:

    Filter: http.request.method contains '\x63'

    Constants:
    00000 PUT_FVALUE	"c" <FT_STRING> -> reg#1
    (...)

    Filter: http.request.method == '\x63'

    Constants:
    00000 PUT_FVALUE	"c" <FT_STRING> -> reg#1
    (...)
2021-11-24 08:40:20 +00:00
João Valverde 7028646f9e dfilter: Fix invalid character constant error message
This reverts commit d635ff4933.

A charconst cannot be a value string, for that reason it is not
redundant with unparsed.

Maybe character constants should be parsed in the lexical scanner
instead.

Before:
  Filter: ip.proto == '\g'
  dftest: "'\g'" cannot be found among the possible values for ip.proto.

After:
  Filter: ip.proto == '\g'
  dftest: "'\g'" isn't a valid character constant.
2021-11-23 17:35:40 +00:00
João Valverde 75bb51eef9 dfilter: Clean up some debug statements, second try
Add just a console entry for check_test(), in a more compact
form.

Remove logging of the call chain. This was partially replaced by the
printout of the syntax tree.
2021-11-16 11:27:04 +00:00
João Valverde e7ecc9b9e5 dfilter: Clean up error format and exception code
Misc code cleanups. Add some extra stnode functions for increased type
safety. Fix a constness issue with df_lval_value().
2021-11-10 03:18:50 +00:00
João Valverde d635ff4933 dfilter: Remove redundant STTYPE_CHARCONST syntax node
A charconst uses the same semantic rules as unparsed so just
use the latter to avoid redundancies.

We keep the use of TOKEN_CHARCONST as an optimization to avoid
an unnecessary name resolution (lookup for a registered field with
the same name as the charconst).
2021-10-31 20:33:31 +00:00
João Valverde a7c625808c dfilter: Add a helper function to create test stnodes 2021-10-27 09:27:45 +01:00
João Valverde f5fea52982 dfilter: Remove token value from syntax tree
Currently unused. This might still be useful to differentiate
different spelling of the same token in user messages, like
"==" and "eq", but currently we are not storing test tokens
anyway, so just remove it, it makes everything simpler.

If it's ever necessary it can be added back.
2021-10-27 09:27:45 +01:00
João Valverde 3e6cc8ce4a dfilter: Remove unused function definition 2021-10-14 16:21:33 +01:00
João Valverde 07371d4557 dfilter: Split tostr() into debug and pretty print 2021-10-11 21:55:45 +00:00
João Valverde 5dd90e3b30 dfilter: Cache stnode_tostr()
This avoids having to save/free the pointer for each tostr()
invocation (or leak memory).
2021-10-11 21:55:45 +00:00
João Valverde 2c701ddf6f dfilter: Improve grammar to parse ranges
Do the integer conversion for ranges in the parser. This is more
conventional, I think, and allows removing the unnecessary integer
syntax tree node type.

Try to minimize the number and complexity of lexical rules for
ranges. But it seems we need to keep different states for integer
and punctuation because of the need to disambiguate the ranges
[-n-n] and [-n--n].
2021-10-08 19:18:56 +01:00
João Valverde db85625af9 dfilter: Rewrite ws_assert_magic() again 2021-10-08 04:01:24 +00:00
João Valverde e4e0b97082 dfilter: Use wslog with ws_assert_magic() 2021-10-06 15:44:48 +00:00
João Valverde 8c5a4f9100 dfilter: Replace node accessor macros with functions
Replace macro magic to improve ease of comprehension and maintenance.
2021-10-06 15:44:48 +00:00
João Valverde a7242733a4 dfilter: Fix ws_assert_magic() macro
We need to use WS_DISABLE_DEBUG, not WS_DEBUG.

Fixes 0e50979b3f.

Rename some lingering assert_magic() references.
2021-10-06 15:44:48 +00:00
João Valverde 4804c1224d dfilter: Use syntax tree node replacement semantics
Instead of using 3 operations (new + free + reassign_to_parent) to transform
the tree use a simpler single replace operation instead.

This also avoids having to manually copy token values.

The set search and replace method is now obsolete.
2021-10-06 10:34:21 +00:00
João Valverde a940318f37 dfilter: Minor grammar fixups
Clean up syntax error code. TEST and SET are never returned by
the tokenizer.

Remove unnecessary range_body() grammar element. Fix a comment.

Move the stnode_token_value() function to its proper place.
2021-10-05 17:56:21 +01:00
João Valverde db18865e55 dfilter: Save token value to syntax tree
When parsing we save the token value to the syntax tree. This is
useful for better error reporting. Use it to report an invalid
entity for the slice operation. Before only the memory location
was reported, which is not a good error message.

Before:
  % dftest '"01:02:03:04"[0:3] == foo'
  Filter: ""01:02:03:04"[0:3] == foo"
  dftest: Range is not supported for entity <0x7f6c84017740> of type STRING

After:
  % dftest '"01:02:03:04"[0:3] == foo'
  Filter: ""01:02:03:04"[0:3] == foo"
  dftest: Range is not supported for entity 01:02:03:04 of type STRING

When creating a new node from an old one we need to copy the token
value. Simple tokens such as RBRACKET, COMMA and COLON are
not part of the AST and don't have an associated semantic value.
2021-10-01 16:04:37 +00:00
João Valverde 487e2b6bc3 dfilter: Remove unnecessary log activation check
Use log_write_always_full() instead of ws_log() to avoid a useless
activation check.

Rename stnode_log() to log_stnode() for consistency.
2021-10-01 16:04:37 +00:00
João Valverde b4af7c52a5 dfilter: Add a flags member to the syntax tree node
Use it to record "inside parenthesis".
2021-09-30 17:03:55 +00:00
João Valverde 0e7ba54d98 dfilter: Clean up handling of "deprecated" tokens
Pass the deprecated data struture to the scanner and insert the deprecated
tokens there. This avoids having to keep a dedicated syntax node field
for this.

Pass the deprecated argument in dfwork_t instead of in a separate
argument. This is less cumbersome than adding an extra argument
to every level of the semantic checker.
2021-09-30 17:26:19 +01:00
João Valverde 3ea2a61f2a dfilter: Display syntax tree for debugging
Use wslog to output debug information. Being able to control
it at runtime is a big advantage.

We extend the syntax tree nodes with a method to return a
canonical string representation.

Add a routine to walk the tree and return an textual representation
for debugging purposes.
2021-09-30 16:29:11 +01:00
João Valverde 0e50979b3f Replace g_assert() with ws_assert() 2021-06-19 01:23:31 +00:00