Commit Graph

758 Commits

Author SHA1 Message Date
João Valverde 263bda375c dfilter: Check if type supports unary minus
Fix crash for types that do not support unary minus.

Fixes #18750.
2022-12-21 14:43:39 +00:00
João Valverde 32f88ad22c wmem: Remove strbuf max size parameter
This parameter was introduced as a safeguard for bugs
that generate an unbounded string but its utility for
that purpose is doubtful and the way it is being used
creates problems with invalid truncation of UTF-8
strings.

Rename wmem_strbuf_sized_new() with a better name.
2022-12-03 01:54:52 +00:00
João Valverde 967a3c3df9 Qt: Check field autocomplete for syntactical validity
Currently the autocompletion engine always suggests a protocol
field completion, even in places where it isn't syntactically
valid.

Fix that by compiling the preamble to the token under the cursor
and checking the returned error. If it is DF_ERROR_UNEXPECTED_END
that indicates a field or literal value was expected. Otherwise
a field replacement is not valid in this position.

Fixes #12811.
2022-12-01 22:50:09 +00:00
João Valverde b116ccd6d5 dfilter: Replace compile booleans arguments with a bit flag 2022-11-30 17:36:17 +00:00
João Valverde 84e75be5c6 dfilter: Add optimization flag
When we are just testing code to see if it compiles performing
optimizations is wasteful. Add an option to disable them.
2022-11-30 17:36:17 +00:00
João Valverde 93814ef740 dfilter: Always set error pointer in case of failure 2022-11-30 15:00:34 +00:00
João Valverde a0d77e9329 dfilter: Return an error object instead of string
Return an struct containing error information. This simplifies
the interface to more easily provide richer diagnostics in the future.

Add an error code besides a human-readable error string to allow
checking programmatically for errors in a robust manner. Currently
there is only a generic error code, it is expected to increase
in the future.

Move error location information to the struct. Change callers and
implementation to use the new interface.
2022-11-28 15:46:44 +00:00
João Valverde 79c3a77752 Add macros to control lemon diagnostics
Rename flex macros using parenthesis (mostly a style issue):

DIAG_OFF_FLEX -> DIAG_OFF_FLEX()
DIAG_ON_FLEX  -> DIAG_ON_FLEX()

Use the same kind of construct with lemon generated code using
DIAG_OFF_LEMON() and DIAG_ON_LEMON(). Use %include and %code
directives to enforce the desired order with generated code
in the middle in between pragmas.

Fix a clang-specific pragma to use DIAG_OFF_CLANG().

DIAG_OFF(unreachable-code) -> DIAG_OFF_CLANG(unreachable-code).

Apparently GCC is ignoring the -Wunreachable flag, that's why
it did not trigger an unknown pragma warning. From [1}:

  The -Wunreachable-code has been removed, because it was unstable: it
  relied on the optimizer, and so different versions of gcc would warn
  about different code.  The compiler still accepts and ignores the
  command line option so that existing Makefiles are not broken.  In some
  future release the option will be removed entirely. - Ian

[1] https://gcc.gnu.org/legacy-ml/gcc-help/2011-05/msg00360.html
2022-11-20 10:11:27 +00:00
João Valverde 2443df7318 Disable another -Wunreachable lemon warning 2022-11-17 11:21:41 +00:00
Peter Wu df478a365d dfilter: treat carriage returns as whitespace
Fixes #18595
2022-11-07 01:00:50 +00:00
João Valverde 4c2d0f16d4 dfilter: Improve representation of raw field references
Instead of using the abstract type "<RAW>", which might be confusing,
show FT_BYTES, but display the representation with the "@" operator,
so it's not even more confusing in error messages why a field might
flip-flop types.

Refactor the field tostr() function and some other clean ups.

Before:
```
Filter: _ws.ftypes.string ==${@frame.len}
dftest: _ws.ftypes.string and frame.len <RAW> are not of compatible types.
	_ws.ftypes.string ==${@frame.len}
	                       ^~~~~~~~~
```

After:
```
Filter: _ws.ftypes.string ==${@frame.len}
dftest: _ws.ftypes.string <FT_STRING> and @frame.len <FT_BYTES> are not of compatible types.
	_ws.ftypes.string ==${@frame.len}
	                       ^~~~~~~~~
```
2022-10-31 21:02:39 +00:00
João Valverde b83658d8a4 dfilter: Add suport for raw addressing with references
Extends raw adressing syntax to wok with references. The syntax
is
    @field1 == ${@field2}

This requires replicating the logic to load field references, but
using raw values instead. We use separate hash tables for that,
namely "references" vs "raw_references".
2022-10-31 21:02:39 +00:00
João Valverde 0853ddd1cb dfilter: Add support for raw (bytes) addressing mode
This adds new syntax to read a field from the tree as bytes, instead
of the actual type. This is a useful extension for example to match
matformed strings that contain unicode replacement characters. In
this case it is not possible to match the raw value of the malformed
string field. This extension fills this need and is generic enough
that it should be useful in many other situations.

The syntax used is to prefix the field name with "@". The following
artificial example tests if the HTTP user agent contains a particular
invalid UTF-8 sequence:

    @http.user_agent == "Mozill\xAA"

Where simply using "http.user_agent" won't work because the invalid byte
sequence will have been replaced with U+FFFD.

Considering the following programs:

    $ dftest '_ws.ftypes.string == "ABC"'
    Filter: _ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <FT_STRING>)
       1 FVALUE("ABC" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string <FT_STRING> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "ABC" <FT_STRING>
    00003 RETURN

    $ dftest '@_ws.ftypes.string == "ABC"'
    Filter: @_ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <RAW>)
       1 FVALUE(41:42:43 <FT_BYTES>)

    Instructions:
    00000 READ_TREE		@_ws.ftypes.string <FT_BYTES> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == 41:42:43 <FT_BYTES>
    00003 RETURN

In the second case the field has a "raw" type, that equates directly to
FT_BYTES, and the field value is read from the protocol raw data.
2022-10-31 21:02:39 +00:00
João Valverde 31a0147daa dfilter: Pass a value by reference
The lifetime of the reference is longer than the runtime so avoid
an unecessary fvalue dup.
2022-10-31 21:02:39 +00:00
João Valverde 0583b76204 dfilter: Remove unused data structure 2022-10-31 21:02:39 +00:00
João Valverde 0662a3f6ac dfilter: Amend a numeric pattern in the scanner
We amend the :<numeric> pattern to not eat the leading
colon. Because the colon can be part of the value (with IPv6 addresses
for example) we want to avoid doing that.

IPv6 addresses are covered by their own rules but this removes the
requirement in the future to handle any special cases and avoids
surprises.

For this reason the colon-prefix syntax is already explicitly defined to
work only for byte arrays and there is currently no universal
syntax for all literal values or even all numbers.

Other numbers can keep using the lexical type "unparsed".

```
run/dftest "_ws.ftypes.uint8 == :fd"
Filter: _ws.ftypes.uint8 == :fd
dftest: ":fd" is not a valid number.
	_ws.ftypes.uint8 == :fd
	                    ^~~

run/dftest "_ws.ftypes.uint8 == fd"
Filter: _ws.ftypes.uint8 == fd
dftest: "fd" is not a valid number.
	_ws.ftypes.uint8 == fd
	                    ^~

run/dftest "_ws.ftypes.uint8 == 0xfd"
Filter: _ws.ftypes.uint8 == 0xfd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.uint8 <FT_UINT8>)
   1 FVALUE(253 <FT_UINT8>)

Instructions:
00000 READ_TREE		_ws.ftypes.uint8 <FT_UINT8> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == 253 <FT_UINT8>
00003 RETURN

run/dftest "_ws.ftypes.bytes == fd"
Filter: _ws.ftypes.bytes == fd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.bytes <FT_BYTES>)
   1 FVALUE(fd <FT_BYTES>)

Instructions:
00000 READ_TREE		_ws.ftypes.bytes <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fd <FT_BYTES>
00003 RETURN

run/dftest "_ws.ftypes.bytes == :fd"
Filter: _ws.ftypes.bytes == :fd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.bytes <FT_BYTES>)
   1 FVALUE(fd <FT_BYTES>)

Instructions:
00000 READ_TREE		_ws.ftypes.bytes <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fd <FT_BYTES>
00003 RETURN
```
2022-10-08 09:51:49 +00:00
João Valverde 14f5121c4a dfilter: Remove problematic <...> literal syntax
The <...> syntax for literals, intended to be as generic as
possible, unintentionally introduced an ambiguity with the
relational expression "a < b or a > c".

Literals are values like numbers, bytes, IPv6 addresses or, one
could imagine, UNC paths for example, if an FT_UNC type were to
be added in the future.

We could use a new unique symbol like @...@ but the <...>
syntax is very recent and may not be necessary with ":xxx" so
just remove it.

A byte array can be explicitly declared by prefixing with a colon. It
is not as generic but the main ambiguity that this new syntax attempted
to solve is bytes vs protocol names. We don't want to introduce a new
reserved symbol for now, until other requirements if any are more clear.

Fixes #18418.
2022-10-08 09:51:49 +00:00
João Valverde 0816e317cb dfilter: Fix crash with FT_NONE and arithmetic expressions
Do the first ftype-can check in an arithmetic expressions before
evaluating the second term to be sure we do not allow FT_NONE as a
valid LHS ftype.

$ dftest '_ws.ftypes.none + 1 == 2'
Filter: _ws.ftypes.none + 1 == 2
dftest: FT_NONE cannot +.
	_ws.ftypes.none + 1 == 2
	^~~~~~~~~~~~~~~
2022-07-28 16:50:09 +00:00
João Valverde 84f54d54e5 dfilter: Fix a crash using abs()
Passing a literal value to abs() on the LHS segfaults, because it
is incorrectly assumed to be a valid field.

We need to check if we actually have a field. While at it improve
the diagnostic of literals.
2022-07-19 19:11:47 +01:00
Alexis La Goutte b448b6a591 semcheck: fix -Wmissing-prototypes
semcheck.c:1110:1: warning: no previous prototype for function 'check_arithmetic_entity'
2022-07-15 13:45:52 +00:00
Alexis La Goutte bd28c19ad6 dvfm: Fix -Wmissing-prototypes
dfvm.c:206:1: warning: no previous prototype for function 'dfvm_value_tostr'
dfvm.c:550:1: warning: no previous prototype for function 'filter_finfo_fvalues'
dfvm.c:645:1: warning: no previous prototype for function 'filter_refs_fvalues'
2022-07-15 13:45:52 +00:00
João Valverde 4c975b770e dfilter: Improve compatibility of integer types
Before:

$ dftest '_ws.ftypes.int64 == _ws.ftypes.int8'
Filter: _ws.ftypes.int64 == _ws.ftypes.int8
dftest: _ws.ftypes.int64 and _ws.ftypes.int8 are not of compatible types.
	_ws.ftypes.int64 == _ws.ftypes.int8
	                    ^~~~~~~~~~~~~~~

After:

$ dftest '_ws.ftypes.int64 == _ws.ftypes.int8'
Filter: _ws.ftypes.int64 == _ws.ftypes.int8

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.int64 <FT_INT64>)
   1 FIELD(_ws.ftypes.int8 <FT_INT8>)

Instructions:
00000 READ_TREE		_ws.ftypes.int64 <FT_INT64> -> reg#0
00001 IF_FALSE_GOTO	5
00002 READ_TREE		_ws.ftypes.int8 <FT_INT8> -> reg#1
00003 IF_FALSE_GOTO	5
00004 ANY_EQ		reg#0 === reg#1
00005 RETURN
2022-07-14 20:12:30 +00:00
João Valverde f68f172454 dfilter: Remove a debug message
Still too noisy even with noisy level.
2022-07-13 16:06:28 +00:00
João Valverde 6c8a8d7960 dfilter: Fix dfvm code string
All/any equal have their own symbols for operators so cannot
be handled in the same switch case.

Other comparisons don't have different symbols for any/all.
2022-07-13 00:37:12 +01:00
João Valverde 5e3a7e9ab8 dfilter: Small optimization for "not all zero" code
Remove extra NOT instruction. Also remove unused ANY_ZERO opcode.
2022-07-05 09:58:43 +01:00
João Valverde a877f2d5f3 dfilter: Allow existence check for slices
Allow checking if a slice exists. The result is true if the
slice has length greater than zero.

The len() function is implemented as a DFVM instruction instead.
The semantics are the same.
2022-07-04 22:45:14 +00:00
João Valverde 0fc81c21b2 dfilter: Cleanup scanner value setters 2022-07-04 22:15:40 +00:00
João Valverde 8d93f0920a dfilter: Fix some debug strings 2022-07-02 21:21:12 +01:00
João Valverde eb8acd088e dfilter: Rename dfvm opcodes with a namespace prefix 2022-07-02 11:46:45 +01:00
João Valverde fc5c81328e dfilter: Rename test syntax tree node
Test node also includes arithmetic operations so rename it
to a generic "operator" node.
2022-07-02 11:39:17 +01:00
João Valverde b10db887ce dfilter: Remove unparsed syntax type and RHS literal bias
This removes unparsed name resolution during the semantic
check because it feels like a hack to work around limitations
in the language syntax, that should be solved at the lexical
level instead.

We were interpreting unparsed differently on the LHS and RHS.
Now an unparsed value is always a field if it matches a
registered field name (this matches the implementation in 3.6
and before).

This requires tightening a bit the allowed filter names for
protocols to avoid some common and potentially weird conflicting
cases.

Incidentally this extends set grammar to accept all entities.
That is experimental and may be reverted in the future.
2022-07-02 11:18:20 +01:00
Roland Knall 8bdff72625 dfilter: Fix undefined dereference and add null check
A value of ref could be accessed undefined and add additional
checks to ensure, that refs_array actually contains data or return
null immediately
2022-06-27 14:57:01 +00:00
João Valverde efbe699756 dfilter: Remove STTYPE_RANGE_NODE
STTYPE_RANGE_NODE is just a lexical token, it is
not used withi the syntax tree so remove it.
2022-06-25 16:06:48 +01:00
João Valverde aaff0d21ae dfilter: Add layer support for references
This adds support for using the layers filter
with field references.

Before:
    $ dftest 'ip.src != ${ip.src#2}'
    dftest: invalid character in macro name

After:
    $ dftest 'ip.src != ${ip.src#2}'
    Filter: ip.src != ${ip.src#2}

    Syntax tree:
     0 TEST_ALL_NE:
       1 FIELD(ip.src <FT_IPv4>)
       1 REFERENCE(ip.src#[2:1] <FT_IPv4>)

    Instructions:
    00000 READ_TREE		ip.src <FT_IPv4> -> reg#0
    00001 IF_FALSE_GOTO	5
    00002 READ_REFERENCE_R	${ip.src <FT_IPv4>} #[2:1] -> reg#1
    00003 IF_FALSE_GOTO	5
    00004 ALL_NE		reg#0 != reg#1
    00005 RETURN

This requires adding another level of complexity to references.
When loading references we need to copy the 'proto_layer_num'
and add the logic to filter on that.

The "layer" sttype is removed and replace by a new
field sttype with support for a range. This is a nice
cleanup for the semantic check and general simplification.
The grammar is better too with this design.

Range sttype is renamed to slice for clarity.
2022-06-25 14:57:40 +01:00
João Valverde 8793650707 dftest: Print ftype of protocol fields 2022-06-24 21:10:45 +00:00
João Valverde 354e0d7edf dfilter: Add support for unicode escape sequences
Add support for entering unicode codepoints as \uNNNN or \uNNNNNNNN
for strings and charconsts (following the C standard).
2022-06-21 16:54:16 +01:00
João Valverde 47348ae598 dfilter: Add support for literal strings with null bytes
Before:
    Filter: frame matches "abc\x00def"
    dftest: \x00 (NUL byte) cannot be used with a regular string.
    	frame matches "abc\x00def"
    	                  ^~~~
    Filter: _ws.ftypes.string == "a string with a \0 byte"
    dftest: \0 (NUL byte) cannot be used with a regular string.
    	_ws.ftypes.string == "a string with a \0 byte"
    	                                      ^~

After:
    Filter: frame matches "abc\x00def"

    Syntax tree:
     0 TEST_MATCHES:
       1 FIELD(frame)
       1 PCRE(abc\0def)

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_MATCHES	reg#0 matches abc\0def
    00003 RETURN

    Filter: _ws.ftypes.string == "a string with a \0 byte"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string)
       1 FVALUE("a string with a \0 byte" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "a string with a \0 byte" <FT_STRING>
    00003 RETURN

Fixes issue #16156.
2022-06-21 15:10:08 +00:00
João Valverde 0615ba6317 ftypes: Make accessor functions type safe 2022-06-20 17:29:57 +00:00
João Valverde de103394fe dfilter: Make regex matches case insensitive by default 2022-06-08 12:17:22 +01:00
Alexis La Goutte 8ee1eabeee dfvm(dfilter): fix clang analyzer warning (Dead.Store) 2022-05-22 08:40:44 +00:00
João Valverde bebf7afa37 dfilter: Remove unused DFVM 4th instruction argument 2022-05-13 14:13:18 +01:00
João Valverde 3bb918428e dfilter: Remove stale comment 2022-05-13 12:50:33 +00:00
João Valverde ac901e5de8 dfilter: Fix maybe-unitialized warning
[1702/2528] Building C object epan/dfilter/CMakeFiles/dfilter.dir/dfvm.c.o
In function ‘drange_contains_layer’,
    inlined from ‘filter_finfo_fvalues’ at /home/jpv/code/wireshark/wireshark/epan/dfilter/dfvm.c:587:21:
/home/jpv/code/wireshark/wireshark/epan/dfilter/dfvm.c:555:41: warning: ‘upper’ may be used uninitialized [-Wmaybe-uninitialized]
  555 |                 if (num >= lower && num <= upper) {  /* inclusive */
      |                                     ~~~~^~~~~~~~
/home/jpv/code/wireshark/wireshark/epan/dfilter/dfvm.c: In function ‘filter_finfo_fvalues’:
/home/jpv/code/wireshark/wireshark/epan/dfilter/dfvm.c:537:20: note: ‘upper’ was declared here
  537 |         int lower, upper;
      |                    ^~~~~
2022-05-13 13:22:29 +01:00
João Valverde b602911b31 dfilter: Add support for universal quantifiers
Adds the keywords "any" and "all" to implement the quantification
to any existing relational operator.

Filter: all tcp.port in {100, 2000..3000}

Syntax tree:
 0 ALL TEST_IN:
   1 FIELD(tcp.port)
   1 SET(#2):
     2 FVALUE(100 <FT_UINT16>)
     2 FVALUE(2000 <FT_UINT16>) .. FVALUE(3000 <FT_UINT16>)

Instructions:
00000 READ_TREE		tcp.port -> reg#0
00001 IF_FALSE_GOTO	5
00002 ALL_EQ		reg#0 === 100 <FT_UINT16>
00003 IF_TRUE_GOTO	5
00004 ALL_IN_RANGE	reg#0 in { 2000 <FT_UINT16> .. 3000 <FT_UINT16> }
00005 RETURN
2022-05-12 14:26:54 +01:00
João Valverde 164f3ce9a2 dfilter: Improve syntax tree display format for sets 2022-05-12 14:06:33 +01:00
Joakim Karlsson b75b8ca72e dfilter: fix may be used uninitialized in this function [-Wmaybe-uninitialized] 2022-04-27 13:36:43 +02:00
João Valverde 4f3f507eee dfilter: Add syntax to match specific layers in the protocol stack
Add support to display filters for matching a specific layer within a frame.
Layers are counted sequentially up the protocol stack. Each protocol
(dissector) that appears in the stack is one layer.

LINK-LAYER#1 <-> IP#1 <-> TCP#1 <-> IP#2 <-> TCP#2 <-> etc.

The syntax allows for negative indexes and ranges with the usual semantics
for slices (but note that counting starts at one):

    tcp.port#[2-4] == 1024

Matches layers 2 to 4 inclusive.

Fixes #3791.
2022-04-26 16:50:59 +00:00
João Valverde c0170dad42 dfilter: Rename "range" to "slice"
The word range is used for different things with different
meanings and that is confusing. Avoid using "range" in code to
mean "slice".

A range is one or more intervals with a lower and upper bound.

A slice is a range applied to a bytes field.

Replace range with slice wherever appropriate. This usage of
"slice" instead of range is generally correct and consistent in
the documentation.
2022-04-26 16:50:59 +00:00
Gerald Combs a73fd872ad dfilter: Add a null check.
Try to fix

*** CID 1504179:  Null pointer dereferences  (FORWARD_NULL)
/builds/wireshark/wireshark/epan/dfilter/dfvm.c: 327 in dfvm_dump_str()
321     				stack_print = dump_str_stack_push(stack_print, arg1_str);
322     				break;
323
324     			case STACK_POP:
325     				wmem_strbuf_append_printf(buf, "%05d STACK_POP\t%s\n", id, arg1_str);
326     				for (i = 0; i < arg1->value.numeric; i ++) {
>>>     CID 1504179:  Null pointer dereferences  (FORWARD_NULL)
>>>     Passing null pointer "stack_print" to "dump_str_stack_pop", which dereferences it.
327     					stack_print = dump_str_stack_pop(stack_print);
328     				}
329     				break;
330
331     			case MK_RANGE:
332     				wmem_strbuf_append_printf(buf, "%05d MK_RANGE\t\t%s[%s] -> %s\n",
2022-04-21 17:10:44 +00:00
João Valverde fab32ea0cb dfilter: Allow arithmetic expressions as function arguments
This allows writing moderately complex expressions, for example
a float epsilon test (#16483):

Filter: {abs(_ws.ftypes.double - 1) / max(abs(_ws.ftypes.double), abs(1))} < 0.01

Syntax tree:
 0 TEST_LT:
   1 OP_DIVIDE:
     2 FUNCTION(abs#1):
       3 OP_SUBTRACT:
         4 FIELD(_ws.ftypes.double)
         4 FVALUE(1 <FT_DOUBLE>)
     2 FUNCTION(max#2):
       3 FUNCTION(abs#1):
         4 FIELD(_ws.ftypes.double)
       3 FUNCTION(abs#1):
         4 FVALUE(1 <FT_DOUBLE>)
   1 FVALUE(0.01 <FT_DOUBLE>)

Instructions:
00000 READ_TREE		_ws.ftypes.double -> reg#1
00001 IF_FALSE_GOTO	3
00002 SUBRACT		reg#1 - 1 <FT_DOUBLE> -> reg#2
00003 STACK_PUSH	reg#2
00004 CALL_FUNCTION	abs(reg#2) -> reg#0
00005 STACK_POP	1
00006 IF_FALSE_GOTO	24
00007 READ_TREE		_ws.ftypes.double -> reg#1
00008 IF_FALSE_GOTO	9
00009 STACK_PUSH	reg#1
00010 CALL_FUNCTION	abs(reg#1) -> reg#4
00011 STACK_POP	1
00012 IF_FALSE_GOTO	13
00013 STACK_PUSH	reg#4
00014 STACK_PUSH	1 <FT_DOUBLE>
00015 CALL_FUNCTION	abs(1 <FT_DOUBLE>) -> reg#5
00016 STACK_POP	1
00017 IF_FALSE_GOTO	18
00018 STACK_PUSH	reg#5
00019 CALL_FUNCTION	max(reg#5, reg#4) -> reg#3
00020 STACK_POP	2
00021 IF_FALSE_GOTO	24
00022 DIVIDE		reg#0 / reg#3 -> reg#6
00023 ANY_LT		reg#6 < 0.01 <FT_DOUBLE>
00024 RETURN

We now use a stack to pass arguments to the function. The
stack is implemented as a list of lists (list of registers).
Arguments may still be non-existent to functions (this is
a feature). Functions must check for nil arguments (NULL lists)
and handle that case.

It's somewhat complicated to allow literal values and test compatibility
for different types, both because of lack of type information with
unparsed/literal and also because it is an underdeveloped area in the
code. In my limited testing it was good enough and useful, further
enhancements are left for future work.
2022-04-18 17:10:31 +01:00
João Valverde eb2a9889c3 dfilter: Add abs() function
Add an absolute value function for ftypes.
2022-04-18 17:09:00 +01:00
Alexis La Goutte 83959f77e3 dfvm: Fix Dead Store found by Clang Analyzer 2022-04-16 18:15:45 +00:00
João Valverde af878388fe dfilter: Fix scanning of strings
The code was ignoring a SCAN_FAILED return value.
2022-04-15 22:51:15 +01:00
João Valverde 827d143e6e dfilter: Allow function arguments to be non-existent.
Instead of not calling the function if an argument is non-existent
(read tree fails), call the function and let the function handle
the condition.
2022-04-14 13:07:41 +00:00
João Valverde cb2f085f14 dfilter: Add max() and min() functions
Changes the function calling convention to pass the first register
number plus the number of registers after that sequentially. This
allows function with any number of arguments. Functions can still
only return one value.

Adds max() and min() function to select the maximum/minimum value
from any number of arguments, all of the same type. The functions
accept literals too. The return type is the same as the first argument
(cannot be a literal).
2022-04-14 13:07:41 +00:00
João Valverde 8746eea297 dfilter: Try to resolve field reference instead of using a heuristic
Instead of using a heuristic to decide whether the form ${...} is
a macro or not, try to resolve the name to a registered protocol
field and use that instead.

This increases somewhat the surface for clobbering existing macro
names with new field registrations but we'll cross that bridge when
we get to it.

Rejecting protocol field types reduces this probability again but it
may not be intuitive to the user trying to mistakenly use a reference
to a protocol why it is parsed as a macro. The reasons for rejecting
FT_PROTOCOL types as not interesting field references are not
very strong but it seems reasonable.

$ dftest 'frame.number != ${frame.number}'
Filter: frame.number != ${frame.number}

Instructions:
00000 READ_TREE		frame.number -> reg#0
00001 IF_FALSE_GOTO	5
00002 READ_REFERENCE	${frame.number} -> reg#1
00003 IF_FALSE_GOTO	5
00004 ALL_NE		reg#0 != reg#1
00005 RETURN

$ dftest 'frame != ${frame}'
dftest: macro 'frame' does not exist
2022-04-12 14:03:18 +00:00
João Valverde 09696f1762 Try to fix a narrowing warning
"C:\Development\wsbuild64\Wireshark.sln" (default target) (1) ->
"C:\Development\wsbuild64\epan\dfilter\dfilter.vcxproj.metaproj" (default target) (18) ->
"C:\Development\wsbuild64\epan\dfilter\dfilter.vcxproj" (default target) (108) ->
       (ClCompile target) ->
C:/Development/wireshark/epan/dfilter/scanner.l(463,54): warning C4267: '+=': conversion from 'size_t' to 'int
       ', possible loss of data [C:\Development\wsbuild64\epan\dfilter\dfilter.vcxproj]
C:/Development/wireshark/epan/dfilter/scanner.l(463,54): warning C4267:         state->location.col_start += sta
       te->location.col_len; [C:\Development\wsbuild64\epan\dfilter\dfilter.vcxproj]
C:/Development/wireshark/epan/dfilter/scanner.l(463,54): warning C4267:
                           ^ (compiling source file C:\Development\wsbuild64\epan\dfilter\scanner.c) [C:\Development\ws
       build64\epan\dfilter\dfilter.vcxproj]
2022-04-11 22:23:13 +01:00
João Valverde 2f02cd6e19 dfilter: Handle missing error location more gracefully
If we don't have an offset, don't print anything with underline.

Also it can underline filters using macros correctly now.

$ tshark -Y 'ip and ${private_ipv4:ip.sr}' -r /dev/null
tshark: Left side of "==" expression must be a field or function, not "ip.sr".
    ip and ip.sr == 192.168.0.0/16 or ip.sr == 172.16.0.0/12 or ip.sr == 10.0.0.0/8
           ^~~~~
2022-04-11 21:03:06 +00:00
João Valverde 24443fa33a tshark: Add underline to dfilter errors
$ tshark -Y 'frame.number == 123foobar and ip' -r /dev/null
tshark: "123foobar" is not a valid number.
    frame.number == 123foobar and ip
                    ^~~~~~~~~
2022-04-11 19:25:37 +00:00
João Valverde 4d9470e7dd dfilter: Add location tracking to scanner and use it to report errors
Add location tracking as a column offset and length from offset
to the scanner. Our input is a single line only so we don't need
to track line offset.

Record that information in the syntax tree. Return the error location
in dfilter_compile(). Use it in dftest to mark the location of the
error in the filter string. Later it would be nice to use the location
in the GUI as well.

$ dftest "ip.proto == aaaaaa and tcp.port == 123"
Filter: ip.proto == aaaaaa and tcp.port == 123
dftest: "aaaaaa" cannot be found among the possible values for ip.proto.
	ip.proto == aaaaaa and tcp.port == 123
	            ^~~~~~
2022-04-10 10:09:51 +01:00
João Valverde da19379eb5 dfilter: Create the syntax node in the scanner and pass that
Revert to passing a syntax node from the lexical scanner to the grammar
parser. Using a union is not having a discernible advantage and requires
duplicating a lot of properties of syntax nodes.
2022-04-10 09:54:03 +01:00
João Valverde fb9a176587 dfilter: Allow grouping arithmetical expressions with { }
This removes the limitation of having only two terms in an
arithmetic expression and allows setting the precedence using
curly braces (like any basic calculator).

Our grammar currently does not allow grouping arithmetic expressions
using parenthesis, because boolean expressions and arithmetic
expressions are different and parenthesis are used with the former.
2022-04-08 23:12:04 +01:00
João Valverde cc5726b63f dfilter: Remove leading colon special meaning
Instead of saying a leading colon will make any token a literal
value, say it is part of the syntax of bytes arrays. This is
useful to write bytes without a separator, and other potentially
ambiguous formats.

The restriction in meaning to bytes and simple numeric values
should make the rules for handling a leading colon (specifically
ommiting it or not) saner without much loss of functionality.
2022-04-07 00:16:07 +01:00
João Valverde 0313cd02bc dfilter: Fix RHS bias for literal values
Fixes a3b76138f0.
2022-04-06 23:46:22 +01:00
João Valverde 7429832db4 Fix a log message 2022-04-06 23:42:04 +01:00
João Valverde 5584aba326 dfilter: Fix slice using range [:j]
Fixes:

$ dftest 'frame[:10] contains 0xff'
dftest: ":10" is not a valid range.
2022-04-06 18:35:10 +01:00
João Valverde a6f37323e6 dfilter: Clean up lexical scanning 2022-04-06 18:11:27 +01:00
João Valverde 8108e67de7 dfilter: Fix memory leak with leading colon
When retrying fvalue_from_literal() we were leaking the error
message string.

Refactor the code to avoid the retry. This assumes the only
valid use of a leading ':' with a literal is for an IPv6 address.

Bytes with leading ':' are supported but the colon is skipped,
so the parser doesn't see it.

Fixes df0fc8b517.
2022-04-06 18:09:12 +01:00
João Valverde 12c8cc32f0 dfilter: Fix parsing of some IPv6 compressed addresses
Fix parsing of some IPv6 addresses and add tests.

Also pass tokens as unparsed unless the user was specfic about
the semantic type. For example the IPv4 address 1.1.1.1 is also a
valid field, but 1.1.1.1/128 is not (because of the slash). However
choose not to enforce the distinction in the lexical scanner and pass
everything as unparsed unless the meaning is explicit in the syntax
with leading dot, colon, or between angle branckets.
2022-04-06 10:10:04 +01:00
João Valverde 7ed5d5036e dfilter: restore support for identifiers using hyphen
Restores support for filters such as "mac-lte", that was broken
in 330d408328.

This means we are not able to support arithmetic expressions with binary
minus without spaces.

$ dftest 'tcp.port == 1-2'
dftest: "1-2" is not a valid number.
2022-04-05 15:38:20 +01:00
João Valverde 8fb28f5161 dfilter: Minor grammar cleanup
Remove duplication for arithmetic expressions.
2022-04-05 12:04:37 +01:00
João Valverde 20afbd46ec dfilter: Remove existence test syntax tree nodes
After some experimentation I don't think these two existence tests
belong in the grammar, it's an implementation detail and removing it
might avoid some artificial constraints.
2022-04-05 12:04:37 +01:00
João Valverde fb08c4b4a8 dfilter: Replace bitwise sttype with arithmetic
Most of the bitwise codepaths are just duplicating code for
the arithmetic type. Parse bitwise expressions as arithmetic
instead.
2022-04-05 12:04:37 +01:00
João Valverde c98df5eef5 dfilter: Print syntax tree using dftest + format enhancements
Add argument to dfilter_compile_real() to save syntax tree text
representation.

Use it with dftest to print syntax tree.

Misc debug output format improvements.
2022-04-05 12:04:37 +01:00
João Valverde d91734ab6a dfilter: Fix range registers in DFVM dump 2022-04-05 12:04:37 +01:00
João Valverde 330d408328 dfilter: Allow arithmetic expressions without spaces
To allow an arithmetic expressions without spaces, such as "1+2",
we cannot match the expression in other lexical rules using "+". Because
of longest match this becomes the token LITERAL or UNPARSED with semantic value
"1+2". The same goes for all the other arithmetic operators.

So we need to remove [+-*/%] from "word chars" and add very specific
patterns (that won't mistakenly match an arithmetic expression) for
those literal or unparsed tokens we want to support using these characters.
The plus was not a problem but right slash is used for CIDR, minus for
mac address separator, etc.

There are still some corner case. 11-22-33-44-55-66 is a mac
address and not the arithmetic expression with six terms "eleven
minus twenty two minus etc." (if we ever support more than two terms
in the grammar, which we don't currently).

We lift some patterns from the flex manual to match on IPv4 and
IPv6 (ugly) and add MAC address.

Other hypothetical literal lexical values using [+-*/%] are already
supported enclosed in angle brackets but the cases of MAC/IPv4/IPv6 are
are very common and moreover we need to do the utmost to not break backward
compatibily here.

Before:
    $ dftest "_ws.ftypes.int32 == 1+2"
    dftest: "1+2" is not a valid number.

After:
    $ dftest "_ws.ftypes.int32 == 1+2"
    Filter: _ws.ftypes.int32 == 1+2

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 -> reg#0
    00001 IF_FALSE_GOTO	4
    00002 ADD		1 <FT_INT32> + 2 <FT_INT32> -> reg#1
    00003 ANY_EQ		reg#0 == reg#1
    00004 RETURN
2022-04-04 20:28:55 +00:00
João Valverde 34ad6bb478 dfilter: Make logical AND higher precedence than logical OR
In most, if not all, programming languages logical AND has
higher precedence than logical OR. Apply the principle of
least surprise and do the same for Wireshark display
filters.

Before: ip and tcp or udp => ip and (tcp or udp)

    Filter: ip and tcp or udp

    Instructions:
    00000 CHECK_EXISTS	ip
    00001 IF_FALSE_GOTO	5
    00002 CHECK_EXISTS	tcp
    00003 IF_TRUE_GOTO	5
    00004 CHECK_EXISTS	udp
    00005 RETURN

After: ip and tcp or udp => (ip and tcp) or udp

    Filter: ip and tcp or udp

    Instructions:
    00000 CHECK_EXISTS	ip
    00001 IF_FALSE_GOTO	4
    00002 CHECK_EXISTS	tcp
    00003 IF_TRUE_GOTO	5
    00004 CHECK_EXISTS	udp
    00005 RETURN
2022-04-04 19:51:38 +00:00
João Valverde f0ca30b60b dfilter: More arithmetic fixes
Fix a failed assertion with constant arithmetic expressions.

Because we do not parse constants on the lexical level it is
more complicated to handle constant expressions with unparsed
values.

We need to handle missing type information gracefully for any
kind of arithmetic expression, not just unary minus.
2022-04-02 18:10:33 +00:00
João Valverde 67e5e5c3ab dfilter: Fix arithmetic expressions on the LHS
Filter: _ws.ftypes.framenum % 3 == 0

Instructions:
00000 READ_TREE		_ws.ftypes.framenum -> reg#0
00001 IF_FALSE_GOTO	4
00002 MODULO		reg#0 % 3 <FT_FRAMENUM> -> reg#1
00003 ANY_EQ		reg#1 == 0 <FT_FRAMENUM>
00004 RETURN
2022-04-01 14:33:38 +01:00
João Valverde 8bc214b5bb dfilter: Add remaining arithmetic integer ops 2022-03-31 16:49:42 +01:00
João Valverde 2a9cb588aa dfilter: Add binary arithmetic (add/subtract)
Add support for display filter binary addition and subtraction.

The grammar is intentionally kept simple for now. The use case
is to add a constant to a protocol field, or (maybe) add two
fields in an expression.

We use signed arithmetic with unsigned numbers, checking for
overflow and casting where necessary to do the conversion.
We could legitimately opt to use traditional modular arithmetic
instead (like C) and if it turns out that that is more useful for
some reason we may want to in the future.

Fixes #15504.
2022-03-31 11:27:34 +01:00
João Valverde 5cd0e4cc97 dfilter: Fix use after free with references
By the time we are using the reference fvalue the tree may have gone
away and with it the fvalue. We need to duplicate the reference
fvalues and take ownership of the memory.
2022-03-30 14:05:22 +01:00
João Valverde 260942e170 dfilter: Refactor macro tree references
This replaces the current macro reference system with
a completely different implementation. Instead of a macro a reference
is a syntax element. A reference is a constant that can be filled
in the dfilter code after compilation from an existing protocol tree.
It is best understood as a field value that can be read from a fixed
tree that is not the frame being filtered. Usually this fixed tree
is the currently selected frame when the filter is applied. This
allows comparing fields in the filtered frame with fields in the
selected frame.

Because the field reference syntax uses the same sigil notation
as a macro we have to use a heuristic to distinguish them:
if the name has a dot it is a field reference, otherwise
it is a macro name.

The reference is synctatically validated at compile time.

There are two main advantages to this implementation (and a couple of
minor ones):

The protocol tree for each selected frame is only walked if we have a
display filter and if the display filter uses references. Also only the
actual reference values are copied, intead of loading the entire tree
into a hash table (in textual form even).

The other advantage is that the reference is tested like a protocol
field against all the values in the selected frame (if there is more
than one).

Currently the reference fields are not "primed" during dissection, so
the entire tree is walked to find a particular reference (this is
similar to the previous implementation).

If the display filter contains a valid reference and the reference is
not loaded at the time the filter is run the result is the same as a
non existing field for a regular READ_TREE instruction.

Fixes #17599.
2022-03-29 12:36:31 +00:00
João Valverde 431cb43b81 dfilter: Remove parenthesis deprecation warning
This usage devalues a mechanism for warning users that deserves more
attention than this minor suggestion.

The warning is inconvenient for intermediate and advanced users.
2022-03-29 12:19:26 +00:00
João Valverde d2907d91c0 dfilter: Add more logging for bytecode 2022-03-28 17:59:07 +01:00
João Valverde 9ee9b40b64 dfilter: Store expanded text 2022-03-28 17:22:01 +01:00
João Valverde a1299d63d9 dfilter: Lower level of two debug messages 2022-03-28 17:20:00 +01:00
João Valverde ac0a69636b dfilter: Add support for unary arithmetic
This change implements a unary minus operator.

Filter: tcp.window_size_scalefactor == -tcp.dstport

Instructions:
00000 READ_TREE		tcp.window_size_scalefactor -> reg#0
00001 IF_FALSE_GOTO	6
00002 READ_TREE		tcp.dstport -> reg#1
00003 IF_FALSE_GOTO	6
00004 MK_MINUS		-reg#1 -> reg#2
00005 ANY_EQ		reg#0 == reg#2
00006 RETURN

It is supported for integer types, floats and relative time values.
The unsigned integer types are promoted to a 32 bit signed integer.

Unary plus is implemented as a no-op. The plus sign is simply ignored.

Constant arithmetic expressions are computed during compilation.

Overflow with constants is a compile time error. Overflow with
variables is a run time error and silently ignored. Only a debug
message will be printed to the console.

Related to #15504.
2022-03-28 11:20:41 +00:00
João Valverde a3b76138f0 dfilter: Fix memory leak
Filter: tcp.srcport == udp.port

Instructions:
00000 READ_TREE		tcp.srcport -> reg#0
00001 IF_FALSE_GOTO	5
00002 READ_TREE		udp.port -> reg#1
00003 IF_FALSE_GOTO	5
00004 ANY_EQ		reg#0 == reg#1
00005 RETURN

=================================================================
==180444==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 34 byte(s) in 1 object(s) allocated from:
    #0 0x55f21e4a9ff9  (/home/jpv/projects/wireshark/wireshark/build-asan/run/dftest+0xcdff9)
    #1 0x7f95ea661338  (/usr/lib/libc.so.6+0x82338)

SUMMARY: AddressSanitizer: 34 byte(s) leaked in 1 allocation(s).

Fixes a68b408a9f.
2022-03-25 18:38:11 +00:00
João Valverde 2fc8c0e36b dfilter: Handle a bitwise expr on the RHS 2022-03-23 11:04:41 +00:00
João Valverde 0335ebdc3a dfilter: ftype_is_true -> ftype_is_zero 2022-03-23 11:04:41 +00:00
João Valverde 16729be2c1 dfilter: Add bitwise masking of bits
Add support for masking of bits. Before the bitwise operator
could only test bits, it did not support clearing bits.

This allows testing if any combination of bits are set/unset
more naturally with a single test. Previously this was only
possible by combining several bitwise predicates.

Bitwise is implemented as a test node, even though it is not.
Maybe the test node should be renamed to something else.

Fixes #17246.
2022-03-22 12:58:04 +00:00
João Valverde 631cf34f0c dfilter: Use a function pointer array to free registers 2022-03-21 18:43:36 +00:00
João Valverde 6a0129a0e3 dfilter: Fix EditorConfig settings 2022-03-21 17:49:12 +00:00
João Valverde 54d8627c9a dfilter: Add more comments to optimization pass 2022-03-21 17:36:41 +00:00
João Valverde d60f2580ba dfilter: Pass around constants in instructions
The DFVM instructions arguments are generic boxed types but instead
of using FVALUE and PCRE types the code passes aroung REGISTER types
instead. Change that to pass constants in the instruction.
2022-03-21 17:09:56 +00:00
João Valverde 94d909103e dfilter: Remove DFVM constant initialization 2022-03-21 17:09:43 +00:00
João Valverde ae17e733ac dfilter: Use more DFVM values in gencode 2022-03-21 17:09:29 +00:00
João Valverde 769f1f10de dfilter: Add DFVM value constructor 2022-03-21 17:09:19 +00:00
João Valverde 1b574e7466 dfilter: Cleanup dfvm_apply() 2022-03-21 12:38:09 +00:00