712 commits

Author SHA1 Message Date
João Valverde
44511c318d dfilter: Improve error location for expressions
Try to underline the whole expression instead of the
token.
2022-12-23 18:23:14 +00:00
João Valverde
3938b406fb dfilter: Refactor error location tracking
Remove duplicate location struct by adding a new header.

Pass around a structure instead of a pointer.
2022-12-23 18:23:06 +00:00
João Valverde
4e1211de90 dfilter: Add support for negation of arithmetic expressions 2022-12-22 23:51:16 +00:00
João Valverde
ba1a85d381 dfilter: Improve arithmetic error messages 2022-12-22 10:13:30 +00:00
João Valverde
263bda375c dfilter: Check if type supports unary minus
Fix crash for types that do not support unary minus.

Fixes #18750.
2022-12-21 14:43:39 +00:00
João Valverde
32f88ad22c wmem: Remove strbuf max size parameter
This parameter was introduced as a safeguard for bugs
that generate an unbounded string but its utility for
that purpose is doubtful and the way it is being used
creates problems with invalid truncation of UTF-8
strings.
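
The truncation problem can be illustrated with a minimal sketch. This is not wmem code: the helper names `sketch_cut_splits_utf8` and `sketch_utf8_safe_cut` are hypothetical, and the input is assumed to be valid UTF-8. It only shows why cutting at a fixed byte count can leave a partial sequence behind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of wmem: returns 1 if keeping only the
 * first max bytes of s would cut through a multi-byte UTF-8 sequence.
 * Assumes s holds len bytes of valid UTF-8. */
int sketch_cut_splits_utf8(const unsigned char *s, size_t len, size_t max)
{
    assert(s != NULL);
    if (max >= len)
        return 0; /* nothing is cut off */
    /* Continuation bytes have the bit pattern 10xxxxxx. */
    return (s[max] & 0xC0) == 0x80;
}

/* Hypothetical helper: largest cut point <= max that falls on a
 * character boundary. */
size_t sketch_utf8_safe_cut(const unsigned char *s, size_t len, size_t max)
{
    assert(s != NULL);
    if (max >= len)
        return len;
    while (max > 0 && (s[max] & 0xC0) == 0x80)
        max--;
    return max;
}
```

For the five bytes of "café" (63 61 66 C3 A9), a limit of 4 bytes lands between C3 and A9, so a blind cut yields an invalid string while the safe cut backs up to 3 bytes.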

Rename wmem_strbuf_sized_new() to a better name.
2022-12-03 01:54:52 +00:00
João Valverde
967a3c3df9 Qt: Check field autocomplete for syntactical validity
Currently the autocompletion engine always suggests a protocol
field completion, even in places where it isn't syntactically
valid.

Fix that by compiling the preamble to the token under the cursor
and checking the returned error. If it is DF_ERROR_UNEXPECTED_END
that indicates a field or literal value was expected. Otherwise
a field replacement is not valid in this position.
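
The decision rule above can be sketched as follows. Only DF_ERROR_UNEXPECTED_END is named in this message; the enum, its other members and the function name are hypothetical stand-ins, not the real dfilter or Qt interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error codes for illustration. Only the "unexpected end"
 * code is mentioned in the commit message; the rest are assumptions. */
enum sketch_df_error_code {
    SKETCH_DF_OK = 0,
    SKETCH_DF_ERROR_GENERIC,
    SKETCH_DF_ERROR_UNEXPECTED_END
};

/* Given the result of compiling the text before the token under the
 * cursor, decide whether a field completion may be offered. */
bool sketch_field_completion_is_valid(enum sketch_df_error_code preamble_result)
{
    /* "Unexpected end" means the parser was still waiting for a field
     * or literal value, so a field name fits here. Anything else means
     * a field does not belong in this position. */
    return preamble_result == SKETCH_DF_ERROR_UNEXPECTED_END;
}
```

The point of checking an error code rather than the error string is that the check keeps working if the wording of the message changes.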

Fixes #12811.
2022-12-01 22:50:09 +00:00
João Valverde
b116ccd6d5 dfilter: Replace compile boolean arguments with a bit flag 2022-11-30 17:36:17 +00:00
João Valverde
84e75be5c6 dfilter: Add optimization flag
When we are just testing code to see if it compiles, performing
optimizations is wasteful. Add an option to disable them.
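
A minimal sketch of the bit flag idea, assuming hypothetical flag names (SKETCH_DF_SAVE_TREE and SKETCH_DF_OPTIMIZE are illustrative, not the names used in dfilter.h):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical compile flags; the real names and values may differ. */
#define SKETCH_DF_SAVE_TREE (1U << 0)
#define SKETCH_DF_OPTIMIZE  (1U << 1)
#define SKETCH_DF_ALL_FLAGS (SKETCH_DF_SAVE_TREE | SKETCH_DF_OPTIMIZE)

/* What a compile call would do for a given flag word. */
struct sketch_compile_plan {
    bool save_tree;
    bool optimize;
};

struct sketch_compile_plan sketch_plan_from_flags(unsigned flags)
{
    struct sketch_compile_plan plan;

    assert((flags & ~SKETCH_DF_ALL_FLAGS) == 0); /* reject unknown bits */
    plan.save_tree = (flags & SKETCH_DF_SAVE_TREE) != 0;
    plan.optimize = (flags & SKETCH_DF_OPTIMIZE) != 0;
    return plan;
}
```

A caller that only wants to know whether a filter compiles would pass 0 and skip the optimization pass; a caller that runs the filter would pass the optimize flag.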
2022-11-30 17:36:17 +00:00
João Valverde
93814ef740 dfilter: Always set error pointer in case of failure 2022-11-30 15:00:34 +00:00
João Valverde
a0d77e9329 dfilter: Return an error object instead of string
Return a struct containing error information. This simplifies
the interface and makes it easier to provide richer diagnostics in the future.

Add an error code besides a human-readable error string to allow
checking programmatically for errors in a robust manner. Currently
there is only a generic error code; the number of codes is expected
to grow in the future.

Move error location information to the struct. Change callers and
implementation to use the new interface.
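
As a rough sketch of what such an error object might look like (the struct layout, field names and helper functions below are assumptions for illustration, not the actual definitions):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical location of the offending text within the filter. */
struct sketch_loc {
    long col_start;
    size_t col_len;
};

/* Hypothetical error object: a code for programs, a message for
 * people, and the location moved into the same struct. */
struct sketch_df_error {
    int code;
    char *msg;
    struct sketch_loc loc;
};

enum { SKETCH_ERR_GENERIC = 1 };

struct sketch_df_error *sketch_df_error_new(int code, const char *msg,
                                            long col_start, size_t col_len)
{
    assert(msg != NULL);
    struct sketch_df_error *err = malloc(sizeof *err);
    if (err == NULL)
        return NULL;
    size_t n = strlen(msg) + 1;
    err->msg = malloc(n);
    if (err->msg == NULL) {
        free(err);
        return NULL;
    }
    memcpy(err->msg, msg, n); /* keep a private copy of the message */
    err->code = code;
    err->loc.col_start = col_start;
    err->loc.col_len = col_len;
    return err;
}

void sketch_df_error_free(struct sketch_df_error *err)
{
    if (err != NULL) {
        free(err->msg);
        free(err);
    }
}
```

With a single object, new diagnostic fields can be added later without changing the signature of every function that reports an error.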
2022-11-28 15:46:44 +00:00
João Valverde
79c3a77752 Add macros to control lemon diagnostics
Rename flex macros using parentheses (mostly a style issue):

DIAG_OFF_FLEX -> DIAG_OFF_FLEX()
DIAG_ON_FLEX  -> DIAG_ON_FLEX()

Use the same kind of construct with lemon generated code using
DIAG_OFF_LEMON() and DIAG_ON_LEMON(). Use %include and %code
directives to enforce the desired order, with the generated code
placed between the pragmas.

Fix a clang-specific pragma to use DIAG_OFF_CLANG().

DIAG_OFF(unreachable-code) -> DIAG_OFF_CLANG(unreachable-code).

Apparently GCC is ignoring the -Wunreachable-code flag, which is why
it did not trigger an unknown pragma warning. From [1]:

  The -Wunreachable-code has been removed, because it was unstable: it
  relied on the optimizer, and so different versions of gcc would warn
  about different code.  The compiler still accepts and ignores the
  command line option so that existing Makefiles are not broken.  In some
  future release the option will be removed entirely. - Ian

[1] https://gcc.gnu.org/legacy-ml/gcc-help/2011-05/msg00360.html
2022-11-20 10:11:27 +00:00
João Valverde
2443df7318 Disable another -Wunreachable lemon warning 2022-11-17 11:21:41 +00:00
Peter Wu
df478a365d dfilter: treat carriage returns as whitespace
Fixes #18595
2022-11-07 01:00:50 +00:00
João Valverde
4c2d0f16d4 dfilter: Improve representation of raw field references
Instead of using the abstract type "<RAW>", which might be confusing,
show FT_BYTES and display the field with the "@" operator, so that
error messages make clear why the same field can appear with
different types.

Refactor the field tostr() function and do some other cleanups.

Before:
```
Filter: _ws.ftypes.string ==${@frame.len}
dftest: _ws.ftypes.string and frame.len <RAW> are not of compatible types.
	_ws.ftypes.string ==${@frame.len}
	                       ^~~~~~~~~
```

After:
```
Filter: _ws.ftypes.string ==${@frame.len}
dftest: _ws.ftypes.string <FT_STRING> and @frame.len <FT_BYTES> are not of compatible types.
	_ws.ftypes.string ==${@frame.len}
	                       ^~~~~~~~~
```
2022-10-31 21:02:39 +00:00
João Valverde
b83658d8a4 dfilter: Add support for raw addressing with references
Extends raw addressing syntax to work with references. The syntax
is
    @field1 == ${@field2}

This requires replicating the logic to load field references, but
using raw values instead. We use separate hash tables for that,
namely "references" vs "raw_references".
2022-10-31 21:02:39 +00:00
João Valverde
0853ddd1cb dfilter: Add support for raw (bytes) addressing mode
This adds new syntax to read a field from the tree as bytes, instead
of the actual type. This is a useful extension, for example, to match
malformed strings that contain Unicode replacement characters. In
this case it is otherwise not possible to match the raw value of the
malformed string field. This extension fills this need and is generic
enough that it should be useful in many other situations.

The syntax used is to prefix the field name with "@". The following
artificial example tests if the HTTP user agent contains a particular
invalid UTF-8 sequence:

    @http.user_agent == "Mozill\xAA"

Where simply using "http.user_agent" won't work because the invalid byte
sequence will have been replaced with U+FFFD.
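
The effect can be illustrated with a deliberately simplified sketch. The sanitizer below is an assumption made for illustration: it treats every byte >= 0x80 as invalid, whereas a real UTF-8 validator accepts well-formed multi-byte sequences. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for string sanitization: ASCII passes through,
 * any other byte is replaced with U+FFFD (EF BF BD in UTF-8).
 * Returns the number of bytes written to out. */
size_t sketch_sanitize(const unsigned char *in, size_t in_len,
                       unsigned char *out, size_t out_cap)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < in_len; i++) {
        if (in[i] < 0x80) {
            if (o + 1 > out_cap)
                break;
            out[o++] = in[i];
        } else {
            if (o + 3 > out_cap)
                break;
            out[o++] = 0xEF;
            out[o++] = 0xBF;
            out[o++] = 0xBD;
        }
    }
    return o;
}

/* Byte-wise comparison, as used for a raw (bytes) match. */
int sketch_bytes_equal(const unsigned char *a, size_t a_len,
                       const unsigned char *b, size_t b_len)
{
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

Comparing the pattern bytes 4d 6f 7a 69 6c 6c aa against the raw field data succeeds, while comparing them against the sanitized string fails because the 0xAA byte has become three different bytes.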

Consider the following programs:

    $ dftest '_ws.ftypes.string == "ABC"'
    Filter: _ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <FT_STRING>)
       1 FVALUE("ABC" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string <FT_STRING> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "ABC" <FT_STRING>
    00003 RETURN

    $ dftest '@_ws.ftypes.string == "ABC"'
    Filter: @_ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <RAW>)
       1 FVALUE(41:42:43 <FT_BYTES>)

    Instructions:
    00000 READ_TREE		@_ws.ftypes.string <FT_BYTES> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == 41:42:43 <FT_BYTES>
    00003 RETURN

In the second case the field has a "raw" type, which equates directly to
FT_BYTES, and the field value is read from the protocol raw data.
2022-10-31 21:02:39 +00:00
João Valverde
31a0147daa dfilter: Pass a value by reference
The lifetime of the reference is longer than the runtime so avoid
an unnecessary fvalue dup.
2022-10-31 21:02:39 +00:00
João Valverde
0583b76204 dfilter: Remove unused data structure 2022-10-31 21:02:39 +00:00
João Valverde
0662a3f6ac dfilter: Amend a numeric pattern in the scanner
We amend the :<numeric> pattern to not eat the leading
colon. Because the colon can be part of the value (with IPv6 addresses
for example) we want to avoid doing that.

IPv6 addresses are covered by their own rules but this removes the
requirement in the future to handle any special cases and avoids
surprises.

For this reason the colon-prefix syntax is already explicitly defined to
work only for byte arrays and there is currently no universal
syntax for all literal values or even all numbers.

Other numbers can keep using the lexical type "unparsed".

```
run/dftest "_ws.ftypes.uint8 == :fd"
Filter: _ws.ftypes.uint8 == :fd
dftest: ":fd" is not a valid number.
	_ws.ftypes.uint8 == :fd
	                    ^~~

run/dftest "_ws.ftypes.uint8 == fd"
Filter: _ws.ftypes.uint8 == fd
dftest: "fd" is not a valid number.
	_ws.ftypes.uint8 == fd
	                    ^~

run/dftest "_ws.ftypes.uint8 == 0xfd"
Filter: _ws.ftypes.uint8 == 0xfd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.uint8 <FT_UINT8>)
   1 FVALUE(253 <FT_UINT8>)

Instructions:
00000 READ_TREE		_ws.ftypes.uint8 <FT_UINT8> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == 253 <FT_UINT8>
00003 RETURN

run/dftest "_ws.ftypes.bytes == fd"
Filter: _ws.ftypes.bytes == fd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.bytes <FT_BYTES>)
   1 FVALUE(fd <FT_BYTES>)

Instructions:
00000 READ_TREE		_ws.ftypes.bytes <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fd <FT_BYTES>
00003 RETURN

run/dftest "_ws.ftypes.bytes == :fd"
Filter: _ws.ftypes.bytes == :fd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.bytes <FT_BYTES>)
   1 FVALUE(fd <FT_BYTES>)

Instructions:
00000 READ_TREE		_ws.ftypes.bytes <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fd <FT_BYTES>
00003 RETURN
```
2022-10-08 09:51:49 +00:00
João Valverde
14f5121c4a dfilter: Remove problematic <...> literal syntax
The <...> syntax for literals, intended to be as generic as
possible, unintentionally introduced an ambiguity with the
relational expression "a < b or a > c".
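
The ambiguity can be reproduced with a naive sketch. The real scanner is generated by flex and works differently; the hypothetical function below only shows that "text between '<' and the next '>'" also matches inside an ordinary relational expression.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive illustration: copy the text between the first '<' and the
 * next '>' into out. Returns the length copied, or -1 if there is no
 * such pair or out is too small. */
int sketch_scan_angle_literal(const char *input, char *out, size_t out_cap)
{
    assert(input != NULL && out != NULL);

    const char *open = strchr(input, '<');
    if (open == NULL)
        return -1;
    const char *close = strchr(open + 1, '>');
    if (close == NULL)
        return -1;

    size_t n = (size_t)(close - open - 1);
    if (n + 1 > out_cap)
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return (int)n;
}
```

Applied to "a < b or a > c", this yields the bogus literal " b or a " instead of two comparisons joined by "or".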

Literals are values like numbers, bytes, IPv6 addresses or, one
could imagine, UNC paths for example, if an FT_UNC type were to
be added in the future.

We could use a new unique symbol like @...@ but the <...>
syntax is very recent and may not be necessary with ":xxx" so
just remove it.

A byte array can be explicitly declared by prefixing with a colon. It
is not as generic but the main ambiguity that this new syntax attempted
to solve is bytes vs protocol names. We don't want to introduce a new
reserved symbol for now, until other requirements, if any, become clearer.

Fixes #18418.
2022-10-08 09:51:49 +00:00
João Valverde
0816e317cb dfilter: Fix crash with FT_NONE and arithmetic expressions
Do the first ftype-can check in an arithmetic expression before
evaluating the second term to be sure we do not allow FT_NONE as a
valid LHS ftype.

$ dftest '_ws.ftypes.none + 1 == 2'
Filter: _ws.ftypes.none + 1 == 2
dftest: FT_NONE cannot +.
	_ws.ftypes.none + 1 == 2
	^~~~~~~~~~~~~~~
2022-07-28 16:50:09 +00:00
João Valverde
84f54d54e5 dfilter: Fix a crash using abs()
Passing a literal value to abs() on the LHS segfaults, because it
is incorrectly assumed to be a valid field.

We need to check if we actually have a field. While at it improve
the diagnostic of literals.
2022-07-19 19:11:47 +01:00
Alexis La Goutte
b448b6a591 semcheck: fix -Wmissing-prototypes
semcheck.c:1110:1: warning: no previous prototype for function 'check_arithmetic_entity'
2022-07-15 13:45:52 +00:00
Alexis La Goutte
bd28c19ad6 dfvm: Fix -Wmissing-prototypes
dfvm.c:206:1: warning: no previous prototype for function 'dfvm_value_tostr'
dfvm.c:550:1: warning: no previous prototype for function 'filter_finfo_fvalues'
dfvm.c:645:1: warning: no previous prototype for function 'filter_refs_fvalues'
2022-07-15 13:45:52 +00:00
João Valverde
4c975b770e dfilter: Improve compatibility of integer types
Before:

$ dftest '_ws.ftypes.int64 == _ws.ftypes.int8'
Filter: _ws.ftypes.int64 == _ws.ftypes.int8
dftest: _ws.ftypes.int64 and _ws.ftypes.int8 are not of compatible types.
	_ws.ftypes.int64 == _ws.ftypes.int8
	                    ^~~~~~~~~~~~~~~

After:

$ dftest '_ws.ftypes.int64 == _ws.ftypes.int8'
Filter: _ws.ftypes.int64 == _ws.ftypes.int8

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.int64 <FT_INT64>)
   1 FIELD(_ws.ftypes.int8 <FT_INT8>)

Instructions:
00000 READ_TREE		_ws.ftypes.int64 <FT_INT64> -> reg#0
00001 IF_FALSE_GOTO	5
00002 READ_TREE		_ws.ftypes.int8 <FT_INT8> -> reg#1
00003 IF_FALSE_GOTO	5
00004 ANY_EQ		reg#0 === reg#1
00005 RETURN
2022-07-14 20:12:30 +00:00
João Valverde
f68f172454 dfilter: Remove a debug message
Still too noisy even at the noisy log level.
2022-07-13 16:06:28 +00:00
João Valverde
6c8a8d7960 dfilter: Fix dfvm code string
All/any equal have their own symbols for operators so cannot
be handled in the same switch case.

Other comparisons don't have different symbols for any/all.
2022-07-13 00:37:12 +01:00
João Valverde
5e3a7e9ab8 dfilter: Small optimization for "not all zero" code
Remove extra NOT instruction. Also remove unused ANY_ZERO opcode.
2022-07-05 09:58:43 +01:00
João Valverde
a877f2d5f3 dfilter: Allow existence check for slices
Allow checking if a slice exists. The result is true if the
slice has length greater than zero.

The len() function is implemented as a DFVM instruction instead.
The semantics are the same.
2022-07-04 22:45:14 +00:00
João Valverde
0fc81c21b2 dfilter: Cleanup scanner value setters 2022-07-04 22:15:40 +00:00
João Valverde
8d93f0920a dfilter: Fix some debug strings 2022-07-02 21:21:12 +01:00
João Valverde
eb8acd088e dfilter: Rename dfvm opcodes with a namespace prefix 2022-07-02 11:46:45 +01:00
João Valverde
fc5c81328e dfilter: Rename test syntax tree node
Test node also includes arithmetic operations so rename it
to a generic "operator" node.
2022-07-02 11:39:17 +01:00
João Valverde
b10db887ce dfilter: Remove unparsed syntax type and RHS literal bias
This removes unparsed name resolution during the semantic
check because it feels like a hack to work around limitations
in the language syntax, that should be solved at the lexical
level instead.

We were interpreting unparsed differently on the LHS and RHS.
Now an unparsed value is always a field if it matches a
registered field name (this matches the implementation in 3.6
and before).

This requires slightly tightening the allowed filter names for
protocols to avoid some common and potentially weird conflicting
cases.

Incidentally this extends set grammar to accept all entities.
That is experimental and may be reverted in the future.
2022-07-02 11:18:20 +01:00
Roland Knall
8bdff72625 dfilter: Fix undefined dereference and add null check
A value of ref could be accessed while undefined. Add additional
checks to ensure that refs_array actually contains data, or return
null immediately.
2022-06-27 14:57:01 +00:00
João Valverde
efbe699756 dfilter: Remove STTYPE_RANGE_NODE
STTYPE_RANGE_NODE is just a lexical token, it is
not used within the syntax tree so remove it.
2022-06-25 16:06:48 +01:00
João Valverde
aaff0d21ae dfilter: Add layer support for references
This adds support for using the layers filter
with field references.

Before:
    $ dftest 'ip.src != ${ip.src#2}'
    dftest: invalid character in macro name

After:
    $ dftest 'ip.src != ${ip.src#2}'
    Filter: ip.src != ${ip.src#2}

    Syntax tree:
     0 TEST_ALL_NE:
       1 FIELD(ip.src <FT_IPv4>)
       1 REFERENCE(ip.src#[2:1] <FT_IPv4>)

    Instructions:
    00000 READ_TREE		ip.src <FT_IPv4> -> reg#0
    00001 IF_FALSE_GOTO	5
    00002 READ_REFERENCE_R	${ip.src <FT_IPv4>} #[2:1] -> reg#1
    00003 IF_FALSE_GOTO	5
    00004 ALL_NE		reg#0 != reg#1
    00005 RETURN

This requires adding another level of complexity to references.
When loading references we need to copy the 'proto_layer_num'
and add the logic to filter on that.
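
The filtering step can be sketched as below. The struct and function names are hypothetical, the inclusive range is an assumption, and how layer numbers are assigned to values is not modeled; only the idea of keeping the layer number next to each loaded reference value and filtering on it comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference value that remembers which protocol layer
 * it was loaded from. */
struct sketch_ref_value {
    int layer_num;
    unsigned value;
};

/* Copy to out the values whose layer number lies in [lower, upper]
 * (inclusive). out must have room for n entries. Returns the number
 * of values kept. */
size_t sketch_filter_by_layer(const struct sketch_ref_value *in, size_t n,
                              int lower, int upper,
                              struct sketch_ref_value *out)
{
    size_t kept = 0;

    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i].layer_num >= lower && in[i].layer_num <= upper)
            out[kept++] = in[i];
    }
    return kept;
}
```

For a reference such as ${ip.src#2}, only the value recorded with layer number 2 would survive the filter.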

The "layer" sttype is removed and replaced by a new
field sttype with support for a range. This is a nice
cleanup for the semantic check and general simplification.
The grammar is better too with this design.

Range sttype is renamed to slice for clarity.
2022-06-25 14:57:40 +01:00
João Valverde
8793650707 dftest: Print ftype of protocol fields 2022-06-24 21:10:45 +00:00
João Valverde
354e0d7edf dfilter: Add support for Unicode escape sequences
Add support for entering Unicode code points as \uNNNN or \UNNNNNNNN
for strings and charconsts (following the C standard).
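
Once the hex digits of an escape have been read, the code point has to be stored somehow. The sketch below assumes UTF-8 storage and uses a hypothetical function name; it shows standard UTF-8 encoding, not the actual scanner code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a Unicode code point as UTF-8. Returns the number of bytes
 * written (1 to 4), or 0 if cp is a surrogate or above U+10FFFF. */
size_t sketch_utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid code points */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Under this assumption "\u00e9" becomes the two bytes C3 A9, and "\U0001F600" becomes F0 9F 98 80.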
2022-06-21 16:54:16 +01:00
João Valverde
47348ae598 dfilter: Add support for literal strings with null bytes
Before:
    Filter: frame matches "abc\x00def"
    dftest: \x00 (NUL byte) cannot be used with a regular string.
    	frame matches "abc\x00def"
    	                  ^~~~
    Filter: _ws.ftypes.string == "a string with a \0 byte"
    dftest: \0 (NUL byte) cannot be used with a regular string.
    	_ws.ftypes.string == "a string with a \0 byte"
    	                                      ^~

After:
    Filter: frame matches "abc\x00def"

    Syntax tree:
     0 TEST_MATCHES:
       1 FIELD(frame)
       1 PCRE(abc\0def)

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_MATCHES	reg#0 matches abc\0def
    00003 RETURN

    Filter: _ws.ftypes.string == "a string with a \0 byte"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string)
       1 FVALUE("a string with a \0 byte" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "a string with a \0 byte" <FT_STRING>
    00003 RETURN

Fixes issue #16156.
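
Why NUL bytes need explicit support can be shown with a small sketch. It assumes a string representation that carries its own length; the struct and function names are hypothetical and not the actual ftypes implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string: the length is stored rather
 * than derived from a terminating NUL. */
struct sketch_str {
    const char *data;
    size_t len;
};

/* Compare all len bytes, including any embedded NUL bytes. */
int sketch_str_equal(struct sketch_str a, struct sketch_str b)
{
    assert(a.data != NULL && b.data != NULL);
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

With plain C strings, "abc\0def" and "abc\0xyz" both look like "abc" to strlen() and strcmp(); with an explicit length of 7 they compare as different values.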
2022-06-21 15:10:08 +00:00
João Valverde
0615ba6317 ftypes: Make accessor functions type safe 2022-06-20 17:29:57 +00:00
João Valverde
de103394fe dfilter: Make regex matches case insensitive by default 2022-06-08 12:17:22 +01:00
Alexis La Goutte
8ee1eabeee dfvm(dfilter): fix clang analyzer warning (Dead.Store) 2022-05-22 08:40:44 +00:00
João Valverde
bebf7afa37 dfilter: Remove unused DFVM 4th instruction argument 2022-05-13 14:13:18 +01:00
João Valverde
3bb918428e dfilter: Remove stale comment 2022-05-13 12:50:33 +00:00
João Valverde
ac901e5de8 dfilter: Fix maybe-uninitialized warning
[1702/2528] Building C object epan/dfilter/CMakeFiles/dfilter.dir/dfvm.c.o
In function ‘drange_contains_layer’,
    inlined from ‘filter_finfo_fvalues’ at /home/jpv/code/wireshark/wireshark/epan/dfilter/dfvm.c:587:21:
/home/jpv/code/wireshark/wireshark/epan/dfilter/dfvm.c:555:41: warning: ‘upper’ may be used uninitialized [-Wmaybe-uninitialized]
  555 |                 if (num >= lower && num <= upper) {  /* inclusive */
      |                                     ~~~~^~~~~~~~
/home/jpv/code/wireshark/wireshark/epan/dfilter/dfvm.c: In function ‘filter_finfo_fvalues’:
/home/jpv/code/wireshark/wireshark/epan/dfilter/dfvm.c:537:20: note: ‘upper’ was declared here
  537 |         int lower, upper;
      |                    ^~~~~
2022-05-13 13:22:29 +01:00
João Valverde
b602911b31 dfilter: Add support for universal quantifiers
Adds the keywords "any" and "all" to apply quantification
to any existing relational operator.

Filter: all tcp.port in {100, 2000..3000}

Syntax tree:
 0 ALL TEST_IN:
   1 FIELD(tcp.port)
   1 SET(#2):
     2 FVALUE(100 <FT_UINT16>)
     2 FVALUE(2000 <FT_UINT16>) .. FVALUE(3000 <FT_UINT16>)

Instructions:
00000 READ_TREE		tcp.port -> reg#0
00001 IF_FALSE_GOTO	5
00002 ALL_EQ		reg#0 === 100 <FT_UINT16>
00003 IF_TRUE_GOTO	5
00004 ALL_IN_RANGE	reg#0 in { 2000 <FT_UINT16> .. 3000 <FT_UINT16> }
00005 RETURN
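
The difference between the two quantifiers can be sketched for the example set {100, 2000..3000}. The function names are hypothetical, and returning false when the field is absent is an assumption modeled on the READ_TREE / IF_FALSE_GOTO pair in the listing above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Membership test for the example set {100, 2000..3000}. */
bool sketch_in_set(unsigned v)
{
    return v == 100 || (v >= 2000 && v <= 3000);
}

/* "any": true if at least one field value is in the set. */
bool sketch_any_in_set(const unsigned *vals, size_t n)
{
    assert(vals != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (sketch_in_set(vals[i]))
            return true;
    }
    return false;
}

/* "all": true if the field is present and every value is in the set. */
bool sketch_all_in_set(const unsigned *vals, size_t n)
{
    assert(vals != NULL || n == 0);
    if (n == 0)
        return false; /* field absent: nothing to test */
    for (size_t i = 0; i < n; i++) {
        if (!sketch_in_set(vals[i]))
            return false;
    }
    return true;
}
```

For a packet with tcp.port values 100 and 80, the "any" form matches and the "all" form does not.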
2022-05-12 14:26:54 +01:00
João Valverde
164f3ce9a2 dfilter: Improve syntax tree display format for sets 2022-05-12 14:06:33 +01:00
Joakim Karlsson
b75b8ca72e dfilter: fix may be used uninitialized in this function [-Wmaybe-uninitialized] 2022-04-27 13:36:43 +02:00