Commit Graph

105 Commits

Author SHA1 Message Date
Martin Mathieson fc013d9bd6 Set unique exit codes for processes 2023-01-11 09:56:26 +00:00
João Valverde 46d58f97cc dfilter: Add a test 2023-01-07 21:21:36 +00:00
João Valverde 4d3f580961 tests: Reorganize dfilter group 2023-01-07 21:21:36 +00:00
João Valverde 1861679e81 dfilter: Optimize some scanner patterns
Cleanup flex code. Optimize some patterns to avoid lookups
for field matches for values that are not legal field names.

Improve warning and add some comments.
2023-01-07 21:15:25 +00:00
João Valverde 522c74b734 dftest: More CLI options and improve output format 2023-01-05 20:26:42 +00:00
João Valverde e990b25ea2 dfilter: Remove semcheck arithmetic commute argument
No one is using this so I'd like to explore other
options first to handle constants in arithmetic
expressions that lack type information.

Reverts 3ddb017a88.
2023-01-03 12:46:13 +00:00
João Valverde f37c7c4062 dfilter: Tweak representation for length-1 byte array
Make dfilter byte representation always use ':' for consistency.

Make 1 byte be represented as "XX:" with the colon suffix to
make it unambiguous that it is a byte and not another type,
like a protocol.

The difference can be seen in the following programs. In the
before representation it is not obvious at all that the second
"fc" value is a literal bytes value and not the value of the
protocol "fc", although it can be inferred from the lack of
a READ_TREE instruction. In the After we know that "fc:" must
be bytes and not a protocol.

Note that a leading colon is a syntactical expedient to say
"this value with any type is a literal value and not a protocol
field." A terminating colon is just a part of the dfilter
literal bytes syntax.

Before:

Filter: fc == :fc

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(fc <FT_PROTOCOL>)
   1 FVALUE(fc <FT_PROTOCOL>)

Instructions:
00000 READ_TREE		fc <FT_PROTOCOL> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fc <FT_PROTOCOL>

After:

Filter: fc == :fc

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(fc <FT_PROTOCOL>)
   1 FVALUE(fc: <FT_PROTOCOL>)

Instructions:
00000 READ_TREE		fc <FT_PROTOCOL> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fc: <FT_PROTOCOL>
2023-01-02 02:54:38 +00:00
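The representation rule from f37c7c4062 (always use ':' and give a lone byte a trailing colon) can be illustrated with a small Python sketch. The function name `bytes_repr` is hypothetical and this is not the Wireshark implementation, only a model of the output format described in the commit message.

```python
def bytes_repr(value: bytes) -> str:
    """Format bytes as colon-separated hex; a single byte gets a trailing colon."""
    text = ":".join(f"{b:02x}" for b in value)
    # A lone "fc" could be read as a protocol name, so mark it as bytes with ':'.
    if len(value) == 1:
        text += ":"
    return text
```

Under this model `bytes_repr(b"\xfc")` gives `fc:` while a three-byte value such as `b"ABC"` gives `41:42:43`.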
João Valverde a17fb20550 dfilter: Remove commute argument from semantic check
Take a more conservative, less flexible, maybe more elegant,
approach to type inference for now.
2022-12-30 18:46:22 +00:00
João Valverde bc74d2e3e4 dftest: Fix command-line argument parsing
Expressions that start with hyphen clash with command-line options.
In that case we need to pass "--" to dftest to stop processing
options.

Fix the test suite to do this. Fixes failures with dftest and
expressions like:

    -2 == tcp.port

Replace the GLib option parser with getopt while at it. The GLib API
is nice but isn't a good fit for this utility and the code appears to
be inconsistent on whether "--" is left in the argv or not.
2022-12-30 18:27:30 +00:00
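The clash described in bc74d2e3e4 can be reproduced with Python's standard `getopt` module, which follows the same conventions as the C function. This is a sketch only: the `-v` flag is a made-up option for illustration and is not taken from dftest.

```python
import getopt


def try_parse(argv):
    """Return (options, positional_args), or None if option parsing fails."""
    try:
        # "v" declares one hypothetical boolean flag, -v (illustrative only).
        return getopt.getopt(argv, "v")
    except getopt.GetoptError:
        return None


# The expression starts with a hyphen, so it is taken for an option and rejected.
without_separator = try_parse(["-2 == tcp.port"])

# "--" stops option processing; the expression survives as a positional argument.
with_separator = try_parse(["--", "-2 == tcp.port"])
```

Here `without_separator` is `None`, while `with_separator` is `([], ['-2 == tcp.port'])`.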
João Valverde 77ef21f86e dfilter: Replace unparsed lexical type and simplify grammar
Remove unparsed lexical type and replace it with identifier
and constant. This separation is still necessary to differentiate
names (fields and function) from literals that look like names
but it has some advantages to do it at the lexical level.

The main advantage is a much cleaner and simplified grammar,
because we only have a single token type for field names, without
any loss of generality (the same name is valid for fields and
function names for example).

The CONSTANT token type is necessary to be different from literal
to provide errors for function rules.
2022-12-29 18:28:54 +00:00
João Valverde 508a4011ac tests: Rename test group 2022-12-29 18:28:54 +00:00
João Valverde b19bed43d1 dfilter: Allow constants as the first or only argument to min/max
The strategy here is to delay resolving literals to values until
we have looked at the entire argument list.

Also we will try to commute the relation in a comparison if
we do not have a type for the return value of the function,
like any other constant.

Before:

    Filter: max(1,_ws.ftypes.int8) == 1
    dftest: Argument '1' is not valid for max()
    	max(1,_ws.ftypes.int8) == 1
    	    ^

After:

    Filter: max(1,_ws.ftypes.int8) == 1

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FUNCTION(max#2):
         2 FVALUE(1 <FT_INT8>)
         2 FIELD(_ws.ftypes.int8 <FT_INT8>)
       1 FVALUE(1 <FT_INT8>)

    Instructions:
    00000 STACK_PUSH	1 <FT_INT8>
    00001 READ_TREE		_ws.ftypes.int8 <FT_INT8> -> reg#1
    00002 IF_FALSE_GOTO	3
    00003 STACK_PUSH	reg#1
    00004 CALL_FUNCTION	max(reg#1, 1 <FT_INT8>) -> reg#0
    00005 STACK_POP	2
    00006 IF_FALSE_GOTO	8
    00007 ANY_EQ		reg#0 == 1 <FT_INT8>
    00008 RETURN
2022-12-27 02:21:06 +00:00
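The strategy in b19bed43d1 of delaying literal resolution until the whole argument list has been seen might look roughly like the following Python sketch. The `Field` and `Literal` classes and `resolve_arguments` are hypothetical names; the real semantic check is written in C and handles more cases (for example arguments of differing types).

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    ftype: str


@dataclass
class Literal:
    text: str


def resolve_arguments(args):
    """Type every literal using the first typed argument found anywhere in the list."""
    # Scan the full list first, so a literal in position one can still be typed.
    ftype = next((a.ftype for a in args if isinstance(a, Field)), None)
    if ftype is None:
        raise ValueError("no argument provides a type")
    resolved = []
    for a in args:
        if isinstance(a, Field):
            resolved.append((a.name, a.ftype))
        else:
            resolved.append((a.text, ftype))
    return resolved
```

With `[Literal("1"), Field("_ws.ftypes.int8", "FT_INT8")]` the literal is typed as FT_INT8, matching the "After" syntax tree in the commit message.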
João Valverde 6399f724d9 dfilter: Fix crash with min/max literal argument
Filter: max(1,_ws.ftypes.int8) == 1
     ** (dftest:64938) 01:43:25.950180 [DFilter ERROR] epan/dfilter/sttype-field.c:117 -- sttype_field_ftenum(): Magic num is 0x5cf30031, but should be 0xfc2002cf
2022-12-27 01:54:57 +00:00
João Valverde 540b71d738 dfilter: Fix crash with a constant arithmetic expression 2022-12-26 23:55:27 +00:00
João Valverde 3ddb017a88 dfilter: Allow arithmetic expression to commute
Allow an arithmetic expression like 1 + some.field. If we
cannot assign a type to the LHS commute the terms and
try again.

Before:

    Filter: _ws.ftypes.int32 + 1 == 10

    Syntax tree:
     0 TEST_ANY_EQ:
       1 OP_ADD:
         2 FIELD(_ws.ftypes.int32 <FT_INT32>)
         2 FVALUE(1 <FT_INT32>)
       1 FVALUE(10 <FT_INT32>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 <FT_INT32> -> reg#0
    00001 IF_FALSE_GOTO	4
    00002 ADD		reg#0 + 1 <FT_INT32> -> reg#1
    00003 ANY_EQ		reg#1 == 10 <FT_INT32>
    00004 RETURN

    Filter: 1 + _ws.ftypes.int32 == 10
    dftest: Constant arithmetic expression on the LHS is invalid.
    	1 + _ws.ftypes.int32 == 10
    	^

After:

    Filter: _ws.ftypes.int32 + 1 == 10

    Syntax tree:
     0 TEST_ANY_EQ:
       1 OP_ADD:
         2 FIELD(_ws.ftypes.int32 <FT_INT32>)
         2 FVALUE(1 <FT_INT32>)
       1 FVALUE(10 <FT_INT32>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 <FT_INT32> -> reg#0
    00001 IF_FALSE_GOTO	4
    00002 ADD		reg#0 + 1 <FT_INT32> -> reg#1
    00003 ANY_EQ		reg#1 == 10 <FT_INT32>
    00004 RETURN

    Filter: 1 + _ws.ftypes.int32 == 10

    Syntax tree:
     0 TEST_ANY_EQ:
       1 OP_ADD:
         2 FVALUE(1 <FT_INT32>)
         2 FIELD(_ws.ftypes.int32 <FT_INT32>)
       1 FVALUE(10 <FT_INT32>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 <FT_INT32> -> reg#0
    00001 IF_FALSE_GOTO	4
    00002 ADD		1 <FT_INT32> + reg#0 -> reg#1
    00003 ANY_EQ		reg#1 == 10 <FT_INT32>
    00004 RETURN
2022-12-26 20:50:44 +00:00
João Valverde 079ef9a165 dfilter: Allow comparison relation to commute
Comparison relations should be allowed to commute but they can not
because we need type information to resolve literals to fvalues. For
that reason an expression like "1 == some.field"  is invalid. Solve
that by commuting the relation if the first try did not succeed in
assigning a type to the LHS.

After the second try, give up: that means we have a relation with
constants on both sides and that is not semantically valid.

Other relations like "matches" and "contains" are not symmetric and
should not commute anyway.

Before:

    Filter: _ws.ftypes.int32 == 10

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.int32 <FT_INT32>)
       1 FVALUE(10 <FT_INT32>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 <FT_INT32> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == 10 <FT_INT32>
    00003 RETURN

    Filter: 10 == _ws.ftypes.int32
    dftest: Left side of "==" expression must be a field or function, not 10.
    	10 == _ws.ftypes.int32
    	^~

After:

    Filter: _ws.ftypes.int32 == 10

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.int32 <FT_INT32>)
       1 FVALUE(10 <FT_INT32>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 <FT_INT32> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == 10 <FT_INT32>
    00003 RETURN

    Filter: 10 == _ws.ftypes.int32

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FVALUE(10 <FT_INT32>)
       1 FIELD(_ws.ftypes.int32 <FT_INT32>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 <FT_INT32> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		10 <FT_INT32> == reg#0
    00003 RETURN
2022-12-26 15:29:50 +00:00
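The two-try approach described in 079ef9a165 can be sketched in Python as follows. All names here (`Field`, `Literal`, `check_comparison`) are hypothetical and the model is deliberately reduced to finding which side supplies the type; it is not the actual semantic check.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    ftype: str


@dataclass
class Literal:
    text: str


def type_of(node):
    """Only fields carry type information; literals have none until resolved."""
    return node.ftype if isinstance(node, Field) else None


def check_comparison(lhs, rhs):
    """Return the type used to resolve literals in 'lhs == rhs'."""
    ftype = type_of(lhs)        # first try: take the type from the LHS
    if ftype is None:
        ftype = type_of(rhs)    # second try: commute and use the other side
    if ftype is None:
        # Constants on both sides: not semantically valid.
        raise ValueError("constants on both sides of the relation")
    return ftype
```

In this model `10 == _ws.ftypes.int32` resolves to FT_INT32 on the second try, while `1 == 2` is rejected.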
João Valverde 4e1211de90 dfilter: Add support for negation of arithmetic expressions 2022-12-22 23:51:16 +00:00
Peter Wu df478a365d dfilter: treat carriage returns as whitespace
Fixes #18595
2022-11-07 01:00:50 +00:00
João Valverde b83658d8a4 dfilter: Add support for raw addressing with references
Extends raw addressing syntax to work with references. The syntax
is
    @field1 == ${@field2}

This requires replicating the logic to load field references, but
using raw values instead. We use separate hash tables for that,
namely "references" vs "raw_references".
2022-10-31 21:02:39 +00:00
João Valverde 0853ddd1cb dfilter: Add support for raw (bytes) addressing mode
This adds new syntax to read a field from the tree as bytes, instead
of the actual type. This is a useful extension for example to match
malformed strings that contain Unicode replacement characters. In
this case it is not possible to match the raw value of the malformed
string field. This extension fills this need and is generic enough
that it should be useful in many other situations.

The syntax used is to prefix the field name with "@". The following
artificial example tests if the HTTP user agent contains a particular
invalid UTF-8 sequence:

    @http.user_agent == "Mozill\xAA"

Where simply using "http.user_agent" won't work because the invalid byte
sequence will have been replaced with U+FFFD.

Considering the following programs:

    $ dftest '_ws.ftypes.string == "ABC"'
    Filter: _ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <FT_STRING>)
       1 FVALUE("ABC" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string <FT_STRING> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "ABC" <FT_STRING>
    00003 RETURN

    $ dftest '@_ws.ftypes.string == "ABC"'
    Filter: @_ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <RAW>)
       1 FVALUE(41:42:43 <FT_BYTES>)

    Instructions:
    00000 READ_TREE		@_ws.ftypes.string <FT_BYTES> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == 41:42:43 <FT_BYTES>
    00003 RETURN

In the second case the field has a "raw" type, that equates directly to
FT_BYTES, and the field value is read from the protocol raw data.
2022-10-31 21:02:39 +00:00
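The motivation given in 0853ddd1cb (an invalid byte sequence is replaced with U+FFFD, so only the raw bytes can be matched) can be shown with plain Python string handling. This sketch only models the decoding effect; `matches_string` and `matches_raw` are hypothetical helpers, not Wireshark functions.

```python
RAW = b"Mozill\xaa"  # 0xAA on its own is not a valid UTF-8 sequence

# What a string-typed field would hold after lossy decoding.
decoded = RAW.decode("utf-8", errors="replace")


def matches_string(pattern: str) -> bool:
    """Compare against the decoded value, as a string comparison would."""
    return decoded == pattern


def matches_raw(pattern: bytes) -> bool:
    """Compare against the undecoded bytes, as the '@' prefix does."""
    return RAW == pattern
```

The decoded value ends in U+FFFD, so `matches_string("Mozill\xaa")` is false, whereas `matches_raw(b"Mozill\xaa")` is true.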
Miroslav Lichvar d892d28481 NTP: Improve handling of poll and precision fields
The poll and precision fields in timing NTP messages are signed
integers.

Different NTP implementations have different minimum and maximum polling
intervals. Some can be configured even with negative values for
sub-second intervals (e.g. down to -7 for 1/128th of a second).

NTP clocks on modern systems and hardware typically have
a sub-microsecond precision.

Print all poll values. Add the raw precision and change the resolution
of the printed value to nanoseconds.
2022-10-31 14:38:50 +00:00
João Valverde 0662a3f6ac dfilter: Amend a numeric pattern in the scanner
We amend the :<numeric> pattern to not eat the leading
colon. Because the colon can be part of the value (with IPv6 addresses
for example) we want to avoid doing that.

IPv6 addresses are covered by their own rules but this removes the
requirement in the future to handle any special cases and avoids
surprises.

For this reason the colon-prefix syntax is already explicitly defined to
work only for byte arrays and there is currently no universal
syntax for all literal values or even all numbers.

Other numbers can keep using the lexical type "unparsed".

```
run/dftest "_ws.ftypes.uint8 == :fd"
Filter: _ws.ftypes.uint8 == :fd
dftest: ":fd" is not a valid number.
	_ws.ftypes.uint8 == :fd
	                    ^~~

run/dftest "_ws.ftypes.uint8 == fd"
Filter: _ws.ftypes.uint8 == fd
dftest: "fd" is not a valid number.
	_ws.ftypes.uint8 == fd
	                    ^~

run/dftest "_ws.ftypes.uint8 == 0xfd"
Filter: _ws.ftypes.uint8 == 0xfd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.uint8 <FT_UINT8>)
   1 FVALUE(253 <FT_UINT8>)

Instructions:
00000 READ_TREE		_ws.ftypes.uint8 <FT_UINT8> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == 253 <FT_UINT8>
00003 RETURN

run/dftest "_ws.ftypes.bytes == fd"
Filter: _ws.ftypes.bytes == fd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.bytes <FT_BYTES>)
   1 FVALUE(fd <FT_BYTES>)

Instructions:
00000 READ_TREE		_ws.ftypes.bytes <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fd <FT_BYTES>
00003 RETURN

run/dftest "_ws.ftypes.bytes == :fd"
Filter: _ws.ftypes.bytes == :fd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.bytes <FT_BYTES>)
   1 FVALUE(fd <FT_BYTES>)

Instructions:
00000 READ_TREE		_ws.ftypes.bytes <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fd <FT_BYTES>
00003 RETURN
```
2022-10-08 09:51:49 +00:00
João Valverde 14f5121c4a dfilter: Remove problematic <...> literal syntax
The <...> syntax for literals, intended to be as generic as
possible, unintentionally introduced an ambiguity with the
relational expression "a < b or a > c".

Literals are values like numbers, bytes, IPv6 addresses or, one
could imagine, UNC paths for example, if an FT_UNC type were to
be added in the future.

We could use a new unique symbol like @...@ but the <...>
syntax is very recent and may not be necessary with ":xxx" so
just remove it.

A byte array can be explicitly declared by prefixing with a colon. It
is not as generic but the main ambiguity that this new syntax attempted
to solve is bytes vs protocol names. We don't want to introduce a new
reserved symbol for now, until other requirements if any are more clear.

Fixes #18418.
2022-10-08 09:51:49 +00:00
Chuck Craft b60240a8a6 spelling: "two pass" -> two-pass 2022-08-22 10:20:29 +00:00
Stig Bjørlykke bfe8187608 test: Add dfilter 'double' tests
Add test cases for scientific notation and not equal.
2022-07-26 10:32:16 +00:00
João Valverde a877f2d5f3 dfilter: Allow existence check for slices
Allow checking if a slice exists. The result is true if the
slice has length greater than zero.

The len() function is implemented as a DFVM instruction instead.
The semantics are the same.
2022-07-04 22:45:14 +00:00
João Valverde b10db887ce dfilter: Remove unparsed syntax type and RHS literal bias
This removes unparsed name resolution during the semantic
check because it feels like a hack to work around limitations
in the language syntax, that should be solved at the lexical
level instead.

We were interpreting unparsed differently on the LHS and RHS.
Now an unparsed value is always a field if it matches a
registered field name (this matches the implementation in 3.6
and before).

This requires tightening a bit the allowed filter names for
protocols to avoid some common and potentially weird conflicting
cases.

Incidentally this extends set grammar to accept all entities.
That is experimental and may be reverted in the future.
2022-07-02 11:18:20 +01:00
João Valverde aaff0d21ae dfilter: Add layer support for references
This adds support for using the layers filter
with field references.

Before:
    $ dftest 'ip.src != ${ip.src#2}'
    dftest: invalid character in macro name

After:
    $ dftest 'ip.src != ${ip.src#2}'
    Filter: ip.src != ${ip.src#2}

    Syntax tree:
     0 TEST_ALL_NE:
       1 FIELD(ip.src <FT_IPv4>)
       1 REFERENCE(ip.src#[2:1] <FT_IPv4>)

    Instructions:
    00000 READ_TREE		ip.src <FT_IPv4> -> reg#0
    00001 IF_FALSE_GOTO	5
    00002 READ_REFERENCE_R	${ip.src <FT_IPv4>} #[2:1] -> reg#1
    00003 IF_FALSE_GOTO	5
    00004 ALL_NE		reg#0 != reg#1
    00005 RETURN

This requires adding another level of complexity to references.
When loading references we need to copy the 'proto_layer_num'
and add the logic to filter on that.

The "layer" sttype is removed and replaced by a new
field sttype with support for a range. This is a nice
cleanup for the semantic check and general simplification.
The grammar is better too with this design.

Range sttype is renamed to slice for clarity.
2022-06-25 14:57:40 +01:00
João Valverde e9e6431d7b dfilter: Change boolean string representation
Use "True" or "TRUE" instead of "true" and remove case insensitivity.
Same for false. This should serve to differentiate booleans a bit
more from protocol names, which should be using lower-case.
2022-06-25 13:02:34 +01:00
João Valverde 47348ae598 dfilter: Add support for literal strings with null bytes
Before:
    Filter: frame matches "abc\x00def"
    dftest: \x00 (NUL byte) cannot be used with a regular string.
    	frame matches "abc\x00def"
    	                  ^~~~
    Filter: _ws.ftypes.string == "a string with a \0 byte"
    dftest: \0 (NUL byte) cannot be used with a regular string.
    	_ws.ftypes.string == "a string with a \0 byte"
    	                                      ^~

After:
    Filter: frame matches "abc\x00def"

    Syntax tree:
     0 TEST_MATCHES:
       1 FIELD(frame)
       1 PCRE(abc\0def)

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_MATCHES	reg#0 matches abc\0def
    00003 RETURN

    Filter: _ws.ftypes.string == "a string with a \0 byte"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string)
       1 FVALUE("a string with a \0 byte" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "a string with a \0 byte" <FT_STRING>
    00003 RETURN

Fixes issue #16156.
2022-06-21 15:10:08 +00:00
João Valverde de103394fe dfilter: Make regex matches case insensitive by default 2022-06-08 12:17:22 +01:00
João Valverde 51de43cfd2 dfilter: Fix protocol slices with negative indexes
Field infos have a length property that was not stored with the
field value so when using a negative index the end was computed
from the captured length of the frame tvbuff, leading to incorrect
results. The documentation in wireshark-filter(5) describes how
this was supposed to work but as far as I can tell it never worked
properly.

We now store the length and use that (when it is different from -1)
to locate the end of the protocol data in the tvbuff. An extra wrinkle
is that sometimes the length is set after the field value is created.
This is the most common case as the majority of protocols have a
variable length and dissection generally proceeds with a TVB subset from
the current layer (with offset zero) through all remaining layers to the
end of the captured length. For that reason we must use an expedient to allow
changing the protocol length of an existing protocol fvalue, whenever
proto_item_set_len() is called.

Fixes #17772.
2022-05-23 23:04:07 +01:00
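The bug fixed in 51de43cfd2 comes down to which "end" a negative index is measured from. A minimal Python sketch, assuming a protocol is described by an offset and a length within the frame (the function and parameter names are hypothetical):

```python
def last_bytes(frame: bytes, offset: int, length: int, count: int) -> bytes:
    """Return the last `count` bytes of a protocol that starts at `offset`."""
    # length == -1 models "length not set": fall back to the end of the frame,
    # which is the behaviour that gave incorrect results before the fix.
    end = len(frame) if length == -1 else offset + length
    return frame[end - count:end]


frame = bytes(range(10))                # bytes 0x00 .. 0x09
correct = last_bytes(frame, 2, 4, 1)    # protocol occupies bytes 2..5
fallback = last_bytes(frame, 2, -1, 1)  # measured from the end of the frame
```

With the stored length the last byte of the protocol is `0x05`; without it the result is `0x09`, the last byte of the whole frame.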
João Valverde b602911b31 dfilter: Add support for universal quantifiers
Adds the keywords "any" and "all" to implement the quantification
to any existing relational operator.

Filter: all tcp.port in {100, 2000..3000}

Syntax tree:
 0 ALL TEST_IN:
   1 FIELD(tcp.port)
   1 SET(#2):
     2 FVALUE(100 <FT_UINT16>)
     2 FVALUE(2000 <FT_UINT16>) .. FVALUE(3000 <FT_UINT16>)

Instructions:
00000 READ_TREE		tcp.port -> reg#0
00001 IF_FALSE_GOTO	5
00002 ALL_EQ		reg#0 === 100 <FT_UINT16>
00003 IF_TRUE_GOTO	5
00004 ALL_IN_RANGE	reg#0 in { 2000 <FT_UINT16> .. 3000 <FT_UINT16> }
00005 RETURN
2022-05-12 14:26:54 +01:00
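The difference between "any" and "all" in b602911b31 can be modelled in Python over the list of values a field has in one frame. This is a sketch under the assumption, read from the IF_FALSE_GOTO after READ_TREE in the listing above, that an absent field makes the test false; the helper names are hypothetical.

```python
def in_set(value, members):
    """members holds single values or (low, high) inclusive ranges."""
    for m in members:
        if isinstance(m, tuple):
            low, high = m
            if low <= value <= high:
                return True
        elif value == m:
            return True
    return False


def any_in(values, members):
    return any(in_set(v, members) for v in values)


def all_in(values, members):
    # An empty list stands for a field that is absent from the frame.
    return bool(values) and all(in_set(v, members) for v in values)


PORTS = [100, (2000, 3000)]  # models {100, 2000..3000}
```

For a frame with ports 100 and 80, `any_in` is true and `all_in` is false; for ports 100 and 2500 both are true.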
João Valverde 4f3f507eee dfilter: Add syntax to match specific layers in the protocol stack
Add support to display filters for matching a specific layer within a frame.
Layers are counted sequentially up the protocol stack. Each protocol
(dissector) that appears in the stack is one layer.

LINK-LAYER#1 <-> IP#1 <-> TCP#1 <-> IP#2 <-> TCP#2 <-> etc.

The syntax allows for negative indexes and ranges with the usual semantics
for slices (but note that counting starts at one):

    tcp.port#[2-4] == 1024

Matches layers 2 to 4 inclusive.

Fixes #3791.
2022-04-26 16:50:59 +00:00
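The one-based, inclusive layer range from 4f3f507eee can be sketched in Python as a list slice. The data layout (one list of field values per occurrence of the protocol) and the name `layer_matches` are assumptions for illustration; negative indexes are left out of this sketch.

```python
def layer_matches(values_by_layer, first, last, wanted):
    """True if `wanted` appears in any of layers first..last (1-based, inclusive)."""
    # values_by_layer[0] is layer 1, so shift the start index down by one.
    selected = values_by_layer[first - 1:last]
    return any(wanted in layer for layer in selected)


# Port values seen in three successive TCP layers of one frame (made-up data).
tcp_ports = [[1024, 80], [5000, 443], [6000, 1024]]
```

Here `layer_matches(tcp_ports, 2, 3, 1024)` is true because layer 3 contains 1024, while restricting the range to layer 2 alone gives false.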
João Valverde eb2a9889c3 dfilter: Add abs() function
Add an absolute value function for ftypes.
2022-04-18 17:09:00 +01:00
João Valverde cef02cc3a0 dfilter: Add max()/min() tests and documentation 2022-04-14 13:07:41 +00:00
João Valverde 0dba7456aa tests: Remove leftover debug print 2022-04-13 01:15:11 +01:00
João Valverde 8355e96858 tests: Add test for display filter field reference 2022-04-12 14:03:18 +00:00
João Valverde da19379eb5 dfilter: Create the syntax node in the scanner and pass that
Revert to passing a syntax node from the lexical scanner to the grammar
parser. Using a union has no discernible advantage and requires
duplicating a lot of properties of syntax nodes.
2022-04-10 09:54:03 +01:00
João Valverde fb9a176587 dfilter: Allow grouping arithmetical expressions with { }
This removes the limitation of having only two terms in an
arithmetic expression and allows setting the precedence using
curly braces (like any basic calculator).

Our grammar currently does not allow grouping arithmetic expressions
using parentheses, because boolean expressions and arithmetic
expressions are different and parentheses are used with the former.
2022-04-08 23:12:04 +01:00
João Valverde 0313cd02bc dfilter: Fix RHS bias for literal values
Fixes a3b76138f0.
2022-04-06 23:46:22 +01:00
João Valverde c30a417528 dfilter: Add test 2022-04-06 18:37:23 +01:00
João Valverde 5584aba326 dfilter: Fix slice using range [:j]
Fixes:

$ dftest 'frame[:10] contains 0xff'
dftest: ":10" is not a valid range.
2022-04-06 18:35:10 +01:00
João Valverde a6f37323e6 dfilter: Clean up lexical scanning 2022-04-06 18:11:27 +01:00
João Valverde 6057d1a6e2 dfilter: Add more IPv6 tests 2022-04-06 18:09:12 +01:00
João Valverde 12c8cc32f0 dfilter: Fix parsing of some IPv6 compressed addresses
Fix parsing of some IPv6 addresses and add tests.

Also pass tokens as unparsed unless the user was specific about
the semantic type. For example the IPv4 address 1.1.1.1 is also a
valid field, but 1.1.1.1/128 is not (because of the slash). However,
we choose not to enforce the distinction in the lexical scanner and pass
everything as unparsed unless the meaning is explicit in the syntax
with leading dot, colon, or between angle brackets.
2022-04-06 10:10:04 +01:00
João Valverde 7ed5d5036e dfilter: restore support for identifiers using hyphen
Restores support for filters such as "mac-lte", that was broken
in 330d408328.

This means we are not able to support arithmetic expressions with binary
minus without spaces.

$ dftest 'tcp.port == 1-2'
dftest: "1-2" is not a valid number.
2022-04-05 15:38:20 +01:00
João Valverde 330d408328 dfilter: Allow arithmetic expressions without spaces
To allow arithmetic expressions without spaces, such as "1+2",
we cannot match the expression in other lexical rules using "+". Because
of longest match this becomes the token LITERAL or UNPARSED with semantic value
"1+2". The same goes for all the other arithmetic operators.

So we need to remove [+-*/%] from "word chars" and add very specific
patterns (that won't mistakenly match an arithmetic expression) for
those literal or unparsed tokens we want to support using these characters.
The plus was not a problem but the forward slash is used for CIDR, minus for
MAC address separators, etc.

There are still some corner cases. 11-22-33-44-55-66 is a MAC
address and not the arithmetic expression with six terms "eleven
minus twenty two minus etc." (if we ever support more than two terms
in the grammar, which we don't currently).

We lift some patterns from the flex manual to match on IPv4 and
IPv6 (ugly) and add MAC address.

Other hypothetical literal lexical values using [+-*/%] are already
supported enclosed in angle brackets but the cases of MAC/IPv4/IPv6 are
very common and moreover we need to do the utmost to not break backward
compatibility here.

Before:
    $ dftest "_ws.ftypes.int32 == 1+2"
    dftest: "1+2" is not a valid number.

After:
    $ dftest "_ws.ftypes.int32 == 1+2"
    Filter: _ws.ftypes.int32 == 1+2

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 -> reg#0
    00001 IF_FALSE_GOTO	4
    00002 ADD		1 <FT_INT32> + 2 <FT_INT32> -> reg#1
    00003 ANY_EQ		reg#0 == reg#1
    00004 RETURN
2022-04-04 20:28:55 +00:00
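The tokenizing problem described in 330d408328 can be imitated with Python regular expressions. Note the assumptions: flex picks the longest match, whereas Python's `re` tries alternatives left to right, so the specific MAC pattern is simply listed first here. The patterns are simplified stand-ins, not the ones in the Wireshark scanner.

```python
import re

# "+" counted as a word character: the whole expression becomes one token.
GREEDY_WORD = re.compile(r"[\w+]+")

# Operators removed from word characters, with a specific MAC pattern tried first.
MAC = r"(?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2}"
TOKEN = re.compile(rf"{MAC}|\w+|[-+*/%]")


def tokenize(pattern, text):
    """Return every non-overlapping match of `pattern` in `text`."""
    return pattern.findall(text)
```

With `GREEDY_WORD`, "1+2" is a single token, which is why it was reported as an invalid number. With `TOKEN` it splits into "1", "+", "2", while "11-22-33-44-55-66" is still recognized as one MAC address token.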
João Valverde f0ca30b60b dfilter: More arithmetic fixes
Fix a failed assertion with constant arithmetic expressions.

Because we do not parse constants on the lexical level it is
more complicated to handle constant expressions with unparsed
values.

We need to handle missing type information gracefully for any
kind of arithmetic expression, not just unary minus.
2022-04-02 18:10:33 +00:00
João Valverde 2a9cb588aa dfilter: Add binary arithmetic (add/subtract)
Add support for display filter binary addition and subtraction.

The grammar is intentionally kept simple for now. The use case
is to add a constant to a protocol field, or (maybe) add two
fields in an expression.

We use signed arithmetic with unsigned numbers, checking for
overflow and casting where necessary to do the conversion.
We could legitimately opt to use traditional modular arithmetic
instead (like C) and if it turns out that that is more useful for
some reason we may want to in the future.

Fixes #15504.
2022-03-31 11:27:34 +01:00
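The design choice mentioned in 2a9cb588aa, checking for overflow rather than using C-style modular arithmetic, can be contrasted in a short Python sketch. Raising an error on overflow is an assumption made for this illustration; the commit message only says that overflow is checked. Both function names are hypothetical.

```python
def checked_add(a: int, b: int, bits: int = 32) -> int:
    """Add two unsigned values, raising instead of wrapping on overflow."""
    result = a + b
    if not 0 <= result < (1 << bits):
        raise OverflowError(f"{a} + {b} does not fit in {bits} unsigned bits")
    return result


def modular_add(a: int, b: int, bits: int = 32) -> int:
    """Add two unsigned values with wrap-around, as unsigned arithmetic does in C."""
    return (a + b) % (1 << bits)
```

For `0xFFFFFFFF + 1` the modular version wraps to 0, while the checked version reports the overflow.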