Commit Graph

51 Commits

Author SHA1 Message Date
João Valverde 77ef21f86e dfilter: Replace unparsed lexical type and simplify grammar
Remove unparsed lexical type and replace it with identifier
and constant. This separation is still necessary to differentiate
names (fields and function) from literals that look like names
but it has some advantages to do it at the lexical level.

The main advantage is a much cleaner and simplified grammar,
because we only have a single token type for field names, without
any loss of generality (the same name is valid for fields and
function names for example).

The CONSTANT token type is necessary to be different from literal
to provide errors for function rules.
2022-12-29 18:28:54 +00:00
João Valverde 508a4011ac tests: Rename test group 2022-12-29 18:28:54 +00:00
João Valverde 540b71d738 dfilter: Fix crash with a constant arithmetic expression 2022-12-26 23:55:27 +00:00
João Valverde 3ddb017a88 dfilter: Allow arithmetic expression to commute
Allow an arithmetic expression like 1 + some.field. If we
cannot assign a type to the LHS commute the terms and
try again.

Before:

    Filter: _ws.ftypes.int32 + 1 == 10

    Syntax tree:
     0 TEST_ANY_EQ:
       1 OP_ADD:
         2 FIELD(_ws.ftypes.int32 <FT_INT32>)
         2 FVALUE(1 <FT_INT32>)
       1 FVALUE(10 <FT_INT32>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 <FT_INT32> -> reg#0
    00001 IF_FALSE_GOTO	4
    00002 ADD		reg#0 + 1 <FT_INT32> -> reg#1
    00003 ANY_EQ		reg#1 == 10 <FT_INT32>
    00004 RETURN

    Filter: 1 + _ws.ftypes.int32 == 10
    dftest: Constant arithmetic expression on the LHS is invalid.
    	1 + _ws.ftypes.int32 == 10
    	^

After:

    Filter: _ws.ftypes.int32 + 1 == 10

    Syntax tree:
     0 TEST_ANY_EQ:
       1 OP_ADD:
         2 FIELD(_ws.ftypes.int32 <FT_INT32>)
         2 FVALUE(1 <FT_INT32>)
       1 FVALUE(10 <FT_INT32>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 <FT_INT32> -> reg#0
    00001 IF_FALSE_GOTO	4
    00002 ADD		reg#0 + 1 <FT_INT32> -> reg#1
    00003 ANY_EQ		reg#1 == 10 <FT_INT32>
    00004 RETURN

    Filter: 1 + _ws.ftypes.int32 == 10

    Syntax tree:
     0 TEST_ANY_EQ:
       1 OP_ADD:
         2 FVALUE(1 <FT_INT32>)
         2 FIELD(_ws.ftypes.int32 <FT_INT32>)
       1 FVALUE(10 <FT_INT32>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 <FT_INT32> -> reg#0
    00001 IF_FALSE_GOTO	4
    00002 ADD		1 <FT_INT32> + reg#0 -> reg#1
    00003 ANY_EQ		reg#1 == 10 <FT_INT32>
    00004 RETURN
2022-12-26 20:50:44 +00:00
João Valverde 079ef9a165 dfilter: Allow comparison relation to commute
Comparison relations should be allowed to commute but they can not
because we need type information to resolve literals to fvalues. For
that reason an expression like "1 == some.field"  is invalid. Solve
that by commuting the relation if the first try did not succeed in
assigning a type to the LHS.

After the second try give up, that means we have a relation with
constants on both sides and that is not semantically valid.

Other relations like "matches" and "contains" are not symmetric and
should not commute anyway.

Before:

    Filter: _ws.ftypes.int32 == 10

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.int32 <FT_INT32>)
       1 FVALUE(10 <FT_INT32>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 <FT_INT32> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == 10 <FT_INT32>
    00003 RETURN

    Filter: 10 == _ws.ftypes.int32
    dftest: Left side of "==" expression must be a field or function, not 10.
    	10 == _ws.ftypes.int32
    	^~

After:

    Filter: _ws.ftypes.int32 == 10

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.int32 <FT_INT32>)
       1 FVALUE(10 <FT_INT32>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 <FT_INT32> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == 10 <FT_INT32>
    00003 RETURN

    Filter: 10 == _ws.ftypes.int32

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FVALUE(10 <FT_INT32>)
       1 FIELD(_ws.ftypes.int32 <FT_INT32>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 <FT_INT32> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		10 <FT_INT32> == reg#0
    00003 RETURN
2022-12-26 15:29:50 +00:00
João Valverde 4e1211de90 dfilter: Add support for negation of arithmetic expressions 2022-12-22 23:51:16 +00:00
Peter Wu df478a365d dfilter: treat carriage returns as whitespace
Fixes #18595
2022-11-07 01:00:50 +00:00
João Valverde b83658d8a4 dfilter: Add suport for raw addressing with references
Extends raw adressing syntax to wok with references. The syntax
is
    @field1 == ${@field2}

This requires replicating the logic to load field references, but
using raw values instead. We use separate hash tables for that,
namely "references" vs "raw_references".
2022-10-31 21:02:39 +00:00
João Valverde 0853ddd1cb dfilter: Add support for raw (bytes) addressing mode
This adds new syntax to read a field from the tree as bytes, instead
of the actual type. This is a useful extension for example to match
matformed strings that contain unicode replacement characters. In
this case it is not possible to match the raw value of the malformed
string field. This extension fills this need and is generic enough
that it should be useful in many other situations.

The syntax used is to prefix the field name with "@". The following
artificial example tests if the HTTP user agent contains a particular
invalid UTF-8 sequence:

    @http.user_agent == "Mozill\xAA"

Where simply using "http.user_agent" won't work because the invalid byte
sequence will have been replaced with U+FFFD.

Considering the following programs:

    $ dftest '_ws.ftypes.string == "ABC"'
    Filter: _ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <FT_STRING>)
       1 FVALUE("ABC" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string <FT_STRING> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "ABC" <FT_STRING>
    00003 RETURN

    $ dftest '@_ws.ftypes.string == "ABC"'
    Filter: @_ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <RAW>)
       1 FVALUE(41:42:43 <FT_BYTES>)

    Instructions:
    00000 READ_TREE		@_ws.ftypes.string <FT_BYTES> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == 41:42:43 <FT_BYTES>
    00003 RETURN

In the second case the field has a "raw" type, that equates directly to
FT_BYTES, and the field value is read from the protocol raw data.
2022-10-31 21:02:39 +00:00
João Valverde 0662a3f6ac dfilter: Amend a numeric pattern in the scanner
We amend the :<numeric> pattern to not eat the leading
colon. Because the colon can be part of the value (with IPv6 addresses
for example) we want to avoid doing that.

IPv6 addresses are covered by their own rules but this removes the
requirement in the future to handle any special cases and avoids
surprises.

For this reason the colon-prefix syntax is already explicitly defined to
work only for byte arrays and there is currently no universal
syntax for all literal values or even all numbers.

Other numbers can keep using the lexical type "unparsed".

```
run/dftest "_ws.ftypes.uint8 == :fd"
Filter: _ws.ftypes.uint8 == :fd
dftest: ":fd" is not a valid number.
	_ws.ftypes.uint8 == :fd
	                    ^~~

run/dftest "_ws.ftypes.uint8 == fd"
Filter: _ws.ftypes.uint8 == fd
dftest: "fd" is not a valid number.
	_ws.ftypes.uint8 == fd
	                    ^~

run/dftest "_ws.ftypes.uint8 == 0xfd"
Filter: _ws.ftypes.uint8 == 0xfd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.uint8 <FT_UINT8>)
   1 FVALUE(253 <FT_UINT8>)

Instructions:
00000 READ_TREE		_ws.ftypes.uint8 <FT_UINT8> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == 253 <FT_UINT8>
00003 RETURN

run/dftest "_ws.ftypes.bytes == fd"
Filter: _ws.ftypes.bytes == fd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.bytes <FT_BYTES>)
   1 FVALUE(fd <FT_BYTES>)

Instructions:
00000 READ_TREE		_ws.ftypes.bytes <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fd <FT_BYTES>
00003 RETURN

run/dftest "_ws.ftypes.bytes == :fd"
Filter: _ws.ftypes.bytes == :fd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.bytes <FT_BYTES>)
   1 FVALUE(fd <FT_BYTES>)

Instructions:
00000 READ_TREE		_ws.ftypes.bytes <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fd <FT_BYTES>
00003 RETURN
```
2022-10-08 09:51:49 +00:00
João Valverde 14f5121c4a dfilter: Remove problematic <...> literal syntax
The <...> syntax for literals, intended to be as generic as
possible, unintentionally introduced an ambiguity with the
relational expression "a < b or a > c".

Literals are values like numbers, bytes, IPv6 addresses or, one
could imagine, UNC paths for example, if an FT_UNC type were to
be added in the future.

We could use a new unique symbol like @...@ but the <...>
syntax is very recent and may not be necessary with ":xxx" so
just remove it.

A byte array can be explicitly declared by prefixing with a colon. It
is not as generic but the main ambiguity that this new syntax attempted
to solve is bytes vs protocol names. We don't want to introduce a new
reserved symbol for now, until other requirements if any are more clear.

Fixes #18418.
2022-10-08 09:51:49 +00:00
João Valverde b10db887ce dfilter: Remove unparsed syntax type and RHS literal bias
This removes unparsed name resolution during the semantic
check because it feels like a hack to work around limitations
in the language syntax, that should be solved at the lexical
level instead.

We were interpreting unparsed differently on the LHS and RHS.
Now an unparsed value is always a field if it matches a
registered field name (this matches the implementation in 3.6
and before).

This requires tightening a bit the allowed filter names for
protocols to avoid some common and potentially weird conflicting
cases.

Incidentally this extends set grammar to accept all entities.
That is experimental and may be reverted in the future.
2022-07-02 11:18:20 +01:00
João Valverde aaff0d21ae dfilter: Add layer support for references
This adds support for using the layers filter
with field references.

Before:
    $ dftest 'ip.src != ${ip.src#2}'
    dftest: invalid character in macro name

After:
    $ dftest 'ip.src != ${ip.src#2}'
    Filter: ip.src != ${ip.src#2}

    Syntax tree:
     0 TEST_ALL_NE:
       1 FIELD(ip.src <FT_IPv4>)
       1 REFERENCE(ip.src#[2:1] <FT_IPv4>)

    Instructions:
    00000 READ_TREE		ip.src <FT_IPv4> -> reg#0
    00001 IF_FALSE_GOTO	5
    00002 READ_REFERENCE_R	${ip.src <FT_IPv4>} #[2:1] -> reg#1
    00003 IF_FALSE_GOTO	5
    00004 ALL_NE		reg#0 != reg#1
    00005 RETURN

This requires adding another level of complexity to references.
When loading references we need to copy the 'proto_layer_num'
and add the logic to filter on that.

The "layer" sttype is removed and replace by a new
field sttype with support for a range. This is a nice
cleanup for the semantic check and general simplification.
The grammar is better too with this design.

Range sttype is renamed to slice for clarity.
2022-06-25 14:57:40 +01:00
João Valverde e9e6431d7b dfilter: Change boolean string representation
Use "True" or "TRUE" instead of "true" and remove case insensivity.
Same for false. This should serve to differentiate booleans a bit
more from protocol names, which should be using lower-case.
2022-06-25 13:02:34 +01:00
João Valverde de103394fe dfilter: Make regex matches case insensitive by default 2022-06-08 12:17:22 +01:00
João Valverde b602911b31 dfilter: Add support for universal quantifiers
Adds the keywords "any" and "all" to implement the quantification
to any existing relational operator.

Filter: all tcp.port in {100, 2000..3000}

Syntax tree:
 0 ALL TEST_IN:
   1 FIELD(tcp.port)
   1 SET(#2):
     2 FVALUE(100 <FT_UINT16>)
     2 FVALUE(2000 <FT_UINT16>) .. FVALUE(3000 <FT_UINT16>)

Instructions:
00000 READ_TREE		tcp.port -> reg#0
00001 IF_FALSE_GOTO	5
00002 ALL_EQ		reg#0 === 100 <FT_UINT16>
00003 IF_TRUE_GOTO	5
00004 ALL_IN_RANGE	reg#0 in { 2000 <FT_UINT16> .. 3000 <FT_UINT16> }
00005 RETURN
2022-05-12 14:26:54 +01:00
João Valverde 4f3f507eee dfilter: Add syntax to match specific layers in the protocol stack
Add support to display filters for matching a specific layer within a frame.
Layers are counted sequentially up the protocol stack. Each protocol
(dissector) that appears in the stack is one layer.

LINK-LAYER#1 <-> IP#1 <-> TCP#1 <-> IP#2 <-> TCP#2 <-> etc.

The syntax allows for negative indexes and ranges with the usual semantics
for slices (but note that counting starts at one):

    tcp.port#[2-4] == 1024

Matches layers 2 to 4 inclusive.

Fixes #3791.
2022-04-26 16:50:59 +00:00
João Valverde 8355e96858 tests: Add test for display filter field reference 2022-04-12 14:03:18 +00:00
João Valverde fb9a176587 dfilter: Allow grouping arithmetical expressions with { }
This removes the limitation of having only two terms in an
arithmetic expression and allows setting the precedence using
curly braces (like any basic calculator).

Our grammar currently does not allow grouping arithmetic expressions
using parenthesis, because boolean expressions and arithmetic
expressions are different and parenthesis are used with the former.
2022-04-08 23:12:04 +01:00
João Valverde 0313cd02bc dfilter: Fix RHS bias for literal values
Fixes a3b76138f0.
2022-04-06 23:46:22 +01:00
João Valverde c30a417528 dflter: Add test 2022-04-06 18:37:23 +01:00
João Valverde a6f37323e6 dfilter: Clean up lexical scanning 2022-04-06 18:11:27 +01:00
João Valverde 7ed5d5036e dfilter: restore support for identifiers using hyphen
Restores support for filters such as "mac-lte", that was broken
in 330d408328.

This means we are not able to support arithmetic expressions with binary
minus without spaces.

$ dftest 'tcp.port == 1-2'
dftest: "1-2" is not a valid number.
2022-04-05 15:38:20 +01:00
João Valverde 330d408328 dfilter: Allow arithmetic expressions without spaces
To allow an arithmetic expressions without spaces, such as "1+2",
we cannot match the expression in other lexical rules using "+". Because
of longest match this becomes the token LITERAL or UNPARSED with semantic value
"1+2". The same goes for all the other arithmetic operators.

So we need to remove [+-*/%] from "word chars" and add very specific
patterns (that won't mistakenly match an arithmetic expression) for
those literal or unparsed tokens we want to support using these characters.
The plus was not a problem but right slash is used for CIDR, minus for
mac address separator, etc.

There are still some corner case. 11-22-33-44-55-66 is a mac
address and not the arithmetic expression with six terms "eleven
minus twenty two minus etc." (if we ever support more than two terms
in the grammar, which we don't currently).

We lift some patterns from the flex manual to match on IPv4 and
IPv6 (ugly) and add MAC address.

Other hypothetical literal lexical values using [+-*/%] are already
supported enclosed in angle brackets but the cases of MAC/IPv4/IPv6 are
are very common and moreover we need to do the utmost to not break backward
compatibily here.

Before:
    $ dftest "_ws.ftypes.int32 == 1+2"
    dftest: "1+2" is not a valid number.

After:
    $ dftest "_ws.ftypes.int32 == 1+2"
    Filter: _ws.ftypes.int32 == 1+2

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 -> reg#0
    00001 IF_FALSE_GOTO	4
    00002 ADD		1 <FT_INT32> + 2 <FT_INT32> -> reg#1
    00003 ANY_EQ		reg#0 == reg#1
    00004 RETURN
2022-04-04 20:28:55 +00:00
João Valverde f0ca30b60b dfilter: More arithmetic fixes
Fix a failed assertion with constant arithmetic expressions.

Because we do not parse constants on the lexical level it is
more complicated to handle constant expressions with unparsed
values.

We need to handle missing type information gracefully for any
kind of arithmetic expression, not just unary minus.
2022-04-02 18:10:33 +00:00
João Valverde 2a9cb588aa dfilter: Add binary arithmetic (add/subtract)
Add support for display filter binary addition and subtraction.

The grammar is intentionally kept simple for now. The use case
is to add a constant to a protocol field, or (maybe) add two
fields in an expression.

We use signed arithmetic with unsigned numbers, checking for
overflow and casting where necessary to do the conversion.
We could legitimately opt to use traditional modular arithmetic
instead (like C) and if it turns out that that is more useful for
some reason we may want to in the future.

Fixes #15504.
2022-03-31 11:27:34 +01:00
João Valverde 431cb43b81 dfilter: Remove parenthesis deprecation warning
This usage devalues a mechanism for warning users that deserves more
attention than this minor suggestion.

The warning is inconvenient for intermediate and advanced users.
2022-03-29 12:19:26 +00:00
João Valverde ac0a69636b dfilter: Add support for unary arithmetic
This change implements a unary minus operator.

Filter: tcp.window_size_scalefactor == -tcp.dstport

Instructions:
00000 READ_TREE		tcp.window_size_scalefactor -> reg#0
00001 IF_FALSE_GOTO	6
00002 READ_TREE		tcp.dstport -> reg#1
00003 IF_FALSE_GOTO	6
00004 MK_MINUS		-reg#1 -> reg#2
00005 ANY_EQ		reg#0 == reg#2
00006 RETURN

It is supported for integer types, floats and relative time values.
The unsigned integer types are promoted to a 32 bit signed integer.

Unary plus is implemented as a no-op. The plus sign is simply ignored.

Constant arithmetic expressions are computed during compilation.

Overflow with constants is a compile time error. Overflow with
variables is a run time error and silently ignored. Only a debug
message will be printed to the console.

Related to #15504.
2022-03-28 11:20:41 +00:00
João Valverde 2fc8c0e36b dfilter: Handle a bitwise expr on the RHS 2022-03-23 11:04:41 +00:00
João Valverde 16729be2c1 dfilter: Add bitwise masking of bits
Add support for masking of bits. Before the bitwise operator
could only test bits, it did not support clearing bits.

This allows testing if any combination of bits are set/unset
more naturally with a single test. Previously this was only
possible by combining several bitwise predicates.

Bitwise is implemented as a test node, even though it is not.
Maybe the test node should be renamed to something else.

Fixes #17246.
2022-03-22 12:58:04 +00:00
João Valverde bd48f947b0 dfilter: Require a field-like value on the LHS
Comparisons require a field-like value on one of the sides,
or both. Change this to require on the LHS or both. There is
realy no reason that I can see to allow the relation to commute,
and it allows removing a lot of unnecessary code and extra tests.
2022-03-05 11:10:54 +00:00
João Valverde 6d520addd1 dfilter: Add special syntax for literals and names
The syntax for protocols and some literals like numbers
and bytes/addresses can be  ambiguous. Some protocols can
be parsed as a literal, for example the protocol "fc"
(Fibre Channel) can be parsed as 0xFC.

If a numeric protocol is registered that will also take
precedence over any literal, according to the current
rules, thereby breaking numerical comparisons to that
number. The same for an hypothetical protocol named "true",
etc.

To allow the user to disambiguate this meaning introduce
new syntax.

Any value prefixed with ':' or enclosed in <,> will be treated
as a literal value only. The value :fc or <fc> will always
mean 0xFC, under any context. Never a protocol whose filter
name is "fc".

Likewise any value prefixed with a dot will always be parsed
as an identifier (protocol or protocol field) in the language.
Never any literal value parsed from the token "fc".

This allows the user to be explicit about the meaning,
and between the two explicit methods plus the ambiguous one
it doesn't completely break any one meaning.

The difference can be seen in the following two programs:

    Filter: frame == fc

    Constants:

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF-FALSE-GOTO	5
    00002 READ_TREE		fc -> reg#1
    00003 IF-FALSE-GOTO	5
    00004 ANY_EQ		reg#0 == reg#1
    00005 RETURN

    --------

    Filter: frame == :fc

    Constants:
    00000 PUT_FVALUE	fc <FT_PROTOCOL> -> reg#1

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF-FALSE-GOTO	3
    00002 ANY_EQ		reg#0 == reg#1
    00003 RETURN

The filter "frame == fc" is the same as "filter == .fc",
according to the current heuristic, except the first form
will try to parse it as a literal if the name does not
correspond to any registered protocol.

By treating a leading dot as a name in the language we
necessarily disallow writing floats with a leading dot. We
will also disallow writing with an ending dot when using
unparsed values. This is a backward incompatibility but has
the happy side effect of making the expression {1...2}
unambiguous.

This could either mean "1 .. .2" or "1. .. 2". If we require
a leading and ending digit then the meaning is clear:
    1.0..0.2 -> 1.0 .. 0.2

Fixes #17731.
2022-03-05 11:10:54 +00:00
João Valverde 49566a5b0c dfilter: Add more tests
Add more tests and fix a copy paste error with the test name.
2022-02-24 21:41:32 +00:00
João Valverde ef31431aeb dfilter: Add a true/false boolean representation
Minor code cleanup.
2022-02-23 23:37:47 +00:00
João Valverde 8b23dd3a3c dfilter: Add an "all equal" operator
To complete the set of equality operators add an "all equal"
operator that matches a frame if all fields match the condition.

The symbol chosen for "all_eq" is "===".
2021-12-22 14:32:32 +00:00
João Valverde 557cee31fc dfilter: Save lexical token value to syntax tree
Use that for error messages, including any using test operators.

This allows to always use the same name as the user. It avoids
cases where the user write "a && b" and the message is "a and b"
is syntactically invalid.

It should also allow us to be more consistent with the use of
double quotes.
2021-12-01 13:34:01 +00:00
João Valverde 352390aa97 dfilter: Need to handle a charconst on the LHS 2021-11-27 17:19:11 +00:00
João Valverde 7028646f9e dfilter: Fix invalid character constant error message
This reverts commit d635ff4933.

A charconst cannot be a value string, for that reason it is not
redundant with unparsed.

Maybe character constants should be parsed in the lexical scanner
instead.

Before:
  Filter: ip.proto == '\g'
  dftest: "'\g'" cannot be found among the possible values for ip.proto.

After:
  Filter: ip.proto == '\g'
  dftest: "'\g'" isn't a valid character constant.
2021-11-23 17:35:40 +00:00
João Valverde 2d45cb0881 dfilter: Improve some error messages 2021-11-06 11:45:21 +00:00
João Valverde d635ff4933 dfilter: Remove redundant STTYPE_CHARCONST syntax node
A charconst uses the same semantic rules as unparsed so just
use the latter to avoid redundancies.

We keep the use of TOKEN_CHARCONST as an optimization to avoid
an unnecessary name resolution (lookup for a registered field with
the same name as the charconst).
2021-10-31 20:33:31 +00:00
João Valverde 15051c0671 dfilter: Fix expressions with bytes as a character constant
Before:
  Filter: frame[1] == 'a'
  dftest: "'a'" is not a valid byte string.

After:
  Filter: frame[1] == 'a'

  Constants:
  00000 PUT_FVALUE	61 <FT_BYTES> -> reg#2

  Instructions:
  00000 READ_TREE		frame -> reg#0
  00001 IF-FALSE-GOTO	4
  00002 MK_RANGE		reg#0[1:1] -> reg#1
  00003 ANY_EQ		reg#1 == reg#2
  00004 RETURN

Fixes 4d2f469212.
2021-10-31 20:14:46 +00:00
João Valverde f78ebe1564 dfilter: Remove deprecated support for whitespace separator in sets 2021-10-31 09:13:18 +00:00
João Valverde c6b68b3ee2 dfilter: Need to check validity of LHS of "matches" expression
Fixes #17690, a crash on a failed assertion.
2021-10-28 16:26:36 +00:00
João Valverde 2183738ef2 dfilter: Add support for comma as set separator
Deprecate the usage of significant whitespace to separate set elements
(or anywhere else for that matter). This will make the implementation
simpler and cleaner and the language more expressive and user-friendly.
2021-10-28 04:11:05 +00:00
João Valverde 0839f05bf7 tests/dfilter: Move deprecated to syntax group 2021-10-27 07:42:23 +00:00
João Valverde 0abe10e040 dfilter: Fix "!=" relation to be free of contradictions
Wireshark defines the relation of equality A == B as
A any_eq B <=> An == Bn for at least one An, Bn.
More accurately I think this is (formally) an equivalence
relation, not true equality.

Whichever definition for "==" we choose we must keep the
definition of "!=" as !(A == B), otherwise it will
lead to logical contradictions like (A == B) AND (A != B)
being true.

Fix the '!=' relation to match the definition of equality:
  A != B <=> !(A == B) <=> A all_ne B <=> An != Bn, for
every n.

This has been the recomended way to write "not equal" for a
long time in the documentation, even to the point where != was
deprecated, but it just wasn't implemented consistently in the
language, which has understandably been a persistent source
of confusion. Even a field that is normally well-behaved
with "!=" like "ip.src" or "ip.dst" will produce unexpected
results with encapsulations like IP-over-IP.

The opcode ALL_NE could have been implemented in the compiler
instead using NOT and ANY_EQ but I chose to implement it in
bytecode. It just seemed more elegant and efficient
but the difference was not very significant.

Keep around "~=" for any_ne relation, in case someone depends
on that, and because we don't have an operator for true equality:
  A strict_equal B <=> A all_eq B <=> !(A any_ne B).
If there is only one value then any_ne and all_ne are the same
comparison operation.

Implementing this change did not require fixing any tests so it
is unlikely the relation "~=" (any_ne) will be very useful.

Note that the behaviour of the '<' (less than) comparison relation
is a separate, more subtle issue. In the general case the definition
of '<' that is used is only a partial order.
2021-10-24 06:55:54 +00:00
João Valverde e8800ff3c4 dfilter: Add a thin encapsulation layer for REs 2021-10-18 12:09:36 +00:00
João Valverde 2e048df011 dfilter: Improve error message for "matches"
Should be more obvious that this error is caused
by a string syntax error and not something else.
2021-10-18 12:09:36 +00:00
João Valverde a975d478ba dfilter: Require double-quoted strings with "matches"
Matches is a special case that looks on the RHS and tries
to convert every unparsed value to a string, regardless
of the LHS type. This is not how types work in the display
filter. Require double-quotes to avoid ambiguity, because
matches doesn't follow normal Wireshark display filter
type rules. It doesn't need nor benefit from the flexibility
provided by unparsed strings in the syntax.

For matches the RHS is always a literal strings except
if the RHS is also a field name, then it complains of an
incompatible type. This is confusing. No type can be compatible
because no type rules are ever considered. Every unparsed value is
a text string except if it happens to coincide with a field
name it also requires double-quoting or it throws a syntax error,
just to be difficult. We could remove this odd quirk but requiring
double-quotes for regular expressions is a better, more elegant
fix.

Before:
  Filter: tcp matches "udp"

  Constants:
  00000 PUT_PCRE	udp -> reg#1

  Instructions:
  00000 READ_TREE		tcp -> reg#0
  00001 IF-FALSE-GOTO	3
  00002 ANY_MATCHES	reg#0 matches reg#1
  00003 RETURN

  Filter: tcp matches udp

  Constants:
  00000 PUT_PCRE	udp -> reg#1

  Instructions:
  00000 READ_TREE		tcp -> reg#0
  00001 IF-FALSE-GOTO	3
  00002 ANY_MATCHES	reg#0 matches reg#1
  00003 RETURN

  Filter: tcp matches udp.srcport
  dftest: tcp and udp.srcport are not of compatible types.

  Filter: tcp matches udp.srcportt

  Constants:
  00000 PUT_PCRE	udp.srcportt -> reg#1

  Instructions:
  00000 READ_TREE		tcp -> reg#0
  00001 IF-FALSE-GOTO	3
  00002 ANY_MATCHES	reg#0 matches reg#1
  00003 RETURN

After:
  Filter: tcp matches "udp"

  Constants:
  00000 PUT_PCRE	udp -> reg#1

  Instructions:
  00000 READ_TREE		tcp -> reg#0
  00001 IF-FALSE-GOTO	3
  00002 ANY_MATCHES	reg#0 matches reg#1
  00003 RETURN

  Filter: tcp matches udp
  dftest: "udp" was unexpected in this context.

  Filter: tcp matches udp.srcport
  dftest: "udp.srcport" was unexpected in this context.

  Filter: tcp matches udp.srcportt
  dftest: "udp.srcportt" was unexpected in this context.

The error message could still be improved.
2021-10-17 22:53:36 +00:00
João Valverde 9d87c4712e dfilter: Fix parsing of value strings
If we have a STRING value in an expression and a numeric comparison
we must also check if it matches a value string before throwing
a type error.

Add appropriate tests to the test suite.

Fixes 4d2f469212.
2021-10-08 18:53:15 +01:00