Commit Graph

137 Commits

Author SHA1 Message Date
João Valverde 613331f07b dfilter: Disable flex debug trace for release builds
This omits the flex debug code in the binary if the build type is
RelMinSize or Release.

It replaces the "%option debug" stanza with the -d command line
option, to be able to configure the flex behaviour.
2023-01-09 04:03:19 +00:00
João Valverde 1861679e81 dfilter: Optimize some scanner patterns
Cleanup flex code. Optimize some patterns to avoid lookups
for field matches for values that are not legal field names.

Improve warning and add some comments.
2023-01-07 21:15:25 +00:00
João Valverde f37c7c4062 dfilter: Tweak representation for length-1 byte array
Make dfilter byte representation always use ':' for consistency.

Make 1 byte be represented as "XX:" with the colon suffix to
make it nonambiguous that is is a byte and not other type,
like a protocol.

The difference is can be seen in the following programs. In the
before representation it is not obvious at all that the second
"fc" value is a literal bytes value and not the value of the
protocol "fc", although it can be inferred from the lack of
a READ_TREE instruction. In the After we know that "fc:" must
be bytes and not a protocol.

Note that a leading colon is a syntactical expedient to say
"this value with any type is a literal value and not a protocol
field." A terminating colon is just a part of the dfilter
literal bytes syntax.

Before:

Filter: fc == :fc

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(fc <FT_PROTOCOL>)
   1 FVALUE(fc <FT_PROTOCOL>)

Instructions:
00000 READ_TREE		fc <FT_PROTOCOL> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fc <FT_PROTOCOL>

After:

Filter: fc == :fc

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(fc <FT_PROTOCOL>)
   1 FVALUE(fc: <FT_PROTOCOL>)

Instructions:
00000 READ_TREE		fc <FT_PROTOCOL> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fc: <FT_PROTOCOL>
2023-01-02 02:54:38 +00:00
João Valverde f5bfe89785 dfilter: Replace global variable 2023-01-02 01:19:51 +00:00
João Valverde 5d8f495233 dfilter: Minor flex clean up
Replace flex prefix to improve readability.

Remove two no-longer-needed workarounds to suppress warnings.
2023-01-02 01:19:26 +00:00
João Valverde d3d06c2552 dftest: Add debug command-line options 2022-12-30 13:42:26 +00:00
João Valverde 1400d92724 dfilter: Add compilation warning for ambiguous syntax
$ dfilter 'frame contains fc'
    Filter: frame contains fc

    Warning: Interpreting "fc" as "Fibre Channel". Consider writing :fc or .fc.
    (...)
2022-12-29 23:48:56 +00:00
João Valverde 77ef21f86e dfilter: Replace unparsed lexical type and simplify grammar
Remove unparsed lexical type and replace it with identifier
and constant. This separation is still necessary to differentiate
names (fields and function) from literals that look like names
but it has some advantages to do it at the lexical level.

The main advantage is a much cleaner and simplified grammar,
because we only have a single token type for field names, without
any loss of generality (the same name is valid for fields and
function names for example).

The CONSTANT token type is necessary to be different from literal
to provide errors for function rules.
2022-12-29 18:28:54 +00:00
João Valverde 596e0b41d1 dfilter: Change two scanner patterns to camel case 2022-12-26 07:27:40 +00:00
João Valverde b9a5009cb2 dfilter: Clean up scanner code
Clean up some issues flagged by a linter.

Remove hyphen from pattern names and remove an unused start condition.
2022-12-24 15:51:36 +00:00
João Valverde 3938b406fb dfilter: Refactor error location tracking
Remove duplicate location struct by adding a new header.

Pass around a structure instead of a pointer.
2022-12-23 18:23:06 +00:00
João Valverde a0d77e9329 dfilter: Return an error object instead of string
Return an struct containing error information. This simplifies
the interface to more easily provide richer diagnostics in the future.

Add an error code besides a human-readable error string to allow
checking programmatically for errors in a robust manner. Currently
there is only a generic error code, it is expected to increase
in the future.

Move error location information to the struct. Change callers and
implementation to use the new interface.
2022-11-28 15:46:44 +00:00
João Valverde 79c3a77752 Add macros to control lemon diagnostics
Rename flex macros using parenthesis (mostly a style issue):

DIAG_OFF_FLEX -> DIAG_OFF_FLEX()
DIAG_ON_FLEX  -> DIAG_ON_FLEX()

Use the same kind of construct with lemon generated code using
DIAG_OFF_LEMON() and DIAG_ON_LEMON(). Use %include and %code
directives to enforce the desired order with generated code
in the middle in between pragmas.

Fix a clang-specific pragma to use DIAG_OFF_CLANG().

DIAG_OFF(unreachable-code) -> DIAG_OFF_CLANG(unreachable-code).

Apparently GCC is ignoring the -Wunreachable flag, that's why
it did not trigger an unknown pragma warning. From [1}:

  The -Wunreachable-code has been removed, because it was unstable: it
  relied on the optimizer, and so different versions of gcc would warn
  about different code.  The compiler still accepts and ignores the
  command line option so that existing Makefiles are not broken.  In some
  future release the option will be removed entirely. - Ian

[1] https://gcc.gnu.org/legacy-ml/gcc-help/2011-05/msg00360.html
2022-11-20 10:11:27 +00:00
Peter Wu df478a365d dfilter: treat carriage returns as whitespace
Fixes #18595
2022-11-07 01:00:50 +00:00
João Valverde 0853ddd1cb dfilter: Add support for raw (bytes) addressing mode
This adds new syntax to read a field from the tree as bytes, instead
of the actual type. This is a useful extension for example to match
matformed strings that contain unicode replacement characters. In
this case it is not possible to match the raw value of the malformed
string field. This extension fills this need and is generic enough
that it should be useful in many other situations.

The syntax used is to prefix the field name with "@". The following
artificial example tests if the HTTP user agent contains a particular
invalid UTF-8 sequence:

    @http.user_agent == "Mozill\xAA"

Where simply using "http.user_agent" won't work because the invalid byte
sequence will have been replaced with U+FFFD.

Considering the following programs:

    $ dftest '_ws.ftypes.string == "ABC"'
    Filter: _ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <FT_STRING>)
       1 FVALUE("ABC" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string <FT_STRING> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "ABC" <FT_STRING>
    00003 RETURN

    $ dftest '@_ws.ftypes.string == "ABC"'
    Filter: @_ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <RAW>)
       1 FVALUE(41:42:43 <FT_BYTES>)

    Instructions:
    00000 READ_TREE		@_ws.ftypes.string <FT_BYTES> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == 41:42:43 <FT_BYTES>
    00003 RETURN

In the second case the field has a "raw" type, that equates directly to
FT_BYTES, and the field value is read from the protocol raw data.
2022-10-31 21:02:39 +00:00
João Valverde 0662a3f6ac dfilter: Amend a numeric pattern in the scanner
We amend the :<numeric> pattern to not eat the leading
colon. Because the colon can be part of the value (with IPv6 addresses
for example) we want to avoid doing that.

IPv6 addresses are covered by their own rules but this removes the
requirement in the future to handle any special cases and avoids
surprises.

For this reason the colon-prefix syntax is already explicitly defined to
work only for byte arrays and there is currently no universal
syntax for all literal values or even all numbers.

Other numbers can keep using the lexical type "unparsed".

```
run/dftest "_ws.ftypes.uint8 == :fd"
Filter: _ws.ftypes.uint8 == :fd
dftest: ":fd" is not a valid number.
	_ws.ftypes.uint8 == :fd
	                    ^~~

run/dftest "_ws.ftypes.uint8 == fd"
Filter: _ws.ftypes.uint8 == fd
dftest: "fd" is not a valid number.
	_ws.ftypes.uint8 == fd
	                    ^~

run/dftest "_ws.ftypes.uint8 == 0xfd"
Filter: _ws.ftypes.uint8 == 0xfd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.uint8 <FT_UINT8>)
   1 FVALUE(253 <FT_UINT8>)

Instructions:
00000 READ_TREE		_ws.ftypes.uint8 <FT_UINT8> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == 253 <FT_UINT8>
00003 RETURN

run/dftest "_ws.ftypes.bytes == fd"
Filter: _ws.ftypes.bytes == fd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.bytes <FT_BYTES>)
   1 FVALUE(fd <FT_BYTES>)

Instructions:
00000 READ_TREE		_ws.ftypes.bytes <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fd <FT_BYTES>
00003 RETURN

run/dftest "_ws.ftypes.bytes == :fd"
Filter: _ws.ftypes.bytes == :fd

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.bytes <FT_BYTES>)
   1 FVALUE(fd <FT_BYTES>)

Instructions:
00000 READ_TREE		_ws.ftypes.bytes <FT_BYTES> -> reg#0
00001 IF_FALSE_GOTO	3
00002 ANY_EQ		reg#0 == fd <FT_BYTES>
00003 RETURN
```
2022-10-08 09:51:49 +00:00
João Valverde 14f5121c4a dfilter: Remove problematic <...> literal syntax
The <...> syntax for literals, intended to be as generic as
possible, unintentionally introduced an ambiguity with the
relational expression "a < b or a > c".

Literals are values like numbers, bytes, IPv6 addresses or, one
could imagine, UNC paths for example, if an FT_UNC type were to
be added in the future.

We could use a new unique symbol like @...@ but the <...>
syntax is very recent and may not be necessary with ":xxx" so
just remove it.

A byte array can be explicitly declared by prefixing with a colon. It
is not as generic but the main ambiguity that this new syntax attempted
to solve is bytes vs protocol names. We don't want to introduce a new
reserved symbol for now, until other requirements if any are more clear.

Fixes #18418.
2022-10-08 09:51:49 +00:00
João Valverde 0fc81c21b2 dfilter: Cleanup scanner value setters 2022-07-04 22:15:40 +00:00
João Valverde b10db887ce dfilter: Remove unparsed syntax type and RHS literal bias
This removes unparsed name resolution during the semantic
check because it feels like a hack to work around limitations
in the language syntax, that should be solved at the lexical
level instead.

We were interpreting unparsed differently on the LHS and RHS.
Now an unparsed value is always a field if it matches a
registered field name (this matches the implementation in 3.6
and before).

This requires tightening a bit the allowed filter names for
protocols to avoid some common and potentially weird conflicting
cases.

Incidentally this extends set grammar to accept all entities.
That is experimental and may be reverted in the future.
2022-07-02 11:18:20 +01:00
João Valverde efbe699756 dfilter: Remove STTYPE_RANGE_NODE
STTYPE_RANGE_NODE is just a lexical token, it is
not used withi the syntax tree so remove it.
2022-06-25 16:06:48 +01:00
João Valverde aaff0d21ae dfilter: Add layer support for references
This adds support for using the layers filter
with field references.

Before:
    $ dftest 'ip.src != ${ip.src#2}'
    dftest: invalid character in macro name

After:
    $ dftest 'ip.src != ${ip.src#2}'
    Filter: ip.src != ${ip.src#2}

    Syntax tree:
     0 TEST_ALL_NE:
       1 FIELD(ip.src <FT_IPv4>)
       1 REFERENCE(ip.src#[2:1] <FT_IPv4>)

    Instructions:
    00000 READ_TREE		ip.src <FT_IPv4> -> reg#0
    00001 IF_FALSE_GOTO	5
    00002 READ_REFERENCE_R	${ip.src <FT_IPv4>} #[2:1] -> reg#1
    00003 IF_FALSE_GOTO	5
    00004 ALL_NE		reg#0 != reg#1
    00005 RETURN

This requires adding another level of complexity to references.
When loading references we need to copy the 'proto_layer_num'
and add the logic to filter on that.

The "layer" sttype is removed and replace by a new
field sttype with support for a range. This is a nice
cleanup for the semantic check and general simplification.
The grammar is better too with this design.

Range sttype is renamed to slice for clarity.
2022-06-25 14:57:40 +01:00
João Valverde 354e0d7edf dfilter: Add support for unicode escape sequences
Add support for entering unicode codepoints as \uNNNN or \uNNNNNNNN
for strings and charconsts (following the C standard).
2022-06-21 16:54:16 +01:00
João Valverde 47348ae598 dfilter: Add support for literal strings with null bytes
Before:
    Filter: frame matches "abc\x00def"
    dftest: \x00 (NUL byte) cannot be used with a regular string.
    	frame matches "abc\x00def"
    	                  ^~~~
    Filter: _ws.ftypes.string == "a string with a \0 byte"
    dftest: \0 (NUL byte) cannot be used with a regular string.
    	_ws.ftypes.string == "a string with a \0 byte"
    	                                      ^~

After:
    Filter: frame matches "abc\x00def"

    Syntax tree:
     0 TEST_MATCHES:
       1 FIELD(frame)
       1 PCRE(abc\0def)

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_MATCHES	reg#0 matches abc\0def
    00003 RETURN

    Filter: _ws.ftypes.string == "a string with a \0 byte"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string)
       1 FVALUE("a string with a \0 byte" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "a string with a \0 byte" <FT_STRING>
    00003 RETURN

Fixes issue #16156.
2022-06-21 15:10:08 +00:00
João Valverde b602911b31 dfilter: Add support for universal quantifiers
Adds the keywords "any" and "all" to implement the quantification
to any existing relational operator.

Filter: all tcp.port in {100, 2000..3000}

Syntax tree:
 0 ALL TEST_IN:
   1 FIELD(tcp.port)
   1 SET(#2):
     2 FVALUE(100 <FT_UINT16>)
     2 FVALUE(2000 <FT_UINT16>) .. FVALUE(3000 <FT_UINT16>)

Instructions:
00000 READ_TREE		tcp.port -> reg#0
00001 IF_FALSE_GOTO	5
00002 ALL_EQ		reg#0 === 100 <FT_UINT16>
00003 IF_TRUE_GOTO	5
00004 ALL_IN_RANGE	reg#0 in { 2000 <FT_UINT16> .. 3000 <FT_UINT16> }
00005 RETURN
2022-05-12 14:26:54 +01:00
João Valverde 4f3f507eee dfilter: Add syntax to match specific layers in the protocol stack
Add support to display filters for matching a specific layer within a frame.
Layers are counted sequentially up the protocol stack. Each protocol
(dissector) that appears in the stack is one layer.

LINK-LAYER#1 <-> IP#1 <-> TCP#1 <-> IP#2 <-> TCP#2 <-> etc.

The syntax allows for negative indexes and ranges with the usual semantics
for slices (but note that counting starts at one):

    tcp.port#[2-4] == 1024

Matches layers 2 to 4 inclusive.

Fixes #3791.
2022-04-26 16:50:59 +00:00
João Valverde af878388fe dfilter: Fix scanning of strings
The code was ignoring a SCAN_FAILED return value.
2022-04-15 22:51:15 +01:00
João Valverde 09696f1762 Try to fix a narrowing warning
"C:\Development\wsbuild64\Wireshark.sln" (default target) (1) ->
"C:\Development\wsbuild64\epan\dfilter\dfilter.vcxproj.metaproj" (default target) (18) ->
"C:\Development\wsbuild64\epan\dfilter\dfilter.vcxproj" (default target) (108) ->
       (ClCompile target) ->
C:/Development/wireshark/epan/dfilter/scanner.l(463,54): warning C4267: '+=': conversion from 'size_t' to 'int
       ', possible loss of data [C:\Development\wsbuild64\epan\dfilter\dfilter.vcxproj]
C:/Development/wireshark/epan/dfilter/scanner.l(463,54): warning C4267:         state->location.col_start += sta
       te->location.col_len; [C:\Development\wsbuild64\epan\dfilter\dfilter.vcxproj]
C:/Development/wireshark/epan/dfilter/scanner.l(463,54): warning C4267:
                           ^ (compiling source file C:\Development\wsbuild64\epan\dfilter\scanner.c) [C:\Development\ws
       build64\epan\dfilter\dfilter.vcxproj]
2022-04-11 22:23:13 +01:00
João Valverde 4d9470e7dd dfilter: Add location tracking to scanner and use it to report errors
Add location tracking as a column offset and length from offset
to the scanner. Our input is a single line only so we don't need
to track line offset.

Record that information in the syntax tree. Return the error location
in dfilter_compile(). Use it in dftest to mark the location of the
error in the filter string. Later it would be nice to use the location
in the GUI as well.

$ dftest "ip.proto == aaaaaa and tcp.port == 123"
Filter: ip.proto == aaaaaa and tcp.port == 123
dftest: "aaaaaa" cannot be found among the possible values for ip.proto.
	ip.proto == aaaaaa and tcp.port == 123
	            ^~~~~~
2022-04-10 10:09:51 +01:00
João Valverde da19379eb5 dfilter: Create the syntax node in the scanner and pass that
Revert to passing a syntax node from the lexical scanner to the grammar
parser. Using a union is not having a discernible advantage and requires
duplicating a lot of properties of syntax nodes.
2022-04-10 09:54:03 +01:00
João Valverde cc5726b63f dfilter: Remove leading colon special meaning
Instead of saying a leading colon will make any token a literal
value, say it is part of the syntax of bytes arrays. This is
useful to write bytes without a separator, and other potentially
ambiguous formats.

The restriction in meaning to bytes and simple numeric values
should make the rules for handling a leading colon (specifically
ommiting it or not) saner without much loss of functionality.
2022-04-07 00:16:07 +01:00
João Valverde a6f37323e6 dfilter: Clean up lexical scanning 2022-04-06 18:11:27 +01:00
João Valverde 8108e67de7 dfilter: Fix memory leak with leading colon
When retrying fvalue_from_literal() we were leaking the error
message string.

Refactor the code to avoid the retry. This assumes the only
valid use of a leading ':' with a literal is for an IPv6 address.

Bytes with leading ':' are supported but the colon is skipped,
so the parser doesn't see it.

Fixes df0fc8b517.
2022-04-06 18:09:12 +01:00
João Valverde 12c8cc32f0 dfilter: Fix parsing of some IPv6 compressed addresses
Fix parsing of some IPv6 addresses and add tests.

Also pass tokens as unparsed unless the user was specfic about
the semantic type. For example the IPv4 address 1.1.1.1 is also a
valid field, but 1.1.1.1/128 is not (because of the slash). However
choose not to enforce the distinction in the lexical scanner and pass
everything as unparsed unless the meaning is explicit in the syntax
with leading dot, colon, or between angle branckets.
2022-04-06 10:10:04 +01:00
João Valverde 7ed5d5036e dfilter: restore support for identifiers using hyphen
Restores support for filters such as "mac-lte", that was broken
in 330d408328.

This means we are not able to support arithmetic expressions with binary
minus without spaces.

$ dftest 'tcp.port == 1-2'
dftest: "1-2" is not a valid number.
2022-04-05 15:38:20 +01:00
João Valverde 330d408328 dfilter: Allow arithmetic expressions without spaces
To allow an arithmetic expressions without spaces, such as "1+2",
we cannot match the expression in other lexical rules using "+". Because
of longest match this becomes the token LITERAL or UNPARSED with semantic value
"1+2". The same goes for all the other arithmetic operators.

So we need to remove [+-*/%] from "word chars" and add very specific
patterns (that won't mistakenly match an arithmetic expression) for
those literal or unparsed tokens we want to support using these characters.
The plus was not a problem but right slash is used for CIDR, minus for
mac address separator, etc.

There are still some corner case. 11-22-33-44-55-66 is a mac
address and not the arithmetic expression with six terms "eleven
minus twenty two minus etc." (if we ever support more than two terms
in the grammar, which we don't currently).

We lift some patterns from the flex manual to match on IPv4 and
IPv6 (ugly) and add MAC address.

Other hypothetical literal lexical values using [+-*/%] are already
supported enclosed in angle brackets but the cases of MAC/IPv4/IPv6 are
are very common and moreover we need to do the utmost to not break backward
compatibily here.

Before:
    $ dftest "_ws.ftypes.int32 == 1+2"
    dftest: "1+2" is not a valid number.

After:
    $ dftest "_ws.ftypes.int32 == 1+2"
    Filter: _ws.ftypes.int32 == 1+2

    Instructions:
    00000 READ_TREE		_ws.ftypes.int32 -> reg#0
    00001 IF_FALSE_GOTO	4
    00002 ADD		1 <FT_INT32> + 2 <FT_INT32> -> reg#1
    00003 ANY_EQ		reg#0 == reg#1
    00004 RETURN
2022-04-04 20:28:55 +00:00
João Valverde 8bc214b5bb dfilter: Add remaining arithmetic integer ops 2022-03-31 16:49:42 +01:00
João Valverde 260942e170 dfilter: Refactor macro tree references
This replaces the current macro reference system with
a completely different implementation. Instead of a macro a reference
is a syntax element. A reference is a constant that can be filled
in the dfilter code after compilation from an existing protocol tree.
It is best understood as a field value that can be read from a fixed
tree that is not the frame being filtered. Usually this fixed tree
is the currently selected frame when the filter is applied. This
allows comparing fields in the filtered frame with fields in the
selected frame.

Because the field reference syntax uses the same sigil notation
as a macro we have to use a heuristic to distinguish them:
if the name has a dot it is a field reference, otherwise
it is a macro name.

The reference is synctatically validated at compile time.

There are two main advantages to this implementation (and a couple of
minor ones):

The protocol tree for each selected frame is only walked if we have a
display filter and if the display filter uses references. Also only the
actual reference values are copied, intead of loading the entire tree
into a hash table (in textual form even).

The other advantage is that the reference is tested like a protocol
field against all the values in the selected frame (if there is more
than one).

Currently the reference fields are not "primed" during dissection, so
the entire tree is walked to find a particular reference (this is
similar to the previous implementation).

If the display filter contains a valid reference and the reference is
not loaded at the time the filter is run the result is the same as a
non existing field for a regular READ_TREE instruction.

Fixes #17599.
2022-03-29 12:36:31 +00:00
João Valverde ac0a69636b dfilter: Add support for unary arithmetic
This change implements a unary minus operator.

Filter: tcp.window_size_scalefactor == -tcp.dstport

Instructions:
00000 READ_TREE		tcp.window_size_scalefactor -> reg#0
00001 IF_FALSE_GOTO	6
00002 READ_TREE		tcp.dstport -> reg#1
00003 IF_FALSE_GOTO	6
00004 MK_MINUS		-reg#1 -> reg#2
00005 ANY_EQ		reg#0 == reg#2
00006 RETURN

It is supported for integer types, floats and relative time values.
The unsigned integer types are promoted to a 32 bit signed integer.

Unary plus is implemented as a no-op. The plus sign is simply ignored.

Constant arithmetic expressions are computed during compilation.

Overflow with constants is a compile time error. Overflow with
variables is a run time error and silently ignored. Only a debug
message will be printed to the console.

Related to #15504.
2022-03-28 11:20:41 +00:00
João Valverde 16729be2c1 dfilter: Add bitwise masking of bits
Add support for masking of bits. Before the bitwise operator
could only test bits, it did not support clearing bits.

This allows testing if any combination of bits are set/unset
more naturally with a single test. Previously this was only
possible by combining several bitwise predicates.

Bitwise is implemented as a test node, even though it is not.
Maybe the test node should be renamed to something else.

Fixes #17246.
2022-03-22 12:58:04 +00:00
João Valverde 8983dda8f2 dfilter: Deprecate "~=" (any_ne)
The representation "~= has been superseded by "!==" with the same
meaning, making it superfluous and somewhat confusing. Deprecate
"~=" and recommend "!==" instead.
2022-03-09 11:28:39 +00:00
João Valverde 6d520addd1 dfilter: Add special syntax for literals and names
The syntax for protocols and some literals like numbers
and bytes/addresses can be  ambiguous. Some protocols can
be parsed as a literal, for example the protocol "fc"
(Fibre Channel) can be parsed as 0xFC.

If a numeric protocol is registered that will also take
precedence over any literal, according to the current
rules, thereby breaking numerical comparisons to that
number. The same for an hypothetical protocol named "true",
etc.

To allow the user to disambiguate this meaning introduce
new syntax.

Any value prefixed with ':' or enclosed in <,> will be treated
as a literal value only. The value :fc or <fc> will always
mean 0xFC, under any context. Never a protocol whose filter
name is "fc".

Likewise any value prefixed with a dot will always be parsed
as an identifier (protocol or protocol field) in the language.
Never any literal value parsed from the token "fc".

This allows the user to be explicit about the meaning,
and between the two explicit methods plus the ambiguous one
it doesn't completely break any one meaning.

The difference can be seen in the following two programs:

    Filter: frame == fc

    Constants:

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF-FALSE-GOTO	5
    00002 READ_TREE		fc -> reg#1
    00003 IF-FALSE-GOTO	5
    00004 ANY_EQ		reg#0 == reg#1
    00005 RETURN

    --------

    Filter: frame == :fc

    Constants:
    00000 PUT_FVALUE	fc <FT_PROTOCOL> -> reg#1

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF-FALSE-GOTO	3
    00002 ANY_EQ		reg#0 == reg#1
    00003 RETURN

The filter "frame == fc" is the same as "filter == .fc",
according to the current heuristic, except the first form
will try to parse it as a literal if the name does not
correspond to any registered protocol.

By treating a leading dot as a name in the language we
necessarily disallow writing floats with a leading dot. We
will also disallow writing with an ending dot when using
unparsed values. This is a backward incompatibility but has
the happy side effect of making the expression {1...2}
unambiguous.

This could either mean "1 .. .2" or "1. .. 2". If we require
a leading and ending digit then the meaning is clear:
    1.0..0.2 -> 1.0 .. 0.2

Fixes #17731.
2022-03-05 11:10:54 +00:00
João Valverde b5d74c69a7 dfilter: Fix error message with non printable ASCII
Before:
    Filter: http.user_agent == açaí
    dftest: "�" was unexpected in this context.

After:
    Filter: http.user_agent == açaí
    dftest: Non-printable ASCII characters may only appear inside double-quotes.

Related with #17770.
2022-02-19 17:49:29 +00:00
João Valverde d8b7d1f821 dfilter: Add aliases "any_eq" and "all_ne" 2021-12-22 14:32:32 +00:00
João Valverde 8b23dd3a3c dfilter: Add an "all equal" operator
To complete the set of equality operators add an "all equal"
operator that matches a frame if all fields match the condition.

The symbol chosen for "all_eq" is "===".
2021-12-22 14:32:32 +00:00
João Valverde fbfb4959ae dfilter: Better representation for charconst 2021-11-27 18:38:22 +00:00
João Valverde 35ad2e85c8 dfilter: Free a scanner string 2021-11-24 10:06:19 +00:00
João Valverde eb8c3169e7 dfilter: Clean up charconst error message 2021-11-24 09:38:58 +00:00
João Valverde 943c282009 dfilter: Parse character constants in lexer
Invalid character constants should be handled in the lexical scanner.

Todo: See if some code could be shared to parse double quoted strings.

It also fixes some unintuitive type coercions to string. Character
constants should be treated as characters, or maybe integers, or
maybe even throw an invalid comparison error, but coverting to a
literal string or byte array is surprising and not particularly
useful:
  '\xFF' -> "'\xFF'" (equals)
  '\xFF' -> "FF"     (contains)

Before:

    Filter: http.request.method contains "\x63"

    Constants:
    00000 PUT_FVALUE	"c" <FT_STRING> -> reg#1
    (...)

    Filter: http.request.method contains '\x63'

    Constants:
    00000 PUT_FVALUE	"63" <FT_STRING> -> reg#1
    (...)

    Filter: http.request.method == "\x63"

    Constants:
    00000 PUT_FVALUE	"c" <FT_STRING> -> reg#1
    (...)

    Filter: http.request.method == '\x63'

    Constants:
    00000 PUT_FVALUE	"'\\x63'" <FT_STRING> -> reg#1
    (...)

After:

    Filter: http.request.method contains '\x63'

    Constants:
    00000 PUT_FVALUE	"c" <FT_STRING> -> reg#1
    (...)

    Filter: http.request.method == '\x63'

    Constants:
    00000 PUT_FVALUE	"c" <FT_STRING> -> reg#1
    (...)
2021-11-24 08:40:20 +00:00
João Valverde 72c5efea1b dfilter: Reject invalid character escape sequences
For double quoted strings. This is consistent with single quote
character constants and the C standard. It also avoids common
mistakes where the superfluous backslash is silently suppressed.
2021-11-23 16:48:02 +00:00
João Valverde 9ca27643fa dfilter: Support more C escape sequences in string literals
Before:

  Filter: http.request.method == "\tHEAD"

  Constants:
  00000 PUT_FVALUE	"tHEAD" <FT_STRING> -> reg#1
  (...)

  Filter: http.request.method == "\uHEAD"

  Constants:
  00000 PUT_FVALUE	"uHEAD" <FT_STRING> -> reg#1
  (...)

After:

  Filter: http.request.method == "\tHEAD"

  Constants:
  00000 PUT_FVALUE	"\x09HEAD" <FT_STRING> -> reg#1
  (...)

  Filter: http.request.method == "\uHEAD"

  Constants:
  00000 PUT_FVALUE	"uHEAD" <FT_STRING> -> reg#1
  (...)
2021-10-31 20:33:31 +00:00