Commit Graph

138 Commits

Author SHA1 Message Date
João Valverde 313fed6db0 dftest: Add --types option 2023-01-11 01:00:41 +00:00
João Valverde 613331f07b dfilter: Disable flex debug trace for release builds
This omits the flex debug code in the binary if the build type is
RelMinSize or Release.

It replaces the "%option debug" stanza with the -d command line
option, to be able to configure the flex behaviour.
2023-01-09 04:03:19 +00:00
João Valverde 7641ba7416 dftest: More code cleanups and enhancements 2023-01-07 19:16:16 +00:00
João Valverde 522c74b734 dftest: More CLI options and improve output format 2023-01-05 20:26:42 +00:00
João Valverde f5bfe89785 dfilter: Replace global variable 2023-01-02 01:19:51 +00:00
João Valverde 5d8f495233 dfilter: Minor flex clean up
Replace flex prefix to improve readability.

Remove two no-longer-needed workarounds to suppress warnings.
2023-01-02 01:19:26 +00:00
João Valverde d3d06c2552 dftest: Add debug command-line options 2022-12-30 13:42:26 +00:00
João Valverde 1400d92724 dfilter: Add compilation warning for ambiguous syntax
$ dfilter 'frame contains fc'
    Filter: frame contains fc

    Warning: Interpreting "fc" as "Fibre Channel". Consider writing :fc or .fc.
    (...)
2022-12-29 23:48:56 +00:00
João Valverde 77ef21f86e dfilter: Replace unparsed lexical type and simplify grammar
Remove unparsed lexical type and replace it with identifier
and constant. This separation is still necessary to differentiate
names (fields and function) from literals that look like names
but it has some advantages to do it at the lexical level.

The main advantage is a much cleaner and simplified grammar,
because we only have a single token type for field names, without
any loss of generality (the same name is valid for fields and
function names for example).

The CONSTANT token type is necessary to be different from literal
to provide errors for function rules.
2022-12-29 18:28:54 +00:00
João Valverde b19bed43d1 dfilter: Allow constants as the first or only argument to min/max
The strategy here is to delay resolving literals to values until
we have looked at the entire argument list.

Also we will try to commute the relation in a comparison if
we do not have a type for the return value of the function,
like any other constant.

Before:

    Filter: max(1,_ws.ftypes.int8) == 1
    dftest: Argument '1' is not valid for max()
    	max(1,_ws.ftypes.int8) == 1
    	    ^

After:

    Filter: max(1,_ws.ftypes.int8) == 1

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FUNCTION(max#2):
         2 FVALUE(1 <FT_INT8>)
         2 FIELD(_ws.ftypes.int8 <FT_INT8>)
       1 FVALUE(1 <FT_INT8>)

    Instructions:
    00000 STACK_PUSH	1 <FT_INT8>
    00001 READ_TREE		_ws.ftypes.int8 <FT_INT8> -> reg#1
    00002 IF_FALSE_GOTO	3
    00003 STACK_PUSH	reg#1
    00004 CALL_FUNCTION	max(reg#1, 1 <FT_INT8>) -> reg#0
    00005 STACK_POP	2
    00006 IF_FALSE_GOTO	8
    00007 ANY_EQ		reg#0 == 1 <FT_INT8>
    00008 RETURN
2022-12-27 02:21:06 +00:00
João Valverde 3938b406fb dfilter: Refactor error location tracking
Remove duplicate location struct by adding a new header.

Pass around a structure instead of a pointer.
2022-12-23 18:23:06 +00:00
João Valverde b116ccd6d5 dfilter: Replace compile booleans arguments with a bit flag 2022-11-30 17:36:17 +00:00
João Valverde 84e75be5c6 dfilter: Add optimization flag
When we are just testing code to see if it compiles performing
optimizations is wasteful. Add an option to disable them.
2022-11-30 17:36:17 +00:00
João Valverde 93814ef740 dfilter: Always set error pointer in case of failure 2022-11-30 15:00:34 +00:00
João Valverde a0d77e9329 dfilter: Return an error object instead of string
Return an struct containing error information. This simplifies
the interface to more easily provide richer diagnostics in the future.

Add an error code besides a human-readable error string to allow
checking programmatically for errors in a robust manner. Currently
there is only a generic error code, it is expected to increase
in the future.

Move error location information to the struct. Change callers and
implementation to use the new interface.
2022-11-28 15:46:44 +00:00
João Valverde b83658d8a4 dfilter: Add suport for raw addressing with references
Extends raw adressing syntax to wok with references. The syntax
is
    @field1 == ${@field2}

This requires replicating the logic to load field references, but
using raw values instead. We use separate hash tables for that,
namely "references" vs "raw_references".
2022-10-31 21:02:39 +00:00
João Valverde 0853ddd1cb dfilter: Add support for raw (bytes) addressing mode
This adds new syntax to read a field from the tree as bytes, instead
of the actual type. This is a useful extension for example to match
matformed strings that contain unicode replacement characters. In
this case it is not possible to match the raw value of the malformed
string field. This extension fills this need and is generic enough
that it should be useful in many other situations.

The syntax used is to prefix the field name with "@". The following
artificial example tests if the HTTP user agent contains a particular
invalid UTF-8 sequence:

    @http.user_agent == "Mozill\xAA"

Where simply using "http.user_agent" won't work because the invalid byte
sequence will have been replaced with U+FFFD.

Considering the following programs:

    $ dftest '_ws.ftypes.string == "ABC"'
    Filter: _ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <FT_STRING>)
       1 FVALUE("ABC" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string <FT_STRING> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "ABC" <FT_STRING>
    00003 RETURN

    $ dftest '@_ws.ftypes.string == "ABC"'
    Filter: @_ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <RAW>)
       1 FVALUE(41:42:43 <FT_BYTES>)

    Instructions:
    00000 READ_TREE		@_ws.ftypes.string <FT_BYTES> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == 41:42:43 <FT_BYTES>
    00003 RETURN

In the second case the field has a "raw" type, that equates directly to
FT_BYTES, and the field value is read from the protocol raw data.
2022-10-31 21:02:39 +00:00
João Valverde 0583b76204 dfilter: Remove unused data structure 2022-10-31 21:02:39 +00:00
João Valverde b10db887ce dfilter: Remove unparsed syntax type and RHS literal bias
This removes unparsed name resolution during the semantic
check because it feels like a hack to work around limitations
in the language syntax, that should be solved at the lexical
level instead.

We were interpreting unparsed differently on the LHS and RHS.
Now an unparsed value is always a field if it matches a
registered field name (this matches the implementation in 3.6
and before).

This requires tightening a bit the allowed filter names for
protocols to avoid some common and potentially weird conflicting
cases.

Incidentally this extends set grammar to accept all entities.
That is experimental and may be reverted in the future.
2022-07-02 11:18:20 +01:00
João Valverde aaff0d21ae dfilter: Add layer support for references
This adds support for using the layers filter
with field references.

Before:
    $ dftest 'ip.src != ${ip.src#2}'
    dftest: invalid character in macro name

After:
    $ dftest 'ip.src != ${ip.src#2}'
    Filter: ip.src != ${ip.src#2}

    Syntax tree:
     0 TEST_ALL_NE:
       1 FIELD(ip.src <FT_IPv4>)
       1 REFERENCE(ip.src#[2:1] <FT_IPv4>)

    Instructions:
    00000 READ_TREE		ip.src <FT_IPv4> -> reg#0
    00001 IF_FALSE_GOTO	5
    00002 READ_REFERENCE_R	${ip.src <FT_IPv4>} #[2:1] -> reg#1
    00003 IF_FALSE_GOTO	5
    00004 ALL_NE		reg#0 != reg#1
    00005 RETURN

This requires adding another level of complexity to references.
When loading references we need to copy the 'proto_layer_num'
and add the logic to filter on that.

The "layer" sttype is removed and replace by a new
field sttype with support for a range. This is a nice
cleanup for the semantic check and general simplification.
The grammar is better too with this design.

Range sttype is renamed to slice for clarity.
2022-06-25 14:57:40 +01:00
João Valverde fab32ea0cb dfilter: Allow arithmetic expressions as function arguments
This allows writing moderately complex expressions, for example
a float epsilon test (#16483):

Filter: {abs(_ws.ftypes.double - 1) / max(abs(_ws.ftypes.double), abs(1))} < 0.01

Syntax tree:
 0 TEST_LT:
   1 OP_DIVIDE:
     2 FUNCTION(abs#1):
       3 OP_SUBTRACT:
         4 FIELD(_ws.ftypes.double)
         4 FVALUE(1 <FT_DOUBLE>)
     2 FUNCTION(max#2):
       3 FUNCTION(abs#1):
         4 FIELD(_ws.ftypes.double)
       3 FUNCTION(abs#1):
         4 FVALUE(1 <FT_DOUBLE>)
   1 FVALUE(0.01 <FT_DOUBLE>)

Instructions:
00000 READ_TREE		_ws.ftypes.double -> reg#1
00001 IF_FALSE_GOTO	3
00002 SUBRACT		reg#1 - 1 <FT_DOUBLE> -> reg#2
00003 STACK_PUSH	reg#2
00004 CALL_FUNCTION	abs(reg#2) -> reg#0
00005 STACK_POP	1
00006 IF_FALSE_GOTO	24
00007 READ_TREE		_ws.ftypes.double -> reg#1
00008 IF_FALSE_GOTO	9
00009 STACK_PUSH	reg#1
00010 CALL_FUNCTION	abs(reg#1) -> reg#4
00011 STACK_POP	1
00012 IF_FALSE_GOTO	13
00013 STACK_PUSH	reg#4
00014 STACK_PUSH	1 <FT_DOUBLE>
00015 CALL_FUNCTION	abs(1 <FT_DOUBLE>) -> reg#5
00016 STACK_POP	1
00017 IF_FALSE_GOTO	18
00018 STACK_PUSH	reg#5
00019 CALL_FUNCTION	max(reg#5, reg#4) -> reg#3
00020 STACK_POP	2
00021 IF_FALSE_GOTO	24
00022 DIVIDE		reg#0 / reg#3 -> reg#6
00023 ANY_LT		reg#6 < 0.01 <FT_DOUBLE>
00024 RETURN

We now use a stack to pass arguments to the function. The
stack is implemented as a list of lists (list of registers).
Arguments may still be non-existent to functions (this is
a feature). Functions must check for nil arguments (NULL lists)
and handle that case.

It's somewhat complicated to allow literal values and test compatibility
for different types, both because of lack of type information with
unparsed/literal and also because it is an underdeveloped area in the
code. In my limited testing it was good enough and useful, further
enhancements are left for future work.
2022-04-18 17:10:31 +01:00
João Valverde cb2f085f14 dfilter: Add max() and min() functions
Changes the function calling convention to pass the first register
number plus the number of registers after that sequentially. This
allows function with any number of arguments. Functions can still
only return one value.

Adds max() and min() function to select the maximum/minimum value
from any number of arguments, all of the same type. The functions
accept literals too. The return type is the same as the first argument
(cannot be a literal).
2022-04-14 13:07:41 +00:00
João Valverde 8746eea297 dfilter: Try to resolve field reference instead of using a heuristic
Instead of using a heuristic to decide whether the form ${...} is
a macro or not, try to resolve the name to a registered protocol
field and use that instead.

This increases somewhat the surface for clobbering existing macro
names with new field registrations but we'll cross that bridge when
we get to it.

Rejecting protocol field types reduces this probability again but it
may not be intuitive to the user trying to mistakenly use a reference
to a protocol why it is parsed as a macro. The reasons for rejecting
FT_PROTOCOL types as not interesting field references are not
very strong but it seems reasonable.

$ dftest 'frame.number != ${frame.number}'
Filter: frame.number != ${frame.number}

Instructions:
00000 READ_TREE		frame.number -> reg#0
00001 IF_FALSE_GOTO	5
00002 READ_REFERENCE	${frame.number} -> reg#1
00003 IF_FALSE_GOTO	5
00004 ALL_NE		reg#0 != reg#1
00005 RETURN

$ dftest 'frame != ${frame}'
dftest: macro 'frame' does not exist
2022-04-12 14:03:18 +00:00
João Valverde 2f02cd6e19 dfilter: Handle missing error location more gracefully
If we don't have an offset, don't print anything with underline.

Also it can underline filters using macros correctly now.

$ tshark -Y 'ip and ${private_ipv4:ip.sr}' -r /dev/null
tshark: Left side of "==" expression must be a field or function, not "ip.sr".
    ip and ip.sr == 192.168.0.0/16 or ip.sr == 172.16.0.0/12 or ip.sr == 10.0.0.0/8
           ^~~~~
2022-04-11 21:03:06 +00:00
João Valverde 4d9470e7dd dfilter: Add location tracking to scanner and use it to report errors
Add location tracking as a column offset and length from offset
to the scanner. Our input is a single line only so we don't need
to track line offset.

Record that information in the syntax tree. Return the error location
in dfilter_compile(). Use it in dftest to mark the location of the
error in the filter string. Later it would be nice to use the location
in the GUI as well.

$ dftest "ip.proto == aaaaaa and tcp.port == 123"
Filter: ip.proto == aaaaaa and tcp.port == 123
dftest: "aaaaaa" cannot be found among the possible values for ip.proto.
	ip.proto == aaaaaa and tcp.port == 123
	            ^~~~~~
2022-04-10 10:09:51 +01:00
João Valverde da19379eb5 dfilter: Create the syntax node in the scanner and pass that
Revert to passing a syntax node from the lexical scanner to the grammar
parser. Using a union is not having a discernible advantage and requires
duplicating a lot of properties of syntax nodes.
2022-04-10 09:54:03 +01:00
João Valverde 7429832db4 Fix a log message 2022-04-06 23:42:04 +01:00
João Valverde a6f37323e6 dfilter: Clean up lexical scanning 2022-04-06 18:11:27 +01:00
João Valverde c98df5eef5 dfilter: Print syntax tree using dftest + format enhancements
Add argument to dfilter_compile_real() to save syntax tree text
representation.

Use it with dftest to print syntax tree.

Misc debug output format improvements.
2022-04-05 12:04:37 +01:00
João Valverde 8bc214b5bb dfilter: Add remaining arithmetic integer ops 2022-03-31 16:49:42 +01:00
João Valverde 5cd0e4cc97 dfilter: Fix use after free with references
By the time we are using the reference fvalue the tree may have gone
away and with it the fvalue. We need to duplicate the reference
fvalues and take ownership of the memory.
2022-03-30 14:05:22 +01:00
João Valverde 260942e170 dfilter: Refactor macro tree references
This replaces the current macro reference system with
a completely different implementation. Instead of a macro a reference
is a syntax element. A reference is a constant that can be filled
in the dfilter code after compilation from an existing protocol tree.
It is best understood as a field value that can be read from a fixed
tree that is not the frame being filtered. Usually this fixed tree
is the currently selected frame when the filter is applied. This
allows comparing fields in the filtered frame with fields in the
selected frame.

Because the field reference syntax uses the same sigil notation
as a macro we have to use a heuristic to distinguish them:
if the name has a dot it is a field reference, otherwise
it is a macro name.

The reference is synctatically validated at compile time.

There are two main advantages to this implementation (and a couple of
minor ones):

The protocol tree for each selected frame is only walked if we have a
display filter and if the display filter uses references. Also only the
actual reference values are copied, intead of loading the entire tree
into a hash table (in textual form even).

The other advantage is that the reference is tested like a protocol
field against all the values in the selected frame (if there is more
than one).

Currently the reference fields are not "primed" during dissection, so
the entire tree is walked to find a particular reference (this is
similar to the previous implementation).

If the display filter contains a valid reference and the reference is
not loaded at the time the filter is run the result is the same as a
non existing field for a regular READ_TREE instruction.

Fixes #17599.
2022-03-29 12:36:31 +00:00
João Valverde d2907d91c0 dfilter: Add more logging for bytecode 2022-03-28 17:59:07 +01:00
João Valverde 9ee9b40b64 dfilter: Store expanded text 2022-03-28 17:22:01 +01:00
João Valverde ac0a69636b dfilter: Add support for unary arithmetic
This change implements a unary minus operator.

Filter: tcp.window_size_scalefactor == -tcp.dstport

Instructions:
00000 READ_TREE		tcp.window_size_scalefactor -> reg#0
00001 IF_FALSE_GOTO	6
00002 READ_TREE		tcp.dstport -> reg#1
00003 IF_FALSE_GOTO	6
00004 MK_MINUS		-reg#1 -> reg#2
00005 ANY_EQ		reg#0 == reg#2
00006 RETURN

It is supported for integer types, floats and relative time values.
The unsigned integer types are promoted to a 32 bit signed integer.

Unary plus is implemented as a no-op. The plus sign is simply ignored.

Constant arithmetic expressions are computed during compilation.

Overflow with constants is a compile time error. Overflow with
variables is a run time error and silently ignored. Only a debug
message will be printed to the console.

Related to #15504.
2022-03-28 11:20:41 +00:00
João Valverde 16729be2c1 dfilter: Add bitwise masking of bits
Add support for masking of bits. Before the bitwise operator
could only test bits, it did not support clearing bits.

This allows testing if any combination of bits are set/unset
more naturally with a single test. Previously this was only
possible by combining several bitwise predicates.

Bitwise is implemented as a test node, even though it is not.
Maybe the test node should be renamed to something else.

Fixes #17246.
2022-03-22 12:58:04 +00:00
João Valverde 631cf34f0c dfilter: Use a function pointer array to free registers 2022-03-21 18:43:36 +00:00
João Valverde 94d909103e dfilter: Remove DFVM constant initialization 2022-03-21 17:09:43 +00:00
João Valverde 22f3d87a8f dfilter: Use singly linked list for registers
Replace calls to list append with list prepend where applicable.
2022-03-21 11:47:19 +00:00
João Valverde df0fc8b517 dfilter: Try to be more flexible with leading colons
For an expression starting with a colon (a literal) try to parse
the value with and without colon. This avoids excluding some
valid representations like the IPv6 address "::1".
2022-03-05 11:10:54 +00:00
João Valverde 6d520addd1 dfilter: Add special syntax for literals and names
The syntax for protocols and some literals like numbers
and bytes/addresses can be  ambiguous. Some protocols can
be parsed as a literal, for example the protocol "fc"
(Fibre Channel) can be parsed as 0xFC.

If a numeric protocol is registered that will also take
precedence over any literal, according to the current
rules, thereby breaking numerical comparisons to that
number. The same for an hypothetical protocol named "true",
etc.

To allow the user to disambiguate this meaning introduce
new syntax.

Any value prefixed with ':' or enclosed in <,> will be treated
as a literal value only. The value :fc or <fc> will always
mean 0xFC, under any context. Never a protocol whose filter
name is "fc".

Likewise any value prefixed with a dot will always be parsed
as an identifier (protocol or protocol field) in the language.
Never any literal value parsed from the token "fc".

This allows the user to be explicit about the meaning,
and between the two explicit methods plus the ambiguous one
it doesn't completely break any one meaning.

The difference can be seen in the following two programs:

    Filter: frame == fc

    Constants:

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF-FALSE-GOTO	5
    00002 READ_TREE		fc -> reg#1
    00003 IF-FALSE-GOTO	5
    00004 ANY_EQ		reg#0 == reg#1
    00005 RETURN

    --------

    Filter: frame == :fc

    Constants:
    00000 PUT_FVALUE	fc <FT_PROTOCOL> -> reg#1

    Instructions:
    00000 READ_TREE		frame -> reg#0
    00001 IF-FALSE-GOTO	3
    00002 ANY_EQ		reg#0 == reg#1
    00003 RETURN

The filter "frame == fc" is the same as "filter == .fc",
according to the current heuristic, except the first form
will try to parse it as a literal if the name does not
correspond to any registered protocol.

By treating a leading dot as a name in the language we
necessarily disallow writing floats with a leading dot. We
will also disallow writing with an ending dot when using
unparsed values. This is a backward incompatibility but has
the happy side effect of making the expression {1...2}
unambiguous.

This could either mean "1 .. .2" or "1. .. 2". If we require
a leading and ending digit then the meaning is clear:
    1.0..0.2 -> 1.0 .. 0.2

Fixes #17731.
2022-03-05 11:10:54 +00:00
João Valverde 8b23dd3a3c dfilter: Add an "all equal" operator
To complete the set of equality operators add an "all equal"
operator that matches a frame if all fields match the condition.

The symbol chosen for "all_eq" is "===".
2021-12-22 14:32:32 +00:00
João Valverde c5a19582e4 epan: Convert to use stdio.h from GLib
Replace:
    g_snprintf() -> snprintf()
    g_vsnprintf() -> vsnprintf()
    g_strdup_printf() -> ws_strdup_printf()
    g_strdup_vprintf() -> ws_strdup_vprintf()

This is more portable, user-friendly and faster on platforms
where GLib does not like the native I/O.

Adjust the format string to use macros from intypes.h.
2021-12-19 19:29:53 +00:00
João Valverde f5f8d9ebb6 dfilter: Fix token associativity
TEST_EQ and TEST_NE are unused. Replace by the correct values
and add missing token to string representations.
2021-12-13 01:24:18 +00:00
João Valverde 7cffcfa835 dfilter: Remove a default switch case 2021-12-12 10:16:27 +00:00
João Valverde 647decd509 dfilter: Avoid double strdup to save token value
Store the lval token value instead.
2021-12-01 19:42:51 +00:00
João Valverde 3e0806ca09 dfilter: Remove dfilter_fail_parse()
Instead of requiring a special error function in the parser
just set the syntax_error flag if an error occurs, in any stage
of compilation. Outside of the parser loop it will not be used
but that is fine.
2021-11-30 19:52:05 +00:00
João Valverde 3c7894e2a0 dfilter: Add compilation result to log output
Add result output to console log, in addition to intermediate debug
information. This allows tracing the result using the log only.
2021-11-16 13:52:30 +00:00
João Valverde e7ecc9b9e5 dfilter: Clean up error format and exception code
Misc code cleanups. Add some extra stnode functions for increased type
safety. Fix a constness issue with df_lval_value().
2021-11-10 03:18:50 +00:00
João Valverde 146a840ad1 dfilter: Move a constructor to the grammar file 2021-11-06 11:45:21 +00:00