Commit Graph

101 Commits

Author SHA1 Message Date
João Valverde b83658d8a4 dfilter: Add suport for raw addressing with references
Extends raw adressing syntax to wok with references. The syntax
is
    @field1 == ${@field2}

This requires replicating the logic to load field references, but
using raw values instead. We use separate hash tables for that,
namely "references" vs "raw_references".
2022-10-31 21:02:39 +00:00
João Valverde 0853ddd1cb dfilter: Add support for raw (bytes) addressing mode
This adds new syntax to read a field from the tree as bytes, instead
of the actual type. This is a useful extension for example to match
matformed strings that contain unicode replacement characters. In
this case it is not possible to match the raw value of the malformed
string field. This extension fills this need and is generic enough
that it should be useful in many other situations.

The syntax used is to prefix the field name with "@". The following
artificial example tests if the HTTP user agent contains a particular
invalid UTF-8 sequence:

    @http.user_agent == "Mozill\xAA"

Where simply using "http.user_agent" won't work because the invalid byte
sequence will have been replaced with U+FFFD.

Considering the following programs:

    $ dftest '_ws.ftypes.string == "ABC"'
    Filter: _ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <FT_STRING>)
       1 FVALUE("ABC" <FT_STRING>)

    Instructions:
    00000 READ_TREE		_ws.ftypes.string <FT_STRING> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == "ABC" <FT_STRING>
    00003 RETURN

    $ dftest '@_ws.ftypes.string == "ABC"'
    Filter: @_ws.ftypes.string == "ABC"

    Syntax tree:
     0 TEST_ANY_EQ:
       1 FIELD(_ws.ftypes.string <RAW>)
       1 FVALUE(41:42:43 <FT_BYTES>)

    Instructions:
    00000 READ_TREE		@_ws.ftypes.string <FT_BYTES> -> reg#0
    00001 IF_FALSE_GOTO	3
    00002 ANY_EQ		reg#0 == 41:42:43 <FT_BYTES>
    00003 RETURN

In the second case the field has a "raw" type, that equates directly to
FT_BYTES, and the field value is read from the protocol raw data.
2022-10-31 21:02:39 +00:00
João Valverde 31a0147daa dfilter: Pass a value by reference
The lifetime of the reference is longer than the runtime so avoid
an unecessary fvalue dup.
2022-10-31 21:02:39 +00:00
Alexis La Goutte bd28c19ad6 dvfm: Fix -Wmissing-prototypes
dfvm.c:206:1: warning: no previous prototype for function 'dfvm_value_tostr'
dfvm.c:550:1: warning: no previous prototype for function 'filter_finfo_fvalues'
dfvm.c:645:1: warning: no previous prototype for function 'filter_refs_fvalues'
2022-07-15 13:45:52 +00:00
João Valverde 4c975b770e dfilter: Improve compatibility of integer types
Before:

$ dftest '_ws.ftypes.int64 == _ws.ftypes.int8'
Filter: _ws.ftypes.int64 == _ws.ftypes.int8
dftest: _ws.ftypes.int64 and _ws.ftypes.int8 are not of compatible types.
	_ws.ftypes.int64 == _ws.ftypes.int8
	                    ^~~~~~~~~~~~~~~

After:

$ dftest '_ws.ftypes.int64 == _ws.ftypes.int8'
Filter: _ws.ftypes.int64 == _ws.ftypes.int8

Syntax tree:
 0 TEST_ANY_EQ:
   1 FIELD(_ws.ftypes.int64 <FT_INT64>)
   1 FIELD(_ws.ftypes.int8 <FT_INT8>)

Instructions:
00000 READ_TREE		_ws.ftypes.int64 <FT_INT64> -> reg#0
00001 IF_FALSE_GOTO	5
00002 READ_TREE		_ws.ftypes.int8 <FT_INT8> -> reg#1
00003 IF_FALSE_GOTO	5
00004 ANY_EQ		reg#0 === reg#1
00005 RETURN
2022-07-14 20:12:30 +00:00
João Valverde f68f172454 dfilter: Remove a debug message
Still too noisy even with noisy level.
2022-07-13 16:06:28 +00:00
João Valverde 6c8a8d7960 dfilter: Fix dfvm code string
All/any equal have their own symbols for operators so cannot
be handled in the same switch case.

Other comparisons don't have different symbols for any/all.
2022-07-13 00:37:12 +01:00
João Valverde 5e3a7e9ab8 dfilter: Small optimization for "not all zero" code
Remove extra NOT instruction. Also remove unused ANY_ZERO opcode.
2022-07-05 09:58:43 +01:00
João Valverde a877f2d5f3 dfilter: Allow existence check for slices
Allow checking if a slice exists. The result is true if the
slice has length greater than zero.

The len() function is implemented as a DFVM instruction instead.
The semantics are the same.
2022-07-04 22:45:14 +00:00
João Valverde eb8acd088e dfilter: Rename dfvm opcodes with a namespace prefix 2022-07-02 11:46:45 +01:00
Roland Knall 8bdff72625 dfilter: Fix undefined dereference and add null check
A value of ref could be accessed undefined and add additional
checks to ensure, that refs_array actually contains data or return
null immediately
2022-06-27 14:57:01 +00:00
João Valverde aaff0d21ae dfilter: Add layer support for references
This adds support for using the layers filter
with field references.

Before:
    $ dftest 'ip.src != ${ip.src#2}'
    dftest: invalid character in macro name

After:
    $ dftest 'ip.src != ${ip.src#2}'
    Filter: ip.src != ${ip.src#2}

    Syntax tree:
     0 TEST_ALL_NE:
       1 FIELD(ip.src <FT_IPv4>)
       1 REFERENCE(ip.src#[2:1] <FT_IPv4>)

    Instructions:
    00000 READ_TREE		ip.src <FT_IPv4> -> reg#0
    00001 IF_FALSE_GOTO	5
    00002 READ_REFERENCE_R	${ip.src <FT_IPv4>} #[2:1] -> reg#1
    00003 IF_FALSE_GOTO	5
    00004 ALL_NE		reg#0 != reg#1
    00005 RETURN

This requires adding another level of complexity to references.
When loading references we need to copy the 'proto_layer_num'
and add the logic to filter on that.

The "layer" sttype is removed and replace by a new
field sttype with support for a range. This is a nice
cleanup for the semantic check and general simplification.
The grammar is better too with this design.

Range sttype is renamed to slice for clarity.
2022-06-25 14:57:40 +01:00
João Valverde 8793650707 dftest: Print ftype of protocol fields 2022-06-24 21:10:45 +00:00
Alexis La Goutte 8ee1eabeee dfvm(dfilter): fix clang analyzer warning (Dead.Store) 2022-05-22 08:40:44 +00:00
João Valverde bebf7afa37 dfilter: Remove unused DFVM 4th instruction argument 2022-05-13 14:13:18 +01:00
João Valverde 3bb918428e dfilter: Remove stale comment 2022-05-13 12:50:33 +00:00
João Valverde ac901e5de8 dfilter: Fix maybe-unitialized warning
[1702/2528] Building C object epan/dfilter/CMakeFiles/dfilter.dir/dfvm.c.o
In function ‘drange_contains_layer’,
    inlined from ‘filter_finfo_fvalues’ at /home/jpv/code/wireshark/wireshark/epan/dfilter/dfvm.c:587:21:
/home/jpv/code/wireshark/wireshark/epan/dfilter/dfvm.c:555:41: warning: ‘upper’ may be used uninitialized [-Wmaybe-uninitialized]
  555 |                 if (num >= lower && num <= upper) {  /* inclusive */
      |                                     ~~~~^~~~~~~~
/home/jpv/code/wireshark/wireshark/epan/dfilter/dfvm.c: In function ‘filter_finfo_fvalues’:
/home/jpv/code/wireshark/wireshark/epan/dfilter/dfvm.c:537:20: note: ‘upper’ was declared here
  537 |         int lower, upper;
      |                    ^~~~~
2022-05-13 13:22:29 +01:00
João Valverde b602911b31 dfilter: Add support for universal quantifiers
Adds the keywords "any" and "all" to implement the quantification
to any existing relational operator.

Filter: all tcp.port in {100, 2000..3000}

Syntax tree:
 0 ALL TEST_IN:
   1 FIELD(tcp.port)
   1 SET(#2):
     2 FVALUE(100 <FT_UINT16>)
     2 FVALUE(2000 <FT_UINT16>) .. FVALUE(3000 <FT_UINT16>)

Instructions:
00000 READ_TREE		tcp.port -> reg#0
00001 IF_FALSE_GOTO	5
00002 ALL_EQ		reg#0 === 100 <FT_UINT16>
00003 IF_TRUE_GOTO	5
00004 ALL_IN_RANGE	reg#0 in { 2000 <FT_UINT16> .. 3000 <FT_UINT16> }
00005 RETURN
2022-05-12 14:26:54 +01:00
Joakim Karlsson b75b8ca72e dfilter: fix may be used uninitialized in this function [-Wmaybe-uninitialized] 2022-04-27 13:36:43 +02:00
João Valverde 4f3f507eee dfilter: Add syntax to match specific layers in the protocol stack
Add support to display filters for matching a specific layer within a frame.
Layers are counted sequentially up the protocol stack. Each protocol
(dissector) that appears in the stack is one layer.

LINK-LAYER#1 <-> IP#1 <-> TCP#1 <-> IP#2 <-> TCP#2 <-> etc.

The syntax allows for negative indexes and ranges with the usual semantics
for slices (but note that counting starts at one):

    tcp.port#[2-4] == 1024

Matches layers 2 to 4 inclusive.

Fixes #3791.
2022-04-26 16:50:59 +00:00
João Valverde c0170dad42 dfilter: Rename "range" to "slice"
The word range is used for different things with different
meanings and that is confusing. Avoid using "range" in code to
mean "slice".

A range is one or more intervals with a lower and upper bound.

A slice is a range applied to a bytes field.

Replace range with slice wherever appropriate. This usage of
"slice" instead of range is generally correct and consistent in
the documentation.
2022-04-26 16:50:59 +00:00
Gerald Combs a73fd872ad dfilter: Add a null check.
Try to fix

*** CID 1504179:  Null pointer dereferences  (FORWARD_NULL)
/builds/wireshark/wireshark/epan/dfilter/dfvm.c: 327 in dfvm_dump_str()
321     				stack_print = dump_str_stack_push(stack_print, arg1_str);
322     				break;
323
324     			case STACK_POP:
325     				wmem_strbuf_append_printf(buf, "%05d STACK_POP\t%s\n", id, arg1_str);
326     				for (i = 0; i < arg1->value.numeric; i ++) {
>>>     CID 1504179:  Null pointer dereferences  (FORWARD_NULL)
>>>     Passing null pointer "stack_print" to "dump_str_stack_pop", which dereferences it.
327     					stack_print = dump_str_stack_pop(stack_print);
328     				}
329     				break;
330
331     			case MK_RANGE:
332     				wmem_strbuf_append_printf(buf, "%05d MK_RANGE\t\t%s[%s] -> %s\n",
2022-04-21 17:10:44 +00:00
João Valverde fab32ea0cb dfilter: Allow arithmetic expressions as function arguments
This allows writing moderately complex expressions, for example
a float epsilon test (#16483):

Filter: {abs(_ws.ftypes.double - 1) / max(abs(_ws.ftypes.double), abs(1))} < 0.01

Syntax tree:
 0 TEST_LT:
   1 OP_DIVIDE:
     2 FUNCTION(abs#1):
       3 OP_SUBTRACT:
         4 FIELD(_ws.ftypes.double)
         4 FVALUE(1 <FT_DOUBLE>)
     2 FUNCTION(max#2):
       3 FUNCTION(abs#1):
         4 FIELD(_ws.ftypes.double)
       3 FUNCTION(abs#1):
         4 FVALUE(1 <FT_DOUBLE>)
   1 FVALUE(0.01 <FT_DOUBLE>)

Instructions:
00000 READ_TREE		_ws.ftypes.double -> reg#1
00001 IF_FALSE_GOTO	3
00002 SUBRACT		reg#1 - 1 <FT_DOUBLE> -> reg#2
00003 STACK_PUSH	reg#2
00004 CALL_FUNCTION	abs(reg#2) -> reg#0
00005 STACK_POP	1
00006 IF_FALSE_GOTO	24
00007 READ_TREE		_ws.ftypes.double -> reg#1
00008 IF_FALSE_GOTO	9
00009 STACK_PUSH	reg#1
00010 CALL_FUNCTION	abs(reg#1) -> reg#4
00011 STACK_POP	1
00012 IF_FALSE_GOTO	13
00013 STACK_PUSH	reg#4
00014 STACK_PUSH	1 <FT_DOUBLE>
00015 CALL_FUNCTION	abs(1 <FT_DOUBLE>) -> reg#5
00016 STACK_POP	1
00017 IF_FALSE_GOTO	18
00018 STACK_PUSH	reg#5
00019 CALL_FUNCTION	max(reg#5, reg#4) -> reg#3
00020 STACK_POP	2
00021 IF_FALSE_GOTO	24
00022 DIVIDE		reg#0 / reg#3 -> reg#6
00023 ANY_LT		reg#6 < 0.01 <FT_DOUBLE>
00024 RETURN

We now use a stack to pass arguments to the function. The
stack is implemented as a list of lists (list of registers).
Arguments may still be non-existent to functions (this is
a feature). Functions must check for nil arguments (NULL lists)
and handle that case.

It's somewhat complicated to allow literal values and test compatibility
for different types, both because of lack of type information with
unparsed/literal and also because it is an underdeveloped area in the
code. In my limited testing it was good enough and useful, further
enhancements are left for future work.
2022-04-18 17:10:31 +01:00
Alexis La Goutte 83959f77e3 dfvm: Fix Dead Store found by Clang Analyzer 2022-04-16 18:15:45 +00:00
João Valverde cb2f085f14 dfilter: Add max() and min() functions
Changes the function calling convention to pass the first register
number plus the number of registers after that sequentially. This
allows function with any number of arguments. Functions can still
only return one value.

Adds max() and min() function to select the maximum/minimum value
from any number of arguments, all of the same type. The functions
accept literals too. The return type is the same as the first argument
(cannot be a literal).
2022-04-14 13:07:41 +00:00
João Valverde c98df5eef5 dfilter: Print syntax tree using dftest + format enhancements
Add argument to dfilter_compile_real() to save syntax tree text
representation.

Use it with dftest to print syntax tree.

Misc debug output format improvements.
2022-04-05 12:04:37 +01:00
João Valverde d91734ab6a dfilter: Fix range registers in DFVM dump 2022-04-05 12:04:37 +01:00
João Valverde 8bc214b5bb dfilter: Add remaining arithmetic integer ops 2022-03-31 16:49:42 +01:00
João Valverde 2a9cb588aa dfilter: Add binary arithmetic (add/subtract)
Add support for display filter binary addition and subtraction.

The grammar is intentionally kept simple for now. The use case
is to add a constant to a protocol field, or (maybe) add two
fields in an expression.

We use signed arithmetic with unsigned numbers, checking for
overflow and casting where necessary to do the conversion.
We could legitimately opt to use traditional modular arithmetic
instead (like C) and if it turns out that that is more useful for
some reason we may want to in the future.

Fixes #15504.
2022-03-31 11:27:34 +01:00
João Valverde 260942e170 dfilter: Refactor macro tree references
This replaces the current macro reference system with
a completely different implementation. Instead of a macro a reference
is a syntax element. A reference is a constant that can be filled
in the dfilter code after compilation from an existing protocol tree.
It is best understood as a field value that can be read from a fixed
tree that is not the frame being filtered. Usually this fixed tree
is the currently selected frame when the filter is applied. This
allows comparing fields in the filtered frame with fields in the
selected frame.

Because the field reference syntax uses the same sigil notation
as a macro we have to use a heuristic to distinguish them:
if the name has a dot it is a field reference, otherwise
it is a macro name.

The reference is synctatically validated at compile time.

There are two main advantages to this implementation (and a couple of
minor ones):

The protocol tree for each selected frame is only walked if we have a
display filter and if the display filter uses references. Also only the
actual reference values are copied, intead of loading the entire tree
into a hash table (in textual form even).

The other advantage is that the reference is tested like a protocol
field against all the values in the selected frame (if there is more
than one).

Currently the reference fields are not "primed" during dissection, so
the entire tree is walked to find a particular reference (this is
similar to the previous implementation).

If the display filter contains a valid reference and the reference is
not loaded at the time the filter is run the result is the same as a
non existing field for a regular READ_TREE instruction.

Fixes #17599.
2022-03-29 12:36:31 +00:00
João Valverde d2907d91c0 dfilter: Add more logging for bytecode 2022-03-28 17:59:07 +01:00
João Valverde 9ee9b40b64 dfilter: Store expanded text 2022-03-28 17:22:01 +01:00
João Valverde ac0a69636b dfilter: Add support for unary arithmetic
This change implements a unary minus operator.

Filter: tcp.window_size_scalefactor == -tcp.dstport

Instructions:
00000 READ_TREE		tcp.window_size_scalefactor -> reg#0
00001 IF_FALSE_GOTO	6
00002 READ_TREE		tcp.dstport -> reg#1
00003 IF_FALSE_GOTO	6
00004 MK_MINUS		-reg#1 -> reg#2
00005 ANY_EQ		reg#0 == reg#2
00006 RETURN

It is supported for integer types, floats and relative time values.
The unsigned integer types are promoted to a 32 bit signed integer.

Unary plus is implemented as a no-op. The plus sign is simply ignored.

Constant arithmetic expressions are computed during compilation.

Overflow with constants is a compile time error. Overflow with
variables is a run time error and silently ignored. Only a debug
message will be printed to the console.

Related to #15504.
2022-03-28 11:20:41 +00:00
João Valverde 0335ebdc3a dfilter: ftype_is_true -> ftype_is_zero 2022-03-23 11:04:41 +00:00
João Valverde 16729be2c1 dfilter: Add bitwise masking of bits
Add support for masking of bits. Before the bitwise operator
could only test bits, it did not support clearing bits.

This allows testing if any combination of bits are set/unset
more naturally with a single test. Previously this was only
possible by combining several bitwise predicates.

Bitwise is implemented as a test node, even though it is not.
Maybe the test node should be renamed to something else.

Fixes #17246.
2022-03-22 12:58:04 +00:00
João Valverde 631cf34f0c dfilter: Use a function pointer array to free registers 2022-03-21 18:43:36 +00:00
João Valverde d60f2580ba dfilter: Pass around constants in instructions
The DFVM instructions arguments are generic boxed types but instead
of using FVALUE and PCRE types the code passes aroung REGISTER types
instead. Change that to pass constants in the instruction.
2022-03-21 17:09:56 +00:00
João Valverde 94d909103e dfilter: Remove DFVM constant initialization 2022-03-21 17:09:43 +00:00
João Valverde ae17e733ac dfilter: Use more DFVM values in gencode 2022-03-21 17:09:29 +00:00
João Valverde 769f1f10de dfilter: Add DFVM value constructor 2022-03-21 17:09:19 +00:00
João Valverde 1b574e7466 dfilter: Cleanup dfvm_apply() 2022-03-21 12:38:09 +00:00
João Valverde 22f3d87a8f dfilter: Use singly linked list for registers
Replace calls to list append with list prepend where applicable.
2022-03-21 11:47:19 +00:00
João Valverde ea949ef719 dfilter: Cleanup dfilter_dump() 2022-03-21 11:26:52 +00:00
João Valverde 70301ba54c dfilter: Fix dfvm dump display
Fix operators to reflect their true meaning.
2022-02-27 19:12:02 +00:00
João Valverde 8b23dd3a3c dfilter: Add an "all equal" operator
To complete the set of equality operators add an "all equal"
operator that matches a frame if all fields match the condition.

The symbol chosen for "all_eq" is "===".
2021-12-22 14:32:32 +00:00
João Valverde 5bba669579 Remove some lingering uses of g_assert()
Also replace some incorrect uses of g_assert_true().

  g_assert_true -> g_assert -> ws_assert
2021-12-16 10:19:45 +00:00
João Valverde bbaa144b3c dfilter: Remove reference to GRegex 2021-11-23 14:08:06 +00:00
João Valverde 274531820a Move regex code to wsutil 2021-11-14 21:00:59 +00:00
João Valverde 1a32a75a62 ftypes: Internal headers need to be internal
The header ftypes-int.h should not be used outside of epan/ftypes
because it is a private header.

The functions fvalue_free() and fvalue_cleanup() need not and should
not be macros either.
2021-11-11 03:15:31 +00:00
João Valverde 0abe10e040 dfilter: Fix "!=" relation to be free of contradictions
Wireshark defines the relation of equality A == B as
A any_eq B <=> An == Bn for at least one An, Bn.
More accurately I think this is (formally) an equivalence
relation, not true equality.

Whichever definition for "==" we choose we must keep the
definition of "!=" as !(A == B), otherwise it will
lead to logical contradictions like (A == B) AND (A != B)
being true.

Fix the '!=' relation to match the definition of equality:
  A != B <=> !(A == B) <=> A all_ne B <=> An != Bn, for
every n.

This has been the recomended way to write "not equal" for a
long time in the documentation, even to the point where != was
deprecated, but it just wasn't implemented consistently in the
language, which has understandably been a persistent source
of confusion. Even a field that is normally well-behaved
with "!=" like "ip.src" or "ip.dst" will produce unexpected
results with encapsulations like IP-over-IP.

The opcode ALL_NE could have been implemented in the compiler
instead using NOT and ANY_EQ but I chose to implement it in
bytecode. It just seemed more elegant and efficient
but the difference was not very significant.

Keep around "~=" for any_ne relation, in case someone depends
on that, and because we don't have an operator for true equality:
  A strict_equal B <=> A all_eq B <=> !(A any_ne B).
If there is only one value then any_ne and all_ne are the same
comparison operation.

Implementing this change did not require fixing any tests so it
is unlikely the relation "~=" (any_ne) will be very useful.

Note that the behaviour of the '<' (less than) comparison relation
is a separate, more subtle issue. In the general case the definition
of '<' that is used is only a partial order.
2021-10-24 06:55:54 +00:00