No one is using this so I'd like to explore other
options first to handle constants in arithmetic
expressions that lack type information.
Reverts 3ddb017a88.
Make dfilter byte representation always use ':' for consistency.
Make 1 byte be represented as "XX:" with the colon suffix to
make it unambiguous that it is a byte and not another type,
like a protocol.
The difference can be seen in the following programs. In the
Before representation it is not obvious at all that the second
"fc" value is a literal bytes value and not the value of the
protocol "fc", although it can be inferred from the lack of
a READ_TREE instruction. In the After we know that "fc:" must
be bytes and not a protocol.
Note that a leading colon is a syntactical expedient to say
"this value with any type is a literal value and not a protocol
field." A terminating colon is just a part of the dfilter
literal bytes syntax.
Before:
Filter: fc == :fc
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(fc <FT_PROTOCOL>)
1 FVALUE(fc <FT_PROTOCOL>)
Instructions:
00000 READ_TREE fc <FT_PROTOCOL> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == fc <FT_PROTOCOL>
After:
Filter: fc == :fc
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(fc <FT_PROTOCOL>)
1 FVALUE(fc: <FT_PROTOCOL>)
Instructions:
00000 READ_TREE fc <FT_PROTOCOL> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == fc: <FT_PROTOCOL>
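The representation rule described above can be sketched as follows. This is an illustrative Python sketch, not the Wireshark implementation; the function name bytes_to_dfilter_repr and the empty-string result for empty input are assumptions.

```python
def bytes_to_dfilter_repr(data: bytes) -> str:
    """Format bytes as colon-separated hex (hypothetical helper)."""
    if not data:
        return ""  # assumption: empty input yields an empty string
    text = ":".join(f"{b:02x}" for b in data)
    # A lone byte such as "fc" could be mistaken for a protocol name,
    # so mark it as bytes with a colon suffix.
    if len(data) == 1:
        text += ":"
    return text
```

With this rule a single byte 0xfc prints as "fc:", while longer values keep the usual "fc:01" form.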
Remove some unused historical files.
Aggressively disable warnings to keep the lemon source
pristine and avoid the maintenance burden for lemon itself.
Lemon has its own lax policy for warnings that doesn't match our
own and they won't accept external patches to remove the
warnings, so just ignore them. Lemon is just executed to generate
code for the Wireshark build and the minor code issues it has
have no influence at runtime.
For lemon generated code we selectively disable some linting
warnings.
Remove patches for lemon and lempar, they are no longer required
with these changes to silence warnings.
Constant logical expressions are tautologies and almost certainly
user error. Reject them as invalid.
Most of them were already rejected with insufficient type information
but some corner cases were still valid.
Before:
Filter: ${frame.number} == 3
Syntax tree:
0 TEST_ANY_EQ:
1 REFERENCE(frame.number <FT_UINT32>)
1 FVALUE(3 <FT_UINT32>)
Instructions:
00000 READ_REFERENCE ${frame.number <FT_UINT32>} -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == 3 <FT_UINT32>
00003 RETURN
After:
Filter: ${frame.number} == 3
dftest: Constant expression is invalid.
${frame.number} == 3
^~~~~~~~~~~~~~~~~~~~
Remove unparsed lexical type and replace it with identifier
and constant. This separation is still necessary to differentiate
names (fields and functions) from literals that look like names
but doing it at the lexical level has some advantages.
The main advantage is a much cleaner and simplified grammar,
because we only have a single token type for field names, without
any loss of generality (the same name is valid for fields and
function names for example).
The CONSTANT token type needs to be distinct from literal
in order to provide errors for function rules.
Underline the whole expression if the error is for the function.
Before:
Filter: frame.number == abs(1, 2)
dftest: Function abs can only accept 1 arguments.
frame.number == abs(1, 2)
^~~
After:
Filter: frame.number == abs(1, 2)
dftest: Function abs can only accept 1 arguments.
frame.number == abs(1, 2)
^~~~~~~~~
The strategy here is to delay resolving literals to values until
we have looked at the entire argument list.
Also we will try to commute the relation in a comparison if
we do not have a type for the return value of the function,
like any other constant.
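A minimal sketch of the delayed resolution, in Python for brevity. The Field, Literal and Value classes and the resolve_args() helper are hypothetical stand-ins and do not mirror the real syntax tree types.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    ftype: str  # e.g. "FT_INT8"


@dataclass
class Literal:
    text: str  # unresolved literal, e.g. "1"


@dataclass
class Value:
    text: str
    ftype: str  # literal resolved to a typed value


def resolve_args(args):
    """Return (args, ftype); ftype is None if no argument carries a type."""
    # First pass: look at the entire argument list for a typed argument.
    ftype = next((a.ftype for a in args if isinstance(a, Field)), None)
    if ftype is None:
        # Nothing to resolve against; the caller may try to commute.
        return list(args), None
    # Second pass: only now resolve the literals to values of that type.
    resolved = [Value(a.text, ftype) if isinstance(a, Literal) else a
                for a in args]
    return resolved, ftype
```

Because the literal "1" is only resolved after the field has been seen, its position in the argument list no longer matters.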
Before:
Filter: max(1,_ws.ftypes.int8) == 1
dftest: Argument '1' is not valid for max()
max(1,_ws.ftypes.int8) == 1
^
After:
Filter: max(1,_ws.ftypes.int8) == 1
Syntax tree:
0 TEST_ANY_EQ:
1 FUNCTION(max#2):
2 FVALUE(1 <FT_INT8>)
2 FIELD(_ws.ftypes.int8 <FT_INT8>)
1 FVALUE(1 <FT_INT8>)
Instructions:
00000 STACK_PUSH 1 <FT_INT8>
00001 READ_TREE _ws.ftypes.int8 <FT_INT8> -> reg#1
00002 IF_FALSE_GOTO 3
00003 STACK_PUSH reg#1
00004 CALL_FUNCTION max(reg#1, 1 <FT_INT8>) -> reg#0
00005 STACK_POP 2
00006 IF_FALSE_GOTO 8
00007 ANY_EQ reg#0 == 1 <FT_INT8>
00008 RETURN
Filter: max(1,_ws.ftypes.int8) == 1
** (dftest:64938) 01:43:25.950180 [DFilter ERROR] epan/dfilter/sttype-field.c:117 -- sttype_field_ftenum(): Magic num is 0x5cf30031, but should be 0xfc2002cf
Comparison relations should be allowed to commute but they cannot
because we need type information to resolve literals to fvalues. For
that reason an expression like "1 == some.field" is invalid. Solve
that by commuting the relation if the first try did not succeed in
assigning a type to the LHS.
If the second try also fails, give up: that means we have a relation
with constants on both sides and that is not semantically valid.
Other relations like "matches" and "contains" are not symmetric and
should not commute anyway.
Before:
Filter: _ws.ftypes.int32 == 10
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(_ws.ftypes.int32 <FT_INT32>)
1 FVALUE(10 <FT_INT32>)
Instructions:
00000 READ_TREE _ws.ftypes.int32 <FT_INT32> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == 10 <FT_INT32>
00003 RETURN
Filter: 10 == _ws.ftypes.int32
dftest: Left side of "==" expression must be a field or function, not 10.
10 == _ws.ftypes.int32
^~
After:
Filter: _ws.ftypes.int32 == 10
Syntax tree:
0 TEST_ANY_EQ:
1 FIELD(_ws.ftypes.int32 <FT_INT32>)
1 FVALUE(10 <FT_INT32>)
Instructions:
00000 READ_TREE _ws.ftypes.int32 <FT_INT32> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ reg#0 == 10 <FT_INT32>
00003 RETURN
Filter: 10 == _ws.ftypes.int32
Syntax tree:
0 TEST_ANY_EQ:
1 FVALUE(10 <FT_INT32>)
1 FIELD(_ws.ftypes.int32 <FT_INT32>)
Instructions:
00000 READ_TREE _ws.ftypes.int32 <FT_INT32> -> reg#0
00001 IF_FALSE_GOTO 3
00002 ANY_EQ 10 <FT_INT32> == reg#0
00003 RETURN
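The two-try strategy can be sketched like this. It is an illustrative Python sketch under stated assumptions: FIELD_TYPES, COMPARISONS, type_from() and check_relation() are hypothetical names, and the exact set of relations that commute is assumed. Note that only the type lookup is commuted; the operand order is preserved, as in the program above.

```python
# Hypothetical field table; anything not listed is treated as a constant.
FIELD_TYPES = {"_ws.ftypes.int32": "FT_INT32"}

# Assumption: these relations may commute; "matches" and "contains" may not.
COMPARISONS = {"==", "!=", "<", "<=", ">", ">="}


def type_from(operand):
    """Return the field type of an operand, or None for a constant."""
    return FIELD_TYPES.get(operand)


def check_relation(lhs, op, rhs):
    """Return the type used to resolve the literals in the relation."""
    ftype = type_from(lhs)
    if ftype is None and op in COMPARISONS:
        # First try failed: commute the relation and try again.
        ftype = type_from(rhs)
    if ftype is None:
        # Constants on both sides, or a non-symmetric relation.
        raise ValueError("Constant expression is invalid.")
    return ftype
```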
Use a consistent style for grammar rules.
Remove a comment that is too generic. The current code should
conform to how Python operates and does not need additional error
checking.
This parameter was introduced as a safeguard for bugs
that generate an unbounded string but its utility for
that purpose is doubtful and the way it is being used
creates problems with invalid truncation of UTF-8
strings.
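To illustrate the truncation problem in general terms (this is a Python sketch, not the wmem code; both helper names are hypothetical): cutting an encoded string at an arbitrary byte count can split a multi-byte sequence and leave invalid UTF-8 behind.

```python
def truncate_naive(text: str, max_bytes: int) -> bytes:
    """Cut at a byte count without regard to character boundaries."""
    return text.encode("utf-8")[:max_bytes]


def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Cut at a byte count, then drop a trailing partial sequence."""
    data = text.encode("utf-8")[:max_bytes]
    # The input was valid UTF-8, so the only bytes that can fail to
    # decode are the ones of a sequence split at the very end.
    return data.decode("utf-8", errors="ignore").encode("utf-8")
```

For "héllo" and a limit of 2 bytes, the naive cut keeps the first byte of "é" and no longer decodes, while the boundary-aware cut returns just "h".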
Give wmem_strbuf_sized_new() a better name.
Currently the autocompletion engine always suggests a protocol
field completion, even in places where it isn't syntactically
valid.
Fix that by compiling the preamble to the token under the cursor
and checking the returned error. If it is DF_ERROR_UNEXPECTED_END
that indicates a field or literal value was expected. Otherwise
a field replacement is not valid in this position.
Fixes #12811.
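The check can be sketched with a toy stand-in for the compiler. Everything here except the DF_ERROR_UNEXPECTED_END name is an assumption made for illustration: compile_preamble() is a hypothetical token-level state machine, not the real dfilter compiler, and the operator sets are simplified.

```python
DF_ERROR_GENERIC = "DF_ERROR_GENERIC"
DF_ERROR_UNEXPECTED_END = "DF_ERROR_UNEXPECTED_END"

RELOPS = {"==", "!=", "<", "<=", ">", ">=", "contains", "matches"}
LOGIC = {"and", "or", "&&", "||"}


def compile_preamble(tokens):
    """Toy checker: return None on success, or an error code."""
    # States: 0 = expect operand, 1 = after a left operand,
    #         2 = expect right operand, 3 = after a complete relation.
    state = 0
    for tok in tokens:
        is_operator = tok in RELOPS or tok in LOGIC
        if state in (0, 2):
            if is_operator:
                return DF_ERROR_GENERIC
            state = 1 if state == 0 else 3
        elif state == 1:
            if tok in RELOPS:
                state = 2
            elif tok in LOGIC:
                state = 0
            else:
                return DF_ERROR_GENERIC
        else:  # state == 3
            if tok not in LOGIC:
                return DF_ERROR_GENERIC
            state = 0
    # Running out of input where an operand was expected.
    return DF_ERROR_UNEXPECTED_END if state in (0, 2) else None


def field_completion_allowed(preamble: str) -> bool:
    """Suggest fields only if the preamble ends where an operand is expected."""
    return compile_preamble(preamble.split()) == DF_ERROR_UNEXPECTED_END
```

In this sketch a field is offered after "tcp.port ==" or "tcp.port == 80 and", but not directly after "tcp.port", where an operator is expected instead.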
Return a struct containing error information. This simplifies
the interface to more easily provide richer diagnostics in the future.
Add an error code alongside the human-readable error string to allow
checking programmatically for errors in a robust manner. Currently
there is only a generic error code; the set of codes is expected to
grow in the future.
Move error location information to the struct. Change callers and
implementation to use the new interface.
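The shape of such an interface can be sketched as below, in Python for brevity. The type names, field names and the numeric value of the generic code are assumptions for illustration and are not taken from the actual headers.

```python
from dataclasses import dataclass
from enum import IntEnum


class DfErrorCode(IntEnum):
    DF_ERROR_GENERIC = -1  # hypothetical value


@dataclass
class DfLoc:
    col_start: int  # column where the offending text begins
    col_len: int    # number of columns to underline


@dataclass
class DfError:
    code: DfErrorCode  # for programmatic checks
    msg: str           # human-readable message
    loc: DfLoc         # error location inside the filter text


def underline(err: DfError) -> str:
    """Build a caret/tilde marker line from the error location."""
    return " " * err.loc.col_start + "^" + "~" * (err.loc.col_len - 1)
```

A caller can then branch on err.code instead of matching message text, and render the location separately from the message.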
Rename flex macros to use parentheses (mostly a style issue):
DIAG_OFF_FLEX -> DIAG_OFF_FLEX()
DIAG_ON_FLEX -> DIAG_ON_FLEX()
Use the same kind of construct with lemon generated code using
DIAG_OFF_LEMON() and DIAG_ON_LEMON(). Use %include and %code
directives to enforce the desired order, with the generated code
in the middle, between the pragmas.
Fix a clang-specific pragma to use DIAG_OFF_CLANG().
DIAG_OFF(unreachable-code) -> DIAG_OFF_CLANG(unreachable-code).
Apparently GCC ignores the -Wunreachable-code flag, which is why
it did not trigger an unknown pragma warning. From [1]:
The -Wunreachable-code has been removed, because it was unstable: it
relied on the optimizer, and so different versions of gcc would warn
about different code. The compiler still accepts and ignores the
command line option so that existing Makefiles are not broken. In some
future release the option will be removed entirely. - Ian
[1] https://gcc.gnu.org/legacy-ml/gcc-help/2011-05/msg00360.html
Instead of using the abstract type "<RAW>", which might be confusing,
show FT_BYTES, but display the representation with the "@" operator,
so it's not even more confusing in error messages why a field might
flip-flop types.
Refactor the field tostr() function and some other clean ups.
Before:
```
Filter: _ws.ftypes.string ==${@frame.len}
dftest: _ws.ftypes.string and frame.len <RAW> are not of compatible types.
_ws.ftypes.string ==${@frame.len}
^~~~~~~~~
```
After:
```
Filter: _ws.ftypes.string ==${@frame.len}
dftest: _ws.ftypes.string <FT_STRING> and @frame.len <FT_BYTES> are not of compatible types.
_ws.ftypes.string ==${@frame.len}
^~~~~~~~~
```