Add argument to dfilter_compile_real() to save syntax tree text
representation.
Use it with dftest to print syntax tree.
Misc debug output format improvements.
By the time we are using the reference fvalue the tree may have gone
away and with it the fvalue. We need to duplicate the reference
fvalues and take ownership of the memory.
This replaces the current macro reference system with
a completely different implementation. Instead of a macro a reference
is a syntax element. A reference is a constant that can be filled
in the dfilter code after compilation from an existing protocol tree.
It is best understood as a field value that can be read from a fixed
tree that is not the frame being filtered. Usually this fixed tree
is the currently selected frame when the filter is applied. This
allows comparing fields in the filtered frame with fields in the
selected frame.
Because the field reference syntax uses the same sigil notation
as a macro we have to use a heuristic to distinguish them:
if the name has a dot it is a field reference, otherwise
it is a macro name.
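
The heuristic can be sketched in a few lines. This is an illustration only: the `${...}` sigil form and the name `classify_sigil` are assumptions made for the example, not taken from the dfilter sources.

```python
def classify_sigil(token: str) -> str:
    """Classify a sigil token as a field reference or a macro (sketch)."""
    # Assumed notation: the name is wrapped in "${" and "}".
    if not (token.startswith("${") and token.endswith("}")):
        raise ValueError(f"not a sigil token: {token!r}")
    name = token[2:-1]
    # The heuristic from the text: a dot in the name means field reference.
    return "field reference" if "." in name else "macro"


print(classify_sigil("${ip.src}"))
print(classify_sigil("${mymacro}"))
```

A name without a dot can never be treated as a field reference under this rule, which is the price of sharing the notation.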
The reference is syntactically validated at compile time.
There are two main advantages to this implementation (and a couple of
minor ones):
The protocol tree for each selected frame is only walked if we have a
display filter and if the display filter uses references. Also only the
actual reference values are copied, instead of loading the entire tree
into a hash table (in textual form even).
The other advantage is that the reference is tested like a protocol
field against all the values in the selected frame (if there is more
than one).
Currently the reference fields are not "primed" during dissection, so
the entire tree is walked to find a particular reference (this is
similar to the previous implementation).
If the display filter contains a valid reference and the reference is
not loaded at the time the filter is run the result is the same as a
non existing field for a regular READ_TREE instruction.
Fixes #17599.
Add support for masking of bits. Before the bitwise operator
could only test bits, it did not support clearing bits.
This allows testing if any combination of bits are set/unset
more naturally with a single test. Previously this was only
possible by combining several bitwise predicates.
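
A rough illustration with plain integers of why masking makes the single test possible. The bit names and helper functions are hypothetical and the code does not use display filter syntax; it only shows the arithmetic.

```python
BIT_A = 0x02  # hypothetical flag that must be set
BIT_B = 0x10  # hypothetical flag that must be clear


def combined_predicates(flags: int) -> bool:
    # Test-only bitwise operator: one predicate per bit, joined with "and".
    return (flags & BIT_A) != 0 and not ((flags & BIT_B) != 0)


def single_masked_test(flags: int) -> bool:
    # With masking: clear every bit outside the mask, then compare once.
    return (flags & (BIT_A | BIT_B)) == BIT_A


# Both forms agree for every 8-bit value.
print(all(combined_predicates(f) == single_masked_test(f) for f in range(256)))
```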
Bitwise is implemented as a test node, even though it is not a test.
Maybe the test node should be renamed to something else.
Fixes #17246.
The syntax for protocols and some literals like numbers
and bytes/addresses can be ambiguous. Some protocols can
be parsed as a literal, for example the protocol "fc"
(Fibre Channel) can be parsed as 0xFC.
If a numeric protocol is registered that will also take
precedence over any literal, according to the current
rules, thereby breaking numerical comparisons to that
number. The same goes for a hypothetical protocol named "true",
etc.
To allow the user to disambiguate this meaning introduce
new syntax.
Any value prefixed with ':' or enclosed in <,> will be treated
as a literal value only. The value :fc or <fc> will always
mean 0xFC, under any context. Never a protocol whose filter
name is "fc".
Likewise any value prefixed with a dot will always be parsed
as an identifier (protocol or protocol field) in the language.
Never any literal value parsed from the token "fc".
This allows the user to be explicit about the meaning,
and between the two explicit methods plus the ambiguous one
it doesn't completely break any one meaning.
The difference can be seen in the following two programs:
Filter: frame == fc
Constants:
Instructions:
00000 READ_TREE frame -> reg#0
00001 IF-FALSE-GOTO 5
00002 READ_TREE fc -> reg#1
00003 IF-FALSE-GOTO 5
00004 ANY_EQ reg#0 == reg#1
00005 RETURN
--------
Filter: frame == :fc
Constants:
00000 PUT_FVALUE fc <FT_PROTOCOL> -> reg#1
Instructions:
00000 READ_TREE frame -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_EQ reg#0 == reg#1
00003 RETURN
The filter "frame == fc" is the same as "frame == .fc",
according to the current heuristic, except the first form
will try to parse it as a literal if the name does not
correspond to any registered protocol.
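
The three spellings can be summarized with a small sketch. The function name `resolve_value` and the set standing in for the protocol registry are assumptions for illustration; the real resolution happens inside the dfilter compiler.

```python
def resolve_value(token, registered_names):
    """Return (kind, text), where kind is 'literal' or 'identifier' (sketch)."""
    # ':' prefix or <...> enclosure: always a literal value.
    if token.startswith(":") and len(token) > 1:
        return ("literal", token[1:])
    if token.startswith("<") and token.endswith(">") and len(token) > 2:
        return ("literal", token[1:-1])
    # Leading dot: always a name in the language.
    if token.startswith(".") and len(token) > 1:
        return ("identifier", token[1:])
    # Ambiguous form: a registered name wins, otherwise try a literal.
    if token in registered_names:
        return ("identifier", token)
    return ("literal", token)


registry = {"fc", "frame"}  # hypothetical registry contents
for tok in ("fc", ".fc", ":fc", "<fc>"):
    print(tok, resolve_value(tok, registry))
```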
By treating a leading dot as a name in the language we
necessarily disallow writing floats with a leading dot. We
will also disallow writing with an ending dot when using
unparsed values. This is a backward incompatibility but has
the happy side effect of making the expression {1...2}
unambiguous.
This could either mean "1 .. .2" or "1. .. 2". If we require
a leading and ending digit then the meaning is clear:
1.0..0.2 -> 1.0 .. 0.2
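
A minimal sketch of the digit requirement, using a regular expression rather than the actual lexer rules. The pattern and the name `split_range` are assumptions for illustration.

```python
import re

# A number must start and end with a digit, so ".2" and "1." are rejected.
_NUMBER = r"\d+(?:\.\d+)?"
_RANGE = re.compile(rf"({_NUMBER})\.\.({_NUMBER})")


def split_range(text):
    """Split 'low..high' into two floats, or return None if it is malformed."""
    m = _RANGE.fullmatch(text)
    if m is None:
        return None
    return (float(m.group(1)), float(m.group(2)))


print(split_range("1.0..0.2"))
print(split_range("1...2"))
```

With this rule "1...2" has no valid reading at all, so the user has to write out the intended digits.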
Fixes #17731.
Instead of requiring a special error function in the parser
just set the syntax_error flag if an error occurs, in any stage
of compilation. Outside of the parser loop it will not be used
but that is fine.
Invalid character constants should be handled in the lexical scanner.
Todo: See if some code could be shared to parse double quoted strings.
It also fixes some unintuitive type coercions to string. Character
constants should be treated as characters, or maybe integers, or
maybe even throw an invalid comparison error, but converting to a
literal string or byte array is surprising and not particularly
useful:
'\xFF' -> "'\xFF'" (equals)
'\xFF' -> "FF" (contains)
Before:
Filter: http.request.method contains "\x63"
Constants:
00000 PUT_FVALUE "c" <FT_STRING> -> reg#1
(...)
Filter: http.request.method contains '\x63'
Constants:
00000 PUT_FVALUE "63" <FT_STRING> -> reg#1
(...)
Filter: http.request.method == "\x63"
Constants:
00000 PUT_FVALUE "c" <FT_STRING> -> reg#1
(...)
Filter: http.request.method == '\x63'
Constants:
00000 PUT_FVALUE "'\\x63'" <FT_STRING> -> reg#1
(...)
After:
Filter: http.request.method contains '\x63'
Constants:
00000 PUT_FVALUE "c" <FT_STRING> -> reg#1
(...)
Filter: http.request.method == '\x63'
Constants:
00000 PUT_FVALUE "c" <FT_STRING> -> reg#1
(...)
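
To make the intended meaning concrete, here is a small sketch that reads a character constant as a character code. It handles only a plain character and the \xHH form; the name `parse_char_constant` and the reduced escape set are assumptions, not the scanner's actual rules.

```python
import string


def parse_char_constant(token: str) -> int:
    # Accepts forms such as 'c' or '\x63' and returns the character code.
    if len(token) < 3 or token[0] != "'" or token[-1] != "'":
        raise ValueError(f"not a character constant: {token}")
    body = token[1:-1]
    if len(body) == 1 and body != "\\":
        return ord(body)
    digits = body[2:]
    if (body.startswith("\\x") and 1 <= len(digits) <= 2
            and all(c in string.hexdigits for c in digits)):
        return int(digits, 16)
    raise ValueError(f"invalid character constant: {token}")


code = parse_char_constant("'\\x63'")
print(code, chr(code))
```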
Using a hand written tokenizer is simpler than using flex start
conditions. Do the validation in the drange node constructor.
Add validation for malformed ranges with different endpoint signs.
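
A hand-written scan for a "start-end" range might look like the sketch below. It assumes that a '-' directly before digits is a sign and that endpoints with different signs are malformed; the name `parse_range` and the error texts are illustrative, not the drange code.

```python
DIGITS = "0123456789"


def parse_range(text):
    """Split 'start-end' into two integers and validate the signs (sketch)."""
    pos = 0

    def read_int():
        nonlocal pos
        begin = pos
        if pos < len(text) and text[pos] == "-":
            pos += 1  # here '-' is a sign, not the separator
        first_digit = pos
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1
        if pos == first_digit:
            raise ValueError(f"expected an integer at offset {begin}")
        return int(text[begin:pos])

    start = read_int()
    if pos >= len(text) or text[pos] != "-":
        raise ValueError("expected '-' between the endpoints")
    pos += 1
    end = read_int()
    if pos != len(text):
        raise ValueError("unexpected trailing characters")
    # Validation step: both endpoints must have the same sign.
    if (start < 0) != (end < 0):
        raise ValueError("range endpoints have different signs")
    return (start, end)


print(parse_range("1-3"))
print(parse_range("-3--1"))
```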
Matches is a special case that looks at the RHS and tries
to convert every unparsed value to a string, regardless
of the LHS type. This is not how types work in the display
filter. Require double-quotes to avoid ambiguity, because
matches doesn't follow normal Wireshark display filter
type rules. It doesn't need nor benefit from the flexibility
provided by unparsed strings in the syntax.
For matches the RHS is always a literal string, except
if the RHS is also a field name, in which case it complains of an
incompatible type. This is confusing. No type can be compatible
because no type rules are ever considered. Every unparsed value is
a text string, but if it happens to coincide with a field
name it also requires double-quoting or it throws a syntax error,
just to be difficult. We could remove this odd quirk but requiring
double-quotes for regular expressions is a better, more elegant
fix.
Before:
Filter: tcp matches "udp"
Constants:
00000 PUT_PCRE udp -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
Filter: tcp matches udp
Constants:
00000 PUT_PCRE udp -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
Filter: tcp matches udp.srcport
dftest: tcp and udp.srcport are not of compatible types.
Filter: tcp matches udp.srcportt
Constants:
00000 PUT_PCRE udp.srcportt -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
After:
Filter: tcp matches "udp"
Constants:
00000 PUT_PCRE udp -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
Filter: tcp matches udp
dftest: "udp" was unexpected in this context.
Filter: tcp matches udp.srcport
dftest: "udp.srcport" was unexpected in this context.
Filter: tcp matches udp.srcportt
dftest: "udp.srcportt" was unexpected in this context.
The error message could still be improved.
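
The new rule amounts to a check like the following sketch. The name `check_matches_rhs` is hypothetical and escape sequences inside the quotes are ignored here; only the error wording is taken from the dftest output above.

```python
def check_matches_rhs(token: str) -> str:
    """Return the pattern text if the RHS is a double-quoted string (sketch)."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]
    # Unquoted values are rejected, whether or not they name a field.
    raise ValueError(f'"{token}" was unexpected in this context.')


print(check_matches_rhs('"udp"'))
```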
Always use the internal API to access "deprecated" and initialize
the data structure on demand. This fixes a null pointer dereference
introduced previously.
Use reference counting to share the array cleanly and avoid memory
leaks.
Keep the pointer in dfwork_t.
The lexical rules for fields and unparsed strings are ambiguous,
e.g. "fc" can be the protocol fibre channel or the byte 0xfc.
In general a name is determined to be a protocol field or not by
checking the registry.
Resolving the name in the parser gives more flexibility, for example
to use different semantic rules according to the relation between
LHS and RHS, and allows function names and protocol names to co-exist
without ambiguity.
Before:
Filter: tcp == 1
Constants:
00000 PUT_FVALUE 01 <FT_PROTOCOL> -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_EQ reg#0 == reg#1
00003 RETURN
Filter: tcp() == 1
dftest: Syntax error near "(".
After:
Filter: tcp == 1
Constants:
00000 PUT_FVALUE 01 <FT_PROTOCOL> -> reg#1
Instructions:
(same)
Filter: tcp() == 1
dftest: Function 'tcp' does not exist
It's also a goal to make it easier to modify the lexer rules.
Ping #12810.
Do the integer conversion for ranges in the parser. This is more
conventional, I think, and allows removing the unnecessary integer
syntax tree node type.
Try to minimize the number and complexity of lexical rules for
ranges. But it seems we need to keep different states for integer
and punctuation because of the need to disambiguate the ranges
[-n-n] and [-n--n].
A function is grammatically an identifier that is followed by '(' and ')'
according to some rules. We should avoid assuming a token is a function
just because it matches a registered function name.
Before:
Filter: foobar(http.user_agent) contains "UPDATE"
dftest: Syntax error near "(".
After:
Filter: foobar(http.user_agent) contains "UPDATE"
dftest: The function 'foobar' does not exist.
This has the problem that a function cannot have the same name
as a protocol but that limitation already existed before.
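
A sketch of deciding by grammatical position rather than by registry lookup. The token list, the `FUNCTIONS` set and the name `resolve_name` are assumptions for illustration.

```python
FUNCTIONS = {"len", "upper", "lower"}  # stand-in for the function registry


def resolve_name(tokens, i):
    """Classify tokens[i] from its position in the token stream (sketch)."""
    name = tokens[i]
    followed_by_paren = i + 1 < len(tokens) and tokens[i + 1] == "("
    if followed_by_paren:
        # Grammatically a function call; only now consult the registry.
        if name not in FUNCTIONS:
            raise ValueError(f"The function '{name}' does not exist.")
        return ("function", name)
    # Not followed by '(': never a function, even if the name is registered.
    return ("field", name)


print(resolve_name(["len", "(", "data.data", ")"], 0))
print(resolve_name(["len", "==", "1"], 0))
```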
Pass the deprecated data structure to the scanner and insert the deprecated
tokens there. This avoids having to keep a dedicated syntax node field
for this.
Pass the deprecated argument in dfwork_t instead of in a separate
argument. This is less cumbersome than adding an extra argument
to every level of the semantic checker.
Use wslog to output debug information. Being able to control
it at runtime is a big advantage.
We extend the syntax tree nodes with a method to return a
canonical string representation.
Add a routine to walk the tree and return a textual representation
for debugging purposes.
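
The idea can be sketched as a per-node string method plus a recursive walk. The `Node` class, the labels and the indentation format are made up for the example and do not reflect the real output.

```python
class Node:
    def __init__(self, label, *children):
        self.label = label
        self.children = children

    def tostr(self):
        # Canonical string for this node alone.
        return self.label


def dump_tree(node, depth=0):
    """Return one line per node, indented by depth (sketch)."""
    lines = ["  " * depth + node.tostr()]
    for child in node.children:
        lines.extend(dump_tree(child, depth + 1))
    return lines


tree = Node("TEST(==)", Node("FIELD(tcp.port)"), Node("LITERAL(80)"))
print("\n".join(dump_tree(tree)))
```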
Add support for a literal string specification copied from Python
raw strings[1].
Raw string literals are enclosed with r"..." or R"...". Double quotes
can be included in the string but they must be escaped with backslash.
In escape sequences backslashes are preserved in the final result.
So for example the string "a\\\"b" is the same as r"a\"b".
r"\\\a" is the same as "\\\\\\a".
Raw strings should be used for convenience wherever a regular expression
is used in a display filter expression.
[1]https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
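
Since the specification is copied from Python, the two equivalences above can be checked in Python itself. This shows Python's behaviour only; that the display filter scanner matches it is what the text states, not something this snippet verifies.

```python
# First pair from the text: a, backslash, double quote, b.
escaped_1 = "a\\\"b"
raw_1 = r"a\"b"

# Second pair: three backslashes followed by the letter a.
raw_2 = r"\\\a"
escaped_2 = "\\\\\\a"

print(escaped_1 == raw_1, raw_2 == escaped_2)
```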
Running tools/dfilter-test.py with LSan enabled resulted in 38 test
failures due to memory leaks from "fvalue_new". Problematic dfilters:
- Return values from functions, e.g. `len(data.data) > 8` (instruction
CALL_FUNCTION invoking functions from epan/dfilter/dfunctions.c)
- Slice operator: `data.data[1:2] == aa:bb` (function mk_range)
These values end up in "registers", but as some values (from READ_TREE)
reference the proto tree, a new tracking flag ("owns_memory") is added.
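
A conceptual sketch of the tracking flag follows. Python has no manual memory management, so "freeing" is modelled by returning the values that would be released; apart from the owns_memory flag itself, all names are hypothetical.

```python
class Register:
    def __init__(self):
        self.values = []
        self.owns_memory = False


def load_from_tree(reg, tree_values):
    # Values borrowed from the protocol tree: the register must not free them.
    reg.values = tree_values
    reg.owns_memory = False


def store_result(reg, new_values):
    # Freshly created values (function result, slice): the register owns them.
    reg.values = new_values
    reg.owns_memory = True


def free_register(reg):
    """Reset the register and return the values it was responsible for."""
    released = list(reg.values) if reg.owns_memory else []
    reg.values = []
    reg.owns_memory = False
    return released


r0, r1 = Register(), Register()
load_from_tree(r0, ["tree-value"])
store_result(r1, ["slice-value"])
print(free_register(r0), free_register(r1))
```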
Add missing tests for some functions and try to improve documentation.
Change-Id: I28e8cf872675d0a81ea7aa5fac7398257de3f47b
Reviewed-on: https://code.wireshark.org/review/27132
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Reviewed-by: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
Previously a filter such as `http.request.method in {"GET"HEAD""}` would
be parsed as three strings (GET, HEAD and an empty string). As it seems
more likely that people make typos rather than intending to construct
such a filter, forbid this by always requiring a whitespace separator.
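
The separator requirement can be sketched as a scan over the text between the braces. The element pattern and the name `split_set_elements` are assumptions; the real scanner uses its own lexical rules.

```python
import re

# An element is either a double-quoted string or a run of other characters.
_ELEMENT = re.compile(r'"[^"]*"|[^\s"]+')


def split_set_elements(body):
    """Split set contents into elements separated by whitespace (sketch)."""
    elements = []
    pos = 0
    while pos < len(body):
        if body[pos].isspace():
            pos += 1
            continue
        m = _ELEMENT.match(body, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        end = m.end()
        # The character after an element, if any, must be whitespace.
        if end < len(body) and not body[end].isspace():
            raise ValueError(f"missing whitespace after {m.group(0)}")
        elements.append(m.group(0))
        pos = end
    return elements


print(split_set_elements('"GET" "HEAD"'))
```

The input `"GET"HEAD""` is rejected at the first element, because the closing quote of "GET" is followed directly by H.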
Change-Id: I77e531fd6be072f62dd06aac27f856106c8920c6
Reported-by: Stig Bjørlykke
Reviewed-on: https://code.wireshark.org/review/26989
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
master-branch libpcap now generates a reentrant Flex scanner and
Bison/Berkeley YACC parser for capture filter expressions, so it
requires versions of Flex and Bison/Berkeley YACC that support that.
We might as well do the same. For libwiretap, it means we could
actually have multiple K12 text or Ascend/Lucent text files open at the
same time. For libwireshark, it might not be as useful, as we only read
configuration files at startup (which should only happen once, in one
thread) or on demand (in which case, if we ever support multiple threads
running libwireshark, we'd need a mutex to ensure that only one thread
reads it), but it's still the right thing to do.
We also require a version of Flex that can write out a header file, so
we change the runlex script to generate the header file ourselves. This
means we require a version of Flex new enough to support --header-file.
Clean up some other stuff encountered in the process.
Change-Id: Id23078c6acea549a52fc687779bb55d715b55c16
Reviewed-on: https://code.wireshark.org/review/14719
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Have dfilter_compile() take an additional gchar ** argument, pointing to
a gchar * item that, on error, gets set to point to a g_malloc()ed error
string. That removes one bit of global state from the display filter
parser, and doesn't impose a fixed limit on the error message strings.
Have fvalue_from_string() and fvalue_from_unparsed() take a gchar **
argument, pointer to a gchar * item, rather than an error-reporting
function, and set the gchar * item to point to a g_malloc()ed error
string on an error.
Allow either gchar ** argument to be null; if the argument is null, no
error message is allocated or provided.
Change-Id: Ibd36b8aaa9bf4234aa6efa1e7fb95f7037493b4c
Reviewed-on: https://code.wireshark.org/review/6608
Reviewed-by: Guy Harris <guy@alum.mit.edu>
The WRETH dissector showed up some garbage in the column display. Upon
further inspection, it turns out that the format string had a trailing
percent sign which caused (unsigned)-1 to be returned by
g_printf_string_upper_bound (in emem_strdup_vprintf). Then ep_alloc is
called with (unsigned)-1 + 1 = 0 bytes of memory; no wonder that garbage shows
up. ASAN could not even catch this error because EP is in charge of
this.
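
The wraparound can be reproduced with plain arithmetic. This sketch assumes a 32-bit unsigned type; `alloc_size` is a hypothetical helper that only mirrors the "upper bound plus one" computation described above.

```python
UINT_MAX = 0xFFFFFFFF  # assumption: 32-bit unsigned arithmetic


def alloc_size(upper_bound):
    # Unsigned addition wraps modulo 2**32.
    return (upper_bound + 1) & UINT_MAX


error_value = -1 & UINT_MAX  # (unsigned)-1
print(hex(error_value), alloc_size(error_value))
```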
So, start adding G_GNUC_PRINTF annotations in each header that uses
the "fmt" or "format" parameters (grepped + awk). This revealed some
other errors. The NCP2222 dissector was missing a format string (not
a security vuln though).
Many dissectors used val_to_str with a constant (but empty) string,
these have been replaced by val_to_str_const. ASN.1 dissectors
were regenerated for this.
Minor: the mate plugin used "%X" instead of "%p" for a pointer type.
The ncp2222 dissector and wimax plugin gained modelines.
Change-Id: I7f3f6a3136116f9b251719830a39a7b21646f622
Reviewed-on: https://code.wireshark.org/review/2881
Reviewed-by: Evan Huus <eapache@gmail.com>
(Using sed : sed -i '/^ \* \$Id\$/,+1 d')
Manually fix some typos (in export_object_dicom.c and crc16-plain.c)
Change-Id: I4c1ae68d1c4afeace8cb195b53c715cf9e1227a8
Reviewed-on: https://code.wireshark.org/review/497
Reviewed-by: Anders Broman <a.broman58@gmail.com>
- Change ugly GLIB version checking statements to GLIB_CHECK_VERSION
- Remove ws_strsplit files because we no longer need to borrow GLIB2's
g_strsplit code for the no longer supported GLIB1 builds
svn path=/trunk/; revision=24829
they have LF at the end of the line on UN*X and CR/LF on Windows;
hopefully this means that if a CR/LF version is checked in on Windows,
the CRs will be stripped so that they show up only when checked out on
Windows, not on UN*X.
svn path=/trunk/; revision=11400
Add a #define to enable parser tracing.
Clean up parser state when finished parsing, even if we stopped
parsing due to a syntax error, so that there's nothing left
around to screw up the next parse.
svn path=/trunk/; revision=11152
analyzer on errors, and check for SCAN_FAILED from the lexical analyzer
and abort the parse if we see it; 0 means "end of input", and we want to
distinguish errors from end-of-input, so that we can report errors as
such.
If we see end-of-input while parsing a double-quoted string, report the
error (missing closing quote).
Fix the URL for the "Start conditions" section of the Flex manual.
svn path=/trunk/; revision=10044
move the code from "dfilter_lookup_token()" into
"proto_registrar_get_byname()", and get rid of "dfilter_lookup_token()"
and have its callers call "proto_registrar_get_byname()" instead.
svn path=/trunk/; revision=5287