Matches is a special case that looks on the RHS and tries
to convert every unparsed value to a string, regardless
of the LHS type. This is not how types work in the display
filter. Require double-quotes to avoid ambiguity, because
matches doesn't follow normal Wireshark display filter
type rules. It doesn't need nor benefit from the flexibility
provided by unparsed strings in the syntax.
For matches the RHS is always a literal strings except
if the RHS is also a field name, then it complains of an
incompatible type. This is confusing. No type can be compatible
because no type rules are ever considered. Every unparsed value is
a text string except if it happens to coincide with a field
name it also requires double-quoting or it throws a syntax error,
just to be difficult. We could remove this odd quirk but requiring
double-quotes for regular expressions is a better, more elegant
fix.
Before:
Filter: tcp matches "udp"
Constants:
00000 PUT_PCRE udp -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
Filter: tcp matches udp
Constants:
00000 PUT_PCRE udp -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
Filter: tcp matches udp.srcport
dftest: tcp and udp.srcport are not of compatible types.
Filter: tcp matches udp.srcportt
Constants:
00000 PUT_PCRE udp.srcportt -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
After:
Filter: tcp matches "udp"
Constants:
00000 PUT_PCRE udp -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
Filter: tcp matches udp
dftest: "udp" was unexpected in this context.
Filter: tcp matches udp.srcport
dftest: "udp.srcport" was unexpected in this context.
Filter: tcp matches udp.srcportt
dftest: "udp.srcportt" was unexpected in this context.
The error message could still be improved.
Always use the internal API to access "deprecated" and initialize
the data structure on demand. This fixes a null pointer dereference
introduced previously.
Use reference counting to share the array cleanly and avoid memory
leaks.
Keep the pointer in dfwork_t.
The lexical rules for fields and unparsed strings are ambiguous,
e.g. "fc" can be the protocol fibre channel or the byte 0xfc.
In general a name is determined to be a protocol field or not by
checking the registry.
Resolving the name in the parser gives more flexibility, for example
to use different semantic rules according to the relation between
LHS and RHS, and allows function names and protocol names to co-exist
without ambiguity.
Before:
Filter: tcp == 1
Constants:
00000 PUT_FVALUE 01 <FT_PROTOCOL> -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_EQ reg#0 == reg#1
00003 RETURN
Filter: tcp() == 1
dftest: Syntax error near "(".
After:
Filter: tcp == 1
Constants:
00000 PUT_FVALUE 01 <FT_PROTOCOL> -> reg#1
Instructions:
(same)
Filter: tcp() == 1
dftest: Function 'tcp' does not exist
It's also a goal to make it easier to modify the lexer rules.
Ping #12810.
Do the integer conversion for ranges in the parser. This is more
conventional, I think, and allows removing the unnecessary integer
syntax tree node type.
Try to minimize the number and complexity of lexical rules for
ranges. But it seems we need to keep different states for integer
and punctuation because of the need to disambiguate the ranges
[-n-n] and [-n--n].
A function is grammatically an identifier that is followed by '(' and ')'
according to some rules. We should avoid assuming a token is a function
just because it matches a registered function name.
Before:
Filter: foobar(http.user_agent) contains "UPDATE"
dftest: Syntax error near "(".
After:
Filter: foobar(http.user_agent) contains "UPDATE"
dftest: The function 'foobar' does not exist.
This has the problem that a function cannot have the same name
as a protocol but that limitation already existed before.
When parsing we save the token value to the syntax tree. This is
useful for better error reporting. Use it to report an invalid
entity for the slice operation. Before only the memory location
was reported, which is not a good error message.
Before:
% dftest '"01:02:03:04"[0:3] == foo'
Filter: ""01:02:03:04"[0:3] == foo"
dftest: Range is not supported for entity <0x7f6c84017740> of type STRING
After:
% dftest '"01:02:03:04"[0:3] == foo'
Filter: ""01:02:03:04"[0:3] == foo"
dftest: Range is not supported for entity 01:02:03:04 of type STRING
When creating a new node from an old one we need to copy the token
value. Simple tokens such as RBRACKET, COMMA and COLON are
not part of the AST and don't have an associated semantic value.
Pass the deprecated data struture to the scanner and insert the deprecated
tokens there. This avoids having to keep a dedicated syntax node field
for this.
Pass the deprecated argument in dfwork_t instead of in a separate
argument. This is less cumbersome than adding an extra argument
to every level of the semantic checker.
Use wslog to output debug information. Being able to control
it at runtime is a big advantage.
We extend the syntax tree nodes with a method to return a
canonical string representation.
Add a routine to walk the tree and return an textual representation
for debugging purposes.
Add support for a literal string specification copied from Python
raw strings[1].
Raw string literals are enclosed with r"..." or R"...". Double quotes
can be include in the string but they must be escaped with backslash.
In escape sequences backslashes are preserved in the final result.
So for example the string "a\\\"b" is the same as r"a\"b".
r"\\\a" is the same as "\\\\\\a".
Raw strings should be used for convenience wherever a regular expression
is used in a display filter expression.
[1]https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
Replace most instances of ws_debug_printf() except in
epan/dissectors and dissector plugins.
Some replacements use printf(), some use ws_debug(), and
some were removed because they were dead or judged to be
temporary.
Change all wireshark.org URLs to use https.
Fix some broken links while we're at it.
Change-Id: I161bf8eeca43b8027605acea666032da86f5ea1c
Reviewed-on: https://code.wireshark.org/review/34089
Reviewed-by: Guy Harris <guy@alum.mit.edu>
All other error-return code paths set *dfp to NULL; make this one do so
as well.
Change-Id: I4015c1d53bdbac99cdeda158d7d01c8da7bf2562
Reviewed-on: https://code.wireshark.org/review/31102
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Running tools/dfilter-test.py with LSan enabled resulted in 38 test
failures due to memory leaks from "fvalue_new". Problematic dfilters:
- Return values from functions, e.g. `len(data.data) > 8` (instruction
CALL_FUNCTION invoking functions from epan/dfilter/dfunctions.c)
- Slice operator: `data.data[1:2] == aa:bb` (function mk_range)
These values end up in "registers", but as some values (from READ_TREE)
reference the proto tree, a new tracking flag ("owns_memory") is added.
Add missing tests for some functions and try to improve documentation.
Change-Id: I28e8cf872675d0a81ea7aa5fac7398257de3f47b
Reviewed-on: https://code.wireshark.org/review/27132
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Reviewed-by: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
Previously a filter such as `http.request.method in {"GET"HEAD""}` would
be parsed as three strings (GET, HEAD and an empty string). As it seems
more likely that people make typos rather than intending to construct
such a filter, forbid this by always requiring a whitespace separator.
Change-Id: I77e531fd6be072f62dd06aac27f856106c8920c6
Reported-by: Stig Bjørlykke
Reviewed-on: https://code.wireshark.org/review/26989
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
XXX_prime_with_YYY makes it a bit clearer than does XXX_prime_YYY that
we're not priming YYY, we're priming XXX *using* YYY.
Change-Id: I1686b8b5469bc0f0bd6db8551fb6301776a1b133
Reviewed-on: https://code.wireshark.org/review/21031
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Tried to poke various fields (including the capture filter field), this
revealed some memleaks.
Change-Id: I1eca431a09839906a4b3c902ad85e55bffc71ca8
Reviewed-on: https://code.wireshark.org/review/17648
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Michael Mann <mmann78@netscape.net>
Many of the complaints from checkAPI.pl for use of printf are when its embedded
in an #ifdef and checkAPI isn't smart enough to figure that out.
The other (non-ifdef) use is dumping internal structures (which is a type of
debug functionality)
Add a "ws_debug_printf" macro for printf to pacify the warnings.
Change-Id: I63610e1adbbaf2feffb4ec9d4f817247d833f7fd
Reviewed-on: https://code.wireshark.org/review/16623
Reviewed-by: Michael Mann <mmann78@netscape.net>
Petri-Dish: Michael Mann <mmann78@netscape.net>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Anders Broman <a.broman58@gmail.com>
master-branch libpcap now generates a reentrant Flex scanner and
Bison/Berkeley YACC parser for capture filter expressions, so it
requires versions of Flex and Bison/Berkeley YACC that support that.
We might as well do the same. For libwiretap, it means we could
actually have multiple K12 text or Ascend/Lucent text files open at the
same time. For libwireshark, it might not be as useful, as we only read
configuration files at startup (which should only happen once, in one
thread) or on demand (in which case, if we ever support multiple threads
running libwireshark, we'd need a mutex to ensure that only one file
reads it), but it's still the right thing to do.
We also require a version of Flex that can write out a header file, so
we change the runlex script to generate the header file ourselves. This
means we require a version of Flex new enough to support --header-file.
Clean up some other stuff encountered in the process.
Change-Id: Id23078c6acea549a52fc687779bb55d715b55c16
Reviewed-on: https://code.wireshark.org/review/14719
Reviewed-by: Guy Harris <guy@alum.mit.edu>
The proto tree is needed in several cases when using Lua field extractors,
because they fetch values from the tree. Without a valid field extractor
a Lua plugin may misbehave and display wrong column info.
This fixes column issues when:
- Calling resetColumns() in Qt. This involves adding a display filter,
change time display format, change name resolution and other changes
in UI which requires column updates.
- Print summary lines.
- Export as CSV and PSML.
Change-Id: Ieed6f8578cdf2759f1f836cd8413a4529b7bbd80
Reviewed-on: https://code.wireshark.org/review/13708
Petri-Dish: Stig Bjørlykke <stig@bjorlykke.org>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Anders Broman <a.broman58@gmail.com>
dfilter_macro_apply_recurse() returns either NULL or a pointer to
freshly-allocated memory, so it doesn't return a const pointer.
dfilter_macro_apply() calls dfilter_macro_apply_recurse(), so it doesn't
return a const pointer, either.
In dfilter_compile(), have separate variables for the filter handed in
and the macro-expanded filter, the former being const gchar * and the
latter being gchar *.
Change-Id: I191549bf0ff6c09c1278a98432a907c93d5e0e74
Reviewed-on: https://code.wireshark.org/review/12446
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Have dfilter_compile() take an additional gchar ** argument, pointing to
a gchar * item that, on error, gets set to point to a g_malloc()ed error
string. That removes one bit of global state from the display filter
parser, and doesn't impose a fixed limit on the error message strings.
Have fvalue_from_string() and fvalue_from_unparsed() take a gchar **
argument, pointer to a gchar * item, rather than an error-reporting
function, and set the gchar * item to point to a g_malloc()ed error
string on an error.
Allow either gchar ** argument to be null; if the argument is null, no
error message is allocated or provided.
Change-Id: Ibd36b8aaa9bf4234aa6efa1e7fb95f7037493b4c
Reviewed-on: https://code.wireshark.org/review/6608
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Couldn't quite eliminate it completely, but it's much improved. Need to figure out where/when to free dfilter_error_msg.
Change-Id: I10216e9546d38e83f69991ded8ec0b3fc8472035
Reviewed-on: https://code.wireshark.org/review/6591
Reviewed-by: Michael Mann <mmann78@netscape.net>
(Using sed : sed -i '/^ \* \$Id\$/,+1 d')
Fix manually some typo (in export_object_dicom.c and crc16-plain.c)
Change-Id: I4c1ae68d1c4afeace8cb195b53c715cf9e1227a8
Reviewed-on: https://code.wireshark.org/review/497
Reviewed-by: Anders Broman <a.broman58@gmail.com>
routine that does all the work and that takes a depth argumen, and an
external routine that calls that internal routine with a depth argument
of 0. The depth is only of use internally, to avoid infinite recursion.
When recursing with that routine, pass depth+1 as the depth value,
rather than passing depth and incrementing it afterwards; the latter
doesn't prevent infinite recursion. (Thanks and a tip of the hat to
Clang Cat for catching this.)
Squelch some other (harmless) warnings from Clang Cat.
Clean up indentation.
svn path=/trunk/; revision=39838
* Fix memleak (df->deprecated in dfilter_free())
* Free protocol hash tables on cleanup.
* Free protocols list on cleanup.
* Free memory allocated by fgetline() in parse_services_file()
From me:
* proto.c: set gmc_hfinfo to NULL after free
* proto.c: switch order of g_free() and g_list_remove() in proto_cleanup()
svn path=/trunk/; revision=29656
"Purify reports an uninitialized memory read in dfw_append_const() when
accessing the 'next_const_id' member. This seems to be caused by dfwork_new()
which doesn't properly initialize the member."
svn path=/trunk/; revision=28486