Currently unused. This might still be useful to differentiate
different spelling of the same token in user messages, like
"==" and "eq", but currently we are not storing test tokens
anyway, so just remove it, it makes everything simpler.
If it's ever necessary it can be added back.
Using a hand written tokenizer is simpler than using flex start
conditions. Do the validation in the drange node constructor.
Add validation for malformed ranges with different endpoint signs.
Wireshark defines the relation of equality A == B as
A any_eq B <=> An == Bn for at least one An, Bn.
More accurately I think this is (formally) an equivalence
relation, not true equality.
Whichever definition for "==" we choose we must keep the
definition of "!=" as !(A == B), otherwise it will
lead to logical contradictions like (A == B) AND (A != B)
being true.
Fix the '!=' relation to match the definition of equality:
A != B <=> !(A == B) <=> A all_ne B <=> An != Bn, for
every n.
This has been the recomended way to write "not equal" for a
long time in the documentation, even to the point where != was
deprecated, but it just wasn't implemented consistently in the
language, which has understandably been a persistent source
of confusion. Even a field that is normally well-behaved
with "!=" like "ip.src" or "ip.dst" will produce unexpected
results with encapsulations like IP-over-IP.
The opcode ALL_NE could have been implemented in the compiler
instead using NOT and ANY_EQ but I chose to implement it in
bytecode. It just seemed more elegant and efficient
but the difference was not very significant.
Keep around "~=" for any_ne relation, in case someone depends
on that, and because we don't have an operator for true equality:
A strict_equal B <=> A all_eq B <=> !(A any_ne B).
If there is only one value then any_ne and all_ne are the same
comparison operation.
Implementing this change did not require fixing any tests so it
is unlikely the relation "~=" (any_ne) will be very useful.
Note that the behaviour of the '<' (less than) comparison relation
is a separate, more subtle issue. In the general case the definition
of '<' that is used is only a partial order.
Matches is a special case that looks on the RHS and tries
to convert every unparsed value to a string, regardless
of the LHS type. This is not how types work in the display
filter. Require double-quotes to avoid ambiguity, because
matches doesn't follow normal Wireshark display filter
type rules. It doesn't need nor benefit from the flexibility
provided by unparsed strings in the syntax.
For matches the RHS is always a literal strings except
if the RHS is also a field name, then it complains of an
incompatible type. This is confusing. No type can be compatible
because no type rules are ever considered. Every unparsed value is
a text string except if it happens to coincide with a field
name it also requires double-quoting or it throws a syntax error,
just to be difficult. We could remove this odd quirk but requiring
double-quotes for regular expressions is a better, more elegant
fix.
Before:
Filter: tcp matches "udp"
Constants:
00000 PUT_PCRE udp -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
Filter: tcp matches udp
Constants:
00000 PUT_PCRE udp -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
Filter: tcp matches udp.srcport
dftest: tcp and udp.srcport are not of compatible types.
Filter: tcp matches udp.srcportt
Constants:
00000 PUT_PCRE udp.srcportt -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
After:
Filter: tcp matches "udp"
Constants:
00000 PUT_PCRE udp -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
Filter: tcp matches udp
dftest: "udp" was unexpected in this context.
Filter: tcp matches udp.srcport
dftest: "udp.srcport" was unexpected in this context.
Filter: tcp matches udp.srcportt
dftest: "udp.srcportt" was unexpected in this context.
The error message could still be improved.
It is always an error to chain regexes using the logic for "le" and "eq".
var matches "regex1" matches "regex2"
=> var matches "regex1" and "regex1" matches "regex2"
Before:
Filter: tcp matches "abc$" matches "^cde"
dftest: Neither "abc$" nor "^cde" are field or protocol names.
Filter: "abc$" matches tcp matches "^cde"
dftest: Neither "abc$" nor "tcp" are field or protocol names.
After:
Filter: tcp matches "abc$" matches "^cde"
dftest: "matches" was unexpected in this context.
Filter: "abc$" matches tcp matches "^cde"
dftest: "matches" was unexpected in this context.
The lexical rules for fields and unparsed strings are ambiguous,
e.g. "fc" can be the protocol fibre channel or the byte 0xfc.
In general a name is determined to be a protocol field or not by
checking the registry.
Resolving the name in the parser gives more flexibility, for example
to use different semantic rules according to the relation between
LHS and RHS, and allows function names and protocol names to co-exist
without ambiguity.
Before:
Filter: tcp == 1
Constants:
00000 PUT_FVALUE 01 <FT_PROTOCOL> -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_EQ reg#0 == reg#1
00003 RETURN
Filter: tcp() == 1
dftest: Syntax error near "(".
After:
Filter: tcp == 1
Constants:
00000 PUT_FVALUE 01 <FT_PROTOCOL> -> reg#1
Instructions:
(same)
Filter: tcp() == 1
dftest: Function 'tcp' does not exist
It's also a goal to make it easier to modify the lexer rules.
Ping #12810.
Do the integer conversion for ranges in the parser. This is more
conventional, I think, and allows removing the unnecessary integer
syntax tree node type.
Try to minimize the number and complexity of lexical rules for
ranges. But it seems we need to keep different states for integer
and punctuation because of the need to disambiguate the ranges
[-n-n] and [-n--n].
A function is grammatically an identifier that is followed by '(' and ')'
according to some rules. We should avoid assuming a token is a function
just because it matches a registered function name.
Before:
Filter: foobar(http.user_agent) contains "UPDATE"
dftest: Syntax error near "(".
After:
Filter: foobar(http.user_agent) contains "UPDATE"
dftest: The function 'foobar' does not exist.
This has the problem that a function cannot have the same name
as a protocol but that limitation already existed before.
Clean up syntax error code. TEST and SET are never returned by
the tokenizer.
Remove unnecessary range_body() grammar element. Fix a comment.
Move the stnode_token_value() function to its proper place.
Allow an entity in the grammar as range body. Perform a stronger
sanity check during semantic analysis everywhere a range is used.
This is both safer (unless we want to allow FIELD bodies only, but
functions are allowed too) and also provides better error messages.
Previously a range of range only compiled on the RHS. Now it can
appear on both sides of a relation.
This fixes a crash with STRING entities similar to #10690 for
UNPARSED.
This also adds back support for slicing functions that was removed
in f3f833ccec (by accident presumably).
Ping #10690
When parsing we save the token value to the syntax tree. This is
useful for better error reporting. Use it to report an invalid
entity for the slice operation. Before only the memory location
was reported, which is not a good error message.
Before:
% dftest '"01:02:03:04"[0:3] == foo'
Filter: ""01:02:03:04"[0:3] == foo"
dftest: Range is not supported for entity <0x7f6c84017740> of type STRING
After:
% dftest '"01:02:03:04"[0:3] == foo'
Filter: ""01:02:03:04"[0:3] == foo"
dftest: Range is not supported for entity 01:02:03:04 of type STRING
When creating a new node from an old one we need to copy the token
value. Simple tokens such as RBRACKET, COMMA and COLON are
not part of the AST and don't have an associated semantic value.
It's not a valid field type, it's only a hack to support regular
expression matching in packet-matching expressions.
Instead, in the packet-matching code, have a separate syntax tree type
for Perl-compatible regular expressions, and a separate instruction to
load one into a register, and have the "matching" operator for field
types take a GRegex * as the second argument.
Put the function name in quotes.
Change-Id: I09be392a9bac3b56c13b82a554d17ea29695657c
Reviewed-on: https://code.wireshark.org/review/31790
Reviewed-by: Guy Harris <guy@alum.mit.edu>
I guess "s" in "The function s" was supposed to be "%s", giving the
function name. Make it so, and properly fetch the function name.
Change-Id: I67287f24626fa0a2816fb2cf574e5d9ff58713bf
Reviewed-on: https://code.wireshark.org/review/31787
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Previously a filter such as `http.request.method in {"GET"HEAD""}` would
be parsed as three strings (GET, HEAD and an empty string). As it seems
more likely that people make typos rather than intending to construct
such a filter, forbid this by always requiring a whitespace separator.
Change-Id: I77e531fd6be072f62dd06aac27f856106c8920c6
Reported-by: Stig Bjørlykke
Reviewed-on: https://code.wireshark.org/review/26989
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
Allow "tcp.srcport in {1662 1663 1664}" to be abbreviated to
"tcp.srcport in {1662 .. 1664}". The range operator is supported for any
field value which supports the "<=" and "=>" operators and thus works
for integers, IP addresses, etc.
The naive mapping "tcp.srcport >= 1662 and tcp.srcport <= 1664" is not
used because it does not have the intended effect with fields that have
multiple occurrences (e.g. tcp.port). Each condition could be satisfied
by an other value. Therefore a new DVFM instruction (ANY_IN_RANGE) is
added to test the range condition against each individual field value.
Bug: 14180
Change-Id: I53c2d0f9bc9d4f0ffaabde9a83442122965c95f7
Reviewed-on: https://code.wireshark.org/review/26945
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
Add an FT_CHAR type, which is like FT_UINT8 except that the value is
displayed as a C-style character constant.
Allow use of C-style character constants in filter expressions; they can
be used in comparisons with all integral types, and in "contains"
operators.
Use that type for some fields that appear (based on the way they're
displayed, or on the use of C-style character constants in their
value_string tables) to be 1-byte characters rather than 8-bit numbers.
Change-Id: I39a9f0dda0bd7f4fa02a9ca8373216206f4d7135
Reviewed-on: https://code.wireshark.org/review/17787
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Because of a change in lemon the %parse_failure is not always called.
Bug: 11637
Change-Id: Iea218aeee10e20f29461169829a10345bbdac903
Reviewed-on: https://code.wireshark.org/review/11302
Petri-Dish: Stig Bjørlykke <stig@bjorlykke.org>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Tested-by: Jeff Morriss <jeff.morriss.ws@gmail.com>
Reviewed-by: Stig Bjørlykke <stig@bjorlykke.org>
Added a new relational test: 'x in {a b c}'. The only LHS entity
supported at this time is a field. The generated DFVM operations are
equivalent to an OR'ed series of =='s, but with the redundant existence
tests removed.
Change-Id: Iddc89b81cf7ad6319aef1a2a94f93314cb721a8a
Reviewed-on: https://code.wireshark.org/review/10246
Reviewed-by: Hadriel Kaplan <hadrielk@yahoo.com>
Petri-Dish: Hadriel Kaplan <hadrielk@yahoo.com>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Michael Mann <mmann78@netscape.net>
Have dfilter_compile() take an additional gchar ** argument, pointing to
a gchar * item that, on error, gets set to point to a g_malloc()ed error
string. That removes one bit of global state from the display filter
parser, and doesn't impose a fixed limit on the error message strings.
Have fvalue_from_string() and fvalue_from_unparsed() take a gchar **
argument, pointer to a gchar * item, rather than an error-reporting
function, and set the gchar * item to point to a g_malloc()ed error
string on an error.
Allow either gchar ** argument to be null; if the argument is null, no
error message is allocated or provided.
Change-Id: Ibd36b8aaa9bf4234aa6efa1e7fb95f7037493b4c
Reviewed-on: https://code.wireshark.org/review/6608
Reviewed-by: Guy Harris <guy@alum.mit.edu>
a string, a field name or another range - not an unparsed element
Bug: 10690
Change-Id: I126143636c940cc73ed6467660f0a573209e2ae9
Reviewed-on: https://code.wireshark.org/review/5243
Reviewed-by: Martin Kaiser <wireshark@kaiser.cx>
Tested-by: Martin Kaiser <wireshark@kaiser.cx>
The WRETH dissector showed up some garbage in the column display. Upon
further inspection, it turns out that the format string had a trailing
percent sign which caused (unsigned)-1 to be returned by
g_printf_string_upper_bound (in emem_strdup_vprintf). Then ep_alloc is
called with (unsigned)-1 + 1 = 0 memory, no wonder that garbage shows
up. ASAN could not even catch this error because EP is in charge of
this.
So, start adding G_GNUC_PRINTF annotations in each header that uses
the "fmt" or "format" paramters (grepped + awk). This revealed some
other errors. The NCP2222 dissector was missing a format string (not
a security vuln though).
Many dissectors used val_to_str with a constant (but empty) string,
these have been replaced by val_to_str_const. ASN.1 dissectors
were regenerated for this.
Minor: the mate plugin used "%X" instead of "%p" for a pointer type.
The ncp2222 dissector and wimax plugin gained modelines.
Change-Id: I7f3f6a3136116f9b251719830a39a7b21646f622
Reviewed-on: https://code.wireshark.org/review/2881
Reviewed-by: Evan Huus <eapache@gmail.com>
Like:
a == b == c
or
a < b <= c <= d < e
Real life example:
6660 <= tcp.port <= 6669
Just syntactic sugar, this is *NOT* optimized.
svn path=/trunk/; revision=43353
Add a table of DPCs and SSNs that allow to override the protocol that would be choosen
so that the same SSN can use two different protocols in two different DPCs.
I did not believe it someone could have done it, then I saw the captures...
svn path=/trunk/; revision=21321
they have LF at the end of the line on UN*X and CR/LF on Windows;
hopefully this means that if a CR/LF version is checked in on Windows,
the CRs will be stripped so that they show up only when checked out on
Windows, not on UN*X.
svn path=/trunk/; revision=11400
Use gint32 instead of guint32 for node data.
Fix up some other signed-vs-unsigned issues in the display filter
parser and lexical analyzer.
svn path=/trunk/; revision=11085
check, in the semantics-checking phase, that we're testing a field, so
that we can give a better message than, for example, "Unexpected end of
filter string." for an existence test with a misspelled field name.
svn path=/trunk/; revision=10043
New "matches" operater in display filter language. Uses PCRE.
If a "matches" operator is found in a dfilter
while libpcre has not been used to build the binary, then an
exception is thrown after using dfilter_fail() to set an apporporiate
error message.
svn path=/trunk/; revision=9182