Commit Graph

55 Commits

Author SHA1 Message Date
João Valverde 74a89a9862 dfilter: Minor set grammar cleanup 2021-10-27 11:13:52 +01:00
João Valverde a7c625808c dfilter: Add a helper function to create test stnodes 2021-10-27 09:27:45 +01:00
João Valverde f5fea52982 dfilter: Remove token value from syntax tree
Currently unused. This might still be useful to differentiate
different spelling of the same token in user messages, like
"==" and "eq", but currently we are not storing test tokens
anyway, so just remove it, it makes everything simpler.

If it's ever necessary it can be added back.
2021-10-27 09:27:45 +01:00
João Valverde 0e4851b025 dfilter: Use a string lval type in scanner
Minor change to decouple the AST data structures from the lexical
scanner. We pass a structure to allow for some future enhancements.
2021-10-27 09:27:45 +01:00
João Valverde b1222edcd2 dfilter: Parse ranges in the drange node constructor
Using a hand written tokenizer is simpler than using flex start
conditions. Do the validation in the drange node constructor.

Add validation for malformed ranges with different endpoint signs.
2021-10-27 06:02:07 +00:00
João Valverde 0abe10e040 dfilter: Fix "!=" relation to be free of contradictions
Wireshark defines the relation of equality A == B as
A any_eq B <=> An == Bn for at least one An, Bn.
More accurately I think this is (formally) an equivalence
relation, not true equality.

Whichever definition for "==" we choose we must keep the
definition of "!=" as !(A == B), otherwise it will
lead to logical contradictions like (A == B) AND (A != B)
being true.

Fix the '!=' relation to match the definition of equality:
  A != B <=> !(A == B) <=> A all_ne B <=> An != Bn, for
every n.

This has been the recomended way to write "not equal" for a
long time in the documentation, even to the point where != was
deprecated, but it just wasn't implemented consistently in the
language, which has understandably been a persistent source
of confusion. Even a field that is normally well-behaved
with "!=" like "ip.src" or "ip.dst" will produce unexpected
results with encapsulations like IP-over-IP.

The opcode ALL_NE could have been implemented in the compiler
instead using NOT and ANY_EQ but I chose to implement it in
bytecode. It just seemed more elegant and efficient
but the difference was not very significant.

Keep around "~=" for any_ne relation, in case someone depends
on that, and because we don't have an operator for true equality:
  A strict_equal B <=> A all_eq B <=> !(A any_ne B).
If there is only one value then any_ne and all_ne are the same
comparison operation.

Implementing this change did not require fixing any tests so it
is unlikely the relation "~=" (any_ne) will be very useful.

Note that the behaviour of the '<' (less than) comparison relation
is a separate, more subtle issue. In the general case the definition
of '<' that is used is only a partial order.
2021-10-24 06:55:54 +00:00
João Valverde 2e048df011 dfilter: Improve error message for "matches"
Should be more obvious that this error is caused
by a string syntax error and not something else.
2021-10-18 12:09:36 +00:00
João Valverde a975d478ba dfilter: Require double-quoted strings with "matches"
Matches is a special case that looks on the RHS and tries
to convert every unparsed value to a string, regardless
of the LHS type. This is not how types work in the display
filter. Require double-quotes to avoid ambiguity, because
matches doesn't follow normal Wireshark display filter
type rules. It doesn't need nor benefit from the flexibility
provided by unparsed strings in the syntax.

For matches the RHS is always a literal strings except
if the RHS is also a field name, then it complains of an
incompatible type. This is confusing. No type can be compatible
because no type rules are ever considered. Every unparsed value is
a text string except if it happens to coincide with a field
name it also requires double-quoting or it throws a syntax error,
just to be difficult. We could remove this odd quirk but requiring
double-quotes for regular expressions is a better, more elegant
fix.

Before:
  Filter: tcp matches "udp"

  Constants:
  00000 PUT_PCRE	udp -> reg#1

  Instructions:
  00000 READ_TREE		tcp -> reg#0
  00001 IF-FALSE-GOTO	3
  00002 ANY_MATCHES	reg#0 matches reg#1
  00003 RETURN

  Filter: tcp matches udp

  Constants:
  00000 PUT_PCRE	udp -> reg#1

  Instructions:
  00000 READ_TREE		tcp -> reg#0
  00001 IF-FALSE-GOTO	3
  00002 ANY_MATCHES	reg#0 matches reg#1
  00003 RETURN

  Filter: tcp matches udp.srcport
  dftest: tcp and udp.srcport are not of compatible types.

  Filter: tcp matches udp.srcportt

  Constants:
  00000 PUT_PCRE	udp.srcportt -> reg#1

  Instructions:
  00000 READ_TREE		tcp -> reg#0
  00001 IF-FALSE-GOTO	3
  00002 ANY_MATCHES	reg#0 matches reg#1
  00003 RETURN

After:
  Filter: tcp matches "udp"

  Constants:
  00000 PUT_PCRE	udp -> reg#1

  Instructions:
  00000 READ_TREE		tcp -> reg#0
  00001 IF-FALSE-GOTO	3
  00002 ANY_MATCHES	reg#0 matches reg#1
  00003 RETURN

  Filter: tcp matches udp
  dftest: "udp" was unexpected in this context.

  Filter: tcp matches udp.srcport
  dftest: "udp.srcport" was unexpected in this context.

  Filter: tcp matches udp.srcportt
  dftest: "udp.srcportt" was unexpected in this context.

The error message could still be improved.
2021-10-17 22:53:36 +00:00
João Valverde 4e5e806604 dfilter: Do not chain matches expressions
It is always an error to chain regexes using the logic for "le" and "eq".

  var matches "regex1" matches "regex2"
    => var matches "regex1" and "regex1" matches "regex2"

Before:
  Filter: tcp matches "abc$" matches "^cde"
  dftest: Neither "abc$" nor "^cde" are field or protocol names.

  Filter: "abc$" matches tcp matches "^cde"
  dftest: Neither "abc$" nor "tcp" are field or protocol names.

After:
  Filter: tcp matches "abc$" matches "^cde"
  dftest: "matches" was unexpected in this context.

  Filter: "abc$" matches tcp matches "^cde"
  dftest: "matches" was unexpected in this context.
2021-10-17 22:53:36 +00:00
João Valverde e91b5beafd dfilter: Resolve field names in the parser
The lexical rules for fields and unparsed strings are ambiguous,
e.g. "fc" can be the protocol fibre channel or the byte 0xfc.
In general a name is determined to be a protocol field or not by
checking the registry.

Resolving the name in the parser gives more flexibility, for example
to use different semantic rules according to the relation between
LHS and RHS, and allows function names and protocol names to co-exist
without ambiguity.

Before:
  Filter: tcp == 1

  Constants:
  00000 PUT_FVALUE	01 <FT_PROTOCOL> -> reg#1

  Instructions:
  00000 READ_TREE		tcp -> reg#0
  00001 IF-FALSE-GOTO	3
  00002 ANY_EQ		reg#0 == reg#1
  00003 RETURN

  Filter: tcp() == 1
  dftest: Syntax error near "(".

After:
  Filter: tcp == 1

  Constants:
  00000 PUT_FVALUE	01 <FT_PROTOCOL> -> reg#1

  Instructions:
  (same)

  Filter: tcp() == 1
  dftest: Function 'tcp' does not exist

It's also a goal to make it easier to modify the lexer rules.

Ping #12810.
2021-10-14 16:45:19 +01:00
João Valverde 2c701ddf6f dfilter: Improve grammar to parse ranges
Do the integer conversion for ranges in the parser. This is more
conventional, I think, and allows removing the unnecessary integer
syntax tree node type.

Try to minimize the number and complexity of lexical rules for
ranges. But it seems we need to keep different states for integer
and punctuation because of the need to disambiguate the ranges
[-n-n] and [-n--n].
2021-10-08 19:18:56 +01:00
João Valverde 92285e6258 dfilter: Improve grammar to parse functions
A function is grammatically an identifier that is followed by '(' and ')'
according to some rules. We should avoid assuming a token is a function
just because it matches a registered function name.

Before:
  Filter: foobar(http.user_agent) contains "UPDATE"
  dftest: Syntax error near "(".

After:
  Filter: foobar(http.user_agent) contains "UPDATE"
  dftest: The function 'foobar' does not exist.

This has the problem that a function cannot have the same name
as a protocol but that limitation already existed before.
2021-10-08 04:01:24 +00:00
João Valverde 7bf02254c1 dfilter: Rename function production rule
Make it more obvious that entities are also functions.
2021-10-05 19:19:36 +01:00
João Valverde a940318f37 dfilter: Minor grammar fixups
Clean up syntax error code. TEST and SET are never returned by
the tokenizer.

Remove unnecessary range_body() grammar element. Fix a comment.

Move the stnode_token_value() function to its proper place.
2021-10-05 17:56:21 +01:00
João Valverde d45ba348fd dfilter: Strengthen sanity check for range
Allow an entity in the grammar as range body. Perform a stronger
sanity check during semantic analysis everywhere a range is used.
This is both safer (unless we want to allow FIELD bodies only, but
functions are allowed too) and also provides better error messages.

Previously a range of range only compiled on the RHS. Now it can
appear on both sides of a relation.

This fixes a crash with STRING entities similar to #10690 for
UNPARSED.

This also adds back support for slicing functions that was removed
in f3f833ccec (by accident presumably).

Ping #10690
2021-10-05 16:39:41 +01:00
João Valverde c7dc907d0e dfilter: Rename some identifiers in grammar
Prefer grammar names for readibility over C names.

Prefer rel_binop to rel_op2. Clean formatting.
2021-10-01 16:58:42 +00:00
João Valverde 2c55bffb41 dfilter: Improve syntax error message
Pass simple token value and use it for the error message. This string
is freed in the parser destructor.
2021-10-01 16:04:37 +00:00
João Valverde db18865e55 dfilter: Save token value to syntax tree
When parsing we save the token value to the syntax tree. This is
useful for better error reporting. Use it to report an invalid
entity for the slice operation. Before only the memory location
was reported, which is not a good error message.

Before:
  % dftest '"01:02:03:04"[0:3] == foo'
  Filter: ""01:02:03:04"[0:3] == foo"
  dftest: Range is not supported for entity <0x7f6c84017740> of type STRING

After:
  % dftest '"01:02:03:04"[0:3] == foo'
  Filter: ""01:02:03:04"[0:3] == foo"
  dftest: Range is not supported for entity 01:02:03:04 of type STRING

When creating a new node from an old one we need to copy the token
value. Simple tokens such as RBRACKET, COMMA and COLON are
not part of the AST and don't have an associated semantic value.
2021-10-01 16:04:37 +00:00
João Valverde b4af7c52a5 dfilter: Add a flags member to the syntax tree node
Use it to record "inside parenthesis".
2021-09-30 17:03:55 +00:00
João Valverde 0e50979b3f Replace g_assert() with ws_assert() 2021-06-19 01:23:31 +00:00
Guy Harris b61fd6d76a dfilter, ftypes: get rid of FT_PCRE.
It's not a valid field type, it's only a hack to support regular
expression matching in packet-matching expressions.

Instead, in the packet-matching code, have a separate syntax tree type
for Perl-compatible regular expressions, and a separate instruction to
load one into a register, and have the "matching" operator for field
types take a GRegex * as the second argument.
2021-03-21 03:27:44 -07:00
Guy Harris 7d4b47a073 Further improve that error message.
Put the function name in quotes.

Change-Id: I09be392a9bac3b56c13b82a554d17ea29695657c
Reviewed-on: https://code.wireshark.org/review/31790
Reviewed-by: Guy Harris <guy@alum.mit.edu>
2019-01-28 22:23:45 +00:00
Guy Harris e8f54b8aed Fix an error message.
I guess "s" in "The function s" was supposed to be "%s", giving the
function name.  Make it so, and properly fetch the function name.

Change-Id: I67287f24626fa0a2816fb2cf574e5d9ff58713bf
Reviewed-on: https://code.wireshark.org/review/31787
Reviewed-by: Guy Harris <guy@alum.mit.edu>
2019-01-28 22:08:41 +00:00
Peter Wu 6a45dcd7a2 dfilter: require spaces as set element separator
Previously a filter such as `http.request.method in {"GET"HEAD""}` would
be parsed as three strings (GET, HEAD and an empty string). As it seems
more likely that people make typos rather than intending to construct
such a filter, forbid this by always requiring a whitespace separator.

Change-Id: I77e531fd6be072f62dd06aac27f856106c8920c6
Reported-by: Stig Bjørlykke
Reviewed-on: https://code.wireshark.org/review/26989
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
2018-04-18 03:47:58 +00:00
Peter Wu 1ff82572ca dfilter: add range support to set membership operator ("f in {x .. y}")
Allow "tcp.srcport in {1662 1663 1664}" to be abbreviated to
"tcp.srcport in {1662 .. 1664}". The range operator is supported for any
field value which supports the "<=" and "=>" operators and thus works
for integers, IP addresses, etc.

The naive mapping "tcp.srcport >= 1662 and tcp.srcport <= 1664" is not
used because it does not have the intended effect with fields that have
multiple occurrences (e.g. tcp.port). Each condition could be satisfied
by an other value. Therefore a new DVFM instruction (ANY_IN_RANGE) is
added to test the range condition against each individual field value.

Bug: 14180
Change-Id: I53c2d0f9bc9d4f0ffaabde9a83442122965c95f7
Reviewed-on: https://code.wireshark.org/review/26945
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
2018-04-18 03:47:02 +00:00
Guy Harris d7fe514fc0 Improve support for single-character fields and filter expressions.
Add an FT_CHAR type, which is like FT_UINT8 except that the value is
displayed as a C-style character constant.

Allow use of C-style character constants in filter expressions; they can
be used in comparisons with all integral types, and in "contains"
operators.

Use that type for some fields that appear (based on the way they're
displayed, or on the use of C-style character constants in their
value_string tables) to be 1-byte characters rather than 8-bit numbers.

Change-Id: I39a9f0dda0bd7f4fa02a9ca8373216206f4d7135
Reviewed-on: https://code.wireshark.org/review/17787
Reviewed-by: Guy Harris <guy@alum.mit.edu>
2016-09-19 02:51:13 +00:00
Stig Bjørlykke b86e2a3609 Dfilter: Mark an error in %syntax_error
Because of a change in lemon the %parse_failure is not always called.

Bug: 11637
Change-Id: Iea218aeee10e20f29461169829a10345bbdac903
Reviewed-on: https://code.wireshark.org/review/11302
Petri-Dish: Stig Bjørlykke <stig@bjorlykke.org>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Tested-by: Jeff Morriss <jeff.morriss.ws@gmail.com>
Reviewed-by: Stig Bjørlykke <stig@bjorlykke.org>
2015-10-27 17:22:38 +00:00
Jeffrey Smith 80322d88da dfilter: Add membership operator
Added a new relational test: 'x in {a b c}'.  The only LHS entity
supported at this time is a field.  The generated DFVM operations are
equivalent to an OR'ed series of =='s, but with the redundant existence
tests removed.

Change-Id: Iddc89b81cf7ad6319aef1a2a94f93314cb721a8a
Reviewed-on: https://code.wireshark.org/review/10246
Reviewed-by: Hadriel Kaplan <hadrielk@yahoo.com>
Petri-Dish: Hadriel Kaplan <hadrielk@yahoo.com>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Michael Mann <mmann78@netscape.net>
2015-09-11 06:31:33 +00:00
Alexis La Goutte 2e1fa634c6 Lemon grammar: fix indent (use tabs)
Change-Id: I6fa38d5d85b25ac6c55fcfa67d6c8dba8482cc8c
Reviewed-on: https://code.wireshark.org/review/10266
Petri-Dish: Alexis La Goutte <alexis.lagoutte@gmail.com>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Anders Broman <a.broman58@gmail.com>
2015-08-27 04:35:23 +00:00
Guy Harris cfcbb28671 Clean up ftype-conversion and dfilter error message string handling.
Have dfilter_compile() take an additional gchar ** argument, pointing to
a gchar * item that, on error, gets set to point to a g_malloc()ed error
string.  That removes one bit of global state from the display filter
parser, and doesn't impose a fixed limit on the error message strings.

Have fvalue_from_string() and fvalue_from_unparsed() take a gchar **
argument, pointer to a gchar * item, rather than an error-reporting
function, and set the gchar * item to point to a g_malloc()ed error
string on an error.

Allow either gchar ** argument to be null; if the argument is null, no
error message is allocated or provided.

Change-Id: Ibd36b8aaa9bf4234aa6efa1e7fb95f7037493b4c
Reviewed-on: https://code.wireshark.org/review/6608
Reviewed-by: Guy Harris <guy@alum.mit.edu>
2015-01-18 10:22:59 +00:00
Martin Kaiser f3f833ccec display filter: the body of a range should only be
a string, a field name or another range - not an unparsed element

Bug: 10690
Change-Id: I126143636c940cc73ed6467660f0a573209e2ae9
Reviewed-on: https://code.wireshark.org/review/5243
Reviewed-by: Martin Kaiser <wireshark@kaiser.cx>
Tested-by: Martin Kaiser <wireshark@kaiser.cx>
2014-11-17 07:05:35 +00:00
Peter Wu f2b4daf400 Add printf-format annotations, fix garbage
The WRETH dissector showed up some garbage in the column display. Upon
further inspection, it turns out that the format string had a trailing
percent sign which caused (unsigned)-1 to be returned by
g_printf_string_upper_bound (in emem_strdup_vprintf). Then ep_alloc is
called with (unsigned)-1 + 1 = 0 memory, no wonder that garbage shows
up. ASAN could not even catch this error because EP is in charge of
this.

So, start adding G_GNUC_PRINTF annotations in each header that uses
the "fmt" or "format" paramters (grepped + awk). This revealed some
other errors. The NCP2222 dissector was missing a format string (not
a security vuln though).

Many dissectors used val_to_str with a constant (but empty) string,
these have been replaced by val_to_str_const. ASN.1 dissectors
were regenerated for this.

Minor: the mate plugin used "%X" instead of "%p" for a pointer type.

The ncp2222 dissector and wimax plugin gained modelines.

Change-Id: I7f3f6a3136116f9b251719830a39a7b21646f622
Reviewed-on: https://code.wireshark.org/review/2881
Reviewed-by: Evan Huus <eapache@gmail.com>
2014-07-06 23:00:40 +00:00
Alexis La Goutte 7d77d753c6 Continue to remove $Id$ from top of file
(Using sed :sed -i '/^\/\* \$Id\$ \*\//,+0 d') ( /* $Id */ )

Change-Id: I46e928d7f2a307c35876ed5d34cb6b7cccfcd6e9
Reviewed-on: https://code.wireshark.org/review/886
Reviewed-by: Anders Broman <a.broman58@gmail.com>
2014-03-31 18:49:26 +00:00
Gerald Combs 465e4664de Use "(void) <variable/>" to avoid unused variable warnings similar to
Qt's Q_UNUSED macro.

svn path=/trunk/; revision=54110
2013-12-14 23:44:25 +00:00
Jakub Zawadzki ae59b09443 Add missing includes in order to remove exceptions.h from proto.h (next commit).
svn path=/trunk/; revision=53230
2013-11-10 15:59:37 +00:00
Jakub Zawadzki c6669a3c63 dfilter: report warning if OR and AND logic operands are mixed without parentheses.
svn path=/trunk/; revision=51247
2013-08-10 17:49:28 +00:00
Jakub Zawadzki 73aa1e7807 Support drange for functions
last think from bug #8979
+ fix semcheck.c:875: warning: signed and unsigned type in conditional expression

svn path=/trunk/; revision=50951
2013-07-27 19:14:34 +00:00
Jeff Morriss 3729335973 We always HAVE_CONFIG_H so don't bother checking whether we have it or not.
svn path=/trunk/; revision=45016
2012-09-20 01:48:30 +00:00
Jakub Zawadzki addf9236dc Support multiple relation test without logic and (python-like)
Like: 
  a == b == c 
  or 
  a < b <= c <= d < e 

Real life example:
  6660 <= tcp.port <= 6669

Just syntactic sugar, this is *NOT* optimized.

svn path=/trunk/; revision=43353
2012-06-19 12:12:41 +00:00
Anders Broman d1c1455882 Fix warnings
svn path=/trunk/; revision=43046
2012-06-03 20:59:41 +00:00
Luis Ontanon 9709011a9b Implement a proposal from Elefterios Gabriel for SCCP:
Add a table of DPCs and SSNs that allow to override the protocol that would be choosen
so that the same SSN can use two different protocols in two different DPCs.

I did not believe it someone could have done it, then I saw the captures...


svn path=/trunk/; revision=21321
2007-04-03 19:08:00 +00:00
Luis Ontanon 22004e8190 productions of non-terminal "sentence" do not generate any value. Avoid a destructor being called for them.
see http://www.sqlite.org/cvstrac/tktview?tn=2172


svn path=/trunk/; revision=20460
2007-01-17 16:41:22 +00:00
Gilbert Ramirez e3899ed4a4 Add infrastructure for display filter functions.
Add upper() and lower() display filter functions for string fields.

svn path=/trunk/; revision=18071
2006-05-02 14:26:17 +00:00
Jörg Mayer 96adc5f4a1 - Include the .h files in their .c files.
- Remove epan/dissectors/packet-sna.h, it isn't used anywhere.

svn path=/trunk/; revision=15475
2005-08-20 16:19:22 +00:00
Guy Harris 8a8b883450 Set the svn:eol-style property on all text files to "native", so that
they have LF at the end of the line on UN*X and CR/LF on Windows;
hopefully this means that if a CR/LF version is checked in on Windows,
the CRs will be stripped so that they show up only when checked out on
Windows, not on UN*X.

svn path=/trunk/; revision=11400
2004-07-18 00:24:25 +00:00
Guy Harris e1e690ff3a From Graeme Hewson:
Use gint32 instead of guint32 for node data.

Fix up some other signed-vs-unsigned issues in the display filter
parser and lexical analyzer.

svn path=/trunk/; revision=11085
2004-06-03 07:36:25 +00:00
Olivier Biot 1791f84919 First attempt at "bitwise AND" display filter operator.
Document how a display operator can be added.

svn path=/trunk/; revision=10250
2004-02-27 12:00:32 +00:00
Guy Harris b9b4a23834 Make an existence test of an arbitrary entity syntactically valid, but
check, in the semantics-checking phase, that we're testing a field, so
that we can give a better message than, for example, "Unexpected end of
filter string." for an existence test with a misspelled field name.

svn path=/trunk/; revision=10043
2004-02-11 21:20:52 +00:00
Gilbert Ramirez 55a6251e7c From Olivier Biot
New "matches" operater in display filter language. Uses PCRE.

If a "matches" operator is found in a dfilter
while libpcre has not been used to build the binary, then an
exception is thrown after using dfilter_fail() to set an apporporiate
error message.

svn path=/trunk/; revision=9182
2003-12-06 16:35:20 +00:00
Gilbert Ramirez 52338a3baf Add a "contains" operator for byte-strings, strings, and tvbuffs (protocols).
The search uses a naive approach; more work is required to add a
Boyer-Moore Search algorithm.

svn path=/trunk/; revision=8280
2003-08-27 15:23:11 +00:00