Revert to the original design of having a single pattern to catch
everything as unparsed and also try to be less hackish and fragile
parsing "..".
Strings like "80..90" are tricky because it can be parsed as
INTEGER DOTDOT INTEGER or FLOAT FLOAT.
Deprecate the usage of significant whitespace to separate set elements
(or anywhere else for that matter). This will make the implementation
simpler and cleaner and the language more expressive and user-friendly.
Fixup comment to reflect that unparsed is the only token
type for the pattern, there is no "match" or "no match".
Tighten the CIDR patterns and make sure it cannot match a
rogue "..". This is ambiguous with 80..90 and must be treated
extra carefully to support floating point formats without
leading or terminating digits before or after the point.
Reject the default match as a syntax error. This pattern is
just a pitfall to make testing and debugging harder. Matching a
single arbitrary character is never a valid unparsed string,
except by chance.
Replace the LVAL macros with a plain variable declaration.
The other macros and assorted compatibility code has been superseded
by the requirement of reentrant flex and the use of more modern flex
options, as far as I can tell.
Using a hand written tokenizer is simpler than using flex start
conditions. Do the validation in the drange node constructor.
Add validation for malformed ranges with different endpoint signs.
Wireshark defines the relation of equality A == B as
A any_eq B <=> An == Bn for at least one An, Bn.
More accurately I think this is (formally) an equivalence
relation, not true equality.
Whichever definition for "==" we choose we must keep the
definition of "!=" as !(A == B), otherwise it will
lead to logical contradictions like (A == B) AND (A != B)
being true.
Fix the '!=' relation to match the definition of equality:
A != B <=> !(A == B) <=> A all_ne B <=> An != Bn, for
every n.
This has been the recomended way to write "not equal" for a
long time in the documentation, even to the point where != was
deprecated, but it just wasn't implemented consistently in the
language, which has understandably been a persistent source
of confusion. Even a field that is normally well-behaved
with "!=" like "ip.src" or "ip.dst" will produce unexpected
results with encapsulations like IP-over-IP.
The opcode ALL_NE could have been implemented in the compiler
instead using NOT and ANY_EQ but I chose to implement it in
bytecode. It just seemed more elegant and efficient
but the difference was not very significant.
Keep around "~=" for any_ne relation, in case someone depends
on that, and because we don't have an operator for true equality:
A strict_equal B <=> A all_eq B <=> !(A any_ne B).
If there is only one value then any_ne and all_ne are the same
comparison operation.
Implementing this change did not require fixing any tests so it
is unlikely the relation "~=" (any_ne) will be very useful.
Note that the behaviour of the '<' (less than) comparison relation
is a separate, more subtle issue. In the general case the definition
of '<' that is used is only a partial order.
Always use the internal API to access "deprecated" and initialize
the data structure on demand. This fixes a null pointer dereference
introduced previously.
Use reference counting to share the array cleanly and avoid memory
leaks.
Keep the pointer in dfwork_t.
The lexical rules for fields and unparsed strings are ambiguous,
e.g. "fc" can be the protocol fibre channel or the byte 0xfc.
In general a name is determined to be a protocol field or not by
checking the registry.
Resolving the name in the parser gives more flexibility, for example
to use different semantic rules according to the relation between
LHS and RHS, and allows function names and protocol names to co-exist
without ambiguity.
Before:
Filter: tcp == 1
Constants:
00000 PUT_FVALUE 01 <FT_PROTOCOL> -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_EQ reg#0 == reg#1
00003 RETURN
Filter: tcp() == 1
dftest: Syntax error near "(".
After:
Filter: tcp == 1
Constants:
00000 PUT_FVALUE 01 <FT_PROTOCOL> -> reg#1
Instructions:
(same)
Filter: tcp() == 1
dftest: Function 'tcp' does not exist
It's also a goal to make it easier to modify the lexer rules.
Ping #12810.
Do the integer conversion for ranges in the parser. This is more
conventional, I think, and allows removing the unnecessary integer
syntax tree node type.
Try to minimize the number and complexity of lexical rules for
ranges. But it seems we need to keep different states for integer
and punctuation because of the need to disambiguate the ranges
[-n-n] and [-n--n].
A function is grammatically an identifier that is followed by '(' and ')'
according to some rules. We should avoid assuming a token is a function
just because it matches a registered function name.
Before:
Filter: foobar(http.user_agent) contains "UPDATE"
dftest: Syntax error near "(".
After:
Filter: foobar(http.user_agent) contains "UPDATE"
dftest: The function 'foobar' does not exist.
This has the problem that a function cannot have the same name
as a protocol but that limitation already existed before.
When parsing we save the token value to the syntax tree. This is
useful for better error reporting. Use it to report an invalid
entity for the slice operation. Before only the memory location
was reported, which is not a good error message.
Before:
% dftest '"01:02:03:04"[0:3] == foo'
Filter: ""01:02:03:04"[0:3] == foo"
dftest: Range is not supported for entity <0x7f6c84017740> of type STRING
After:
% dftest '"01:02:03:04"[0:3] == foo'
Filter: ""01:02:03:04"[0:3] == foo"
dftest: Range is not supported for entity 01:02:03:04 of type STRING
When creating a new node from an old one we need to copy the token
value. Simple tokens such as RBRACKET, COMMA and COLON are
not part of the AST and don't have an associated semantic value.
Pass the deprecated data struture to the scanner and insert the deprecated
tokens there. This avoids having to keep a dedicated syntax node field
for this.
Pass the deprecated argument in dfwork_t instead of in a separate
argument. This is less cumbersome than adding an extra argument
to every level of the semantic checker.
When byte escape sequences, that is hex \xhh or octal \0ddd,
are interpreted at the lexical level it is not possible to
use strings with embedded NUL bytes. The NUL byte is interpreted
as a C string terminator. As a consequence, for example, the
strings "AB" and "AB\x00CDE" compare as the same. This leads to
unexpected false matches and a poor user experience.
Disallow embedded NULs for regular strings (strings literals that
do not begin with 'r' or 'R') for this reason.
It is possible to use a raw string instead (eg: r"AB\x00C")
to match embedded NUL bytes, although that only works with regular
expressions. Normal escape rules would also work with regular
expressions (eg: "AB\\x00C"). This is the same string as the previous
one, written in an alternate form. What won't work is "AB\x00C", this
string is synctatically invalid.
So the expression: data matches r"AB\x00C"
will match the bytes {'A', 'B', '\0', '\C'}.
However the expression: data contains r"AB\x00C"
won't match the fvalue above. Because the "contains" operator
doesn't compile a regular expression it literally tries to
contains-match the bytes {'A', 'B', '\\', 'x', '0', '0', 'C'}.
Therefore raw strings are very convenient but it is still necessary
to be aware that the matches operator has an extra level of indirection
than other string operators (same as in Python).
Fixes#16156.
Add support for a literal string specification copied from Python
raw strings[1].
Raw string literals are enclosed with r"..." or R"...". Double quotes
can be include in the string but they must be escaped with backslash.
In escape sequences backslashes are preserved in the final result.
So for example the string "a\\\"b" is the same as r"a\"b".
r"\\\a" is the same as "\\\\\\a".
Raw strings should be used for convenience wherever a regular expression
is used in a display filter expression.
[1]https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
(It's llvm-gcc, and appears to be a version too old to support
_Pragma("gcc diagnostic ...").)
Change-Id: I220997890636d5a5c33889d2e6c640feee155deb
Reviewed-on: https://code.wireshark.org/review/29580
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Add support for aliasing one protocol name to another and for filtering
using aliased fields. Mark aliased fields as deprecated.
Rename the BOOTP dissector to DHCP and alias "bootp" to "dhcp". This
lets you use both "dhcp.type" and "bootp.type" as display filter fields
without having to duplicate all 500+ DHCP/BOOTP fields.
To do:
- Add checks to proto.c:check_valid_filter_name_or_fail?
- Transition SSL to TLS.
- Rename packet-bootp.c to packet-dhcp.c?
Change-Id: I29977859995e8347d80b8e83f1618db441b10279
Ping-Bug: 14922
Reviewed-on: https://code.wireshark.org/review/29327
Reviewed-by: Gerald Combs <gerald@wireshark.org>
Petri-Dish: Gerald Combs <gerald@wireshark.org>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
Previously a filter such as `http.request.method in {"GET"HEAD""}` would
be parsed as three strings (GET, HEAD and an empty string). As it seems
more likely that people make typos rather than intending to construct
such a filter, forbid this by always requiring a whitespace separator.
Change-Id: I77e531fd6be072f62dd06aac27f856106c8920c6
Reported-by: Stig Bjørlykke
Reviewed-on: https://code.wireshark.org/review/26989
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
For numeric values such as port numbers, "4430..4434" looks more
natural than "4430 .. 4434", so support that.
To make this possible, the display filter syntax needs to be restricted.
Assume that neither field names nor values can contain "..". The display
filter `data contains ..` will now be considered a syntax error and must
be written as `data contains ".."` instead. More generally, all values
that contain ".." must be quoted.
Other than the ".." restriction, the scanner deliberately accepts more
characters that can potentially form invalid input. This is to prevent
accidentally splitting input in multiple tokens. For example, "9.2." in
"frame.time_delta in {9.2.}" is currently parsed as one token and then
rejected because it cannot be parsed as time. If the scanner was made
stricter, it could treat it as two tokens (floats), "9." and "2." which
has different meaning for the set membership operator.
An unhandled edge case is "1....2" which is parsed as "1 .. .. 2" but
could have been parsed as "1. .. .2" instead. A float with trailing dots
followed by ".." seems sufficiently weird, so rejection is fine.
Ping-Bug: 14180
Change-Id: Ibad8e851b49346c9d470f09d5d6a54defa21bcb9
Reviewed-on: https://code.wireshark.org/review/26960
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
Allow "tcp.srcport in {1662 1663 1664}" to be abbreviated to
"tcp.srcport in {1662 .. 1664}". The range operator is supported for any
field value which supports the "<=" and "=>" operators and thus works
for integers, IP addresses, etc.
The naive mapping "tcp.srcport >= 1662 and tcp.srcport <= 1664" is not
used because it does not have the intended effect with fields that have
multiple occurrences (e.g. tcp.port). Each condition could be satisfied
by an other value. Therefore a new DVFM instruction (ANY_IN_RANGE) is
added to test the range condition against each individual field value.
Bug: 14180
Change-Id: I53c2d0f9bc9d4f0ffaabde9a83442122965c95f7
Reviewed-on: https://code.wireshark.org/review/26945
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
DIAG_OFF_FLEX turns off all warnings that we want to disable for
Flex-generated code due to some versions of Flex generating code that
triggers those warnings.
DIAG_ON_FLEX restores those warnings, so we do the checks for code that
*we* wrote.
Use them in .l files.
Change-Id: I613a20309a30cd4c61111a1edbe27a5d05fcbf59
Reviewed-on: https://code.wireshark.org/review/25815
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot
Reviewed-by: Guy Harris <guy@alum.mit.edu>
That way, if we #define anything for large file support, that's done
before we include any system header files that either depend on that
definition or that define it themselves if it's not already defined.
Change-Id: I9b07344151103be337899dead44d6960715d6813
Reviewed-on: https://code.wireshark.org/review/19035
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Add an FT_CHAR type, which is like FT_UINT8 except that the value is
displayed as a C-style character constant.
Allow use of C-style character constants in filter expressions; they can
be used in comparisons with all integral types, and in "contains"
operators.
Use that type for some fields that appear (based on the way they're
displayed, or on the use of C-style character constants in their
value_string tables) to be 1-byte characters rather than 8-bit numbers.
Change-Id: I39a9f0dda0bd7f4fa02a9ca8373216206f4d7135
Reviewed-on: https://code.wireshark.org/review/17787
Reviewed-by: Guy Harris <guy@alum.mit.edu>
master-branch libpcap now generates a reentrant Flex scanner and
Bison/Berkeley YACC parser for capture filter expressions, so it
requires versions of Flex and Bison/Berkeley YACC that support that.
We might as well do the same. For libwiretap, it means we could
actually have multiple K12 text or Ascend/Lucent text files open at the
same time. For libwireshark, it might not be as useful, as we only read
configuration files at startup (which should only happen once, in one
thread) or on demand (in which case, if we ever support multiple threads
running libwireshark, we'd need a mutex to ensure that only one file
reads it), but it's still the right thing to do.
We also require a version of Flex that can write out a header file, so
we change the runlex script to generate the header file ourselves. This
means we require a version of Flex new enough to support --header-file.
Clean up some other stuff encountered in the process.
Change-Id: Id23078c6acea549a52fc687779bb55d715b55c16
Reviewed-on: https://code.wireshark.org/review/14719
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Tweak lemonflex-tail.inc to fix an issue this reveals.
It appears that, at least on the buildbots, the Visual Studio compiler
no longer issues warnings for the code generated with %option noyywrap.
Change-Id: Id64d56f1ae8a79d0336488a4a50518da1f511497
Reviewed-on: https://code.wireshark.org/review/12433
Reviewed-by: Guy Harris <guy@alum.mit.edu>
We don't have any Flex scanners that support an interactive command-line
interface, so none of our scanners are, or need to be, interactive.
Mark text2pcap's scanner as not interactive.
That means none of our scanners should call isatty(), so they don't have
any need to include <io.h> on Windows; remove that include from the
Lucent/Ascent text capture scanner.
Update a comment to reflect that what matters isn't whether we can read
from a terminal or whether we actually do so, what matters is whether
they read *interactively* from a terminal (if you want to run text2pcap
reading from the standard input and type at it, be my guest).
Change-Id: I59979d1fdb37e1913125a400963ff7a3fa6b9bbd
Reviewed-on: https://code.wireshark.org/review/11587
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Guy Harris <guy@alum.mit.edu>
While IPv4 subnet masks are obviously related and similar to IPv4
addresses, they are distinct enough that they need to be treated
seperately in some aspects. For instance, there is no value in
attempting to resolve a subnet mask.
This change creates a new display type: BASE_NETMASK, which allows distinction from FT_IPv4
(and possible name resolution) where appropriate.
Change-Id: I99e19c9a58eb613f8e58d481af84c30e2e5e14d7
Reviewed-on: https://code.wireshark.org/review/10438
Petri-Dish: Michael Mann <mmann78@netscape.net>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Alexis La Goutte <alexis.lagoutte@gmail.com>
Reviewed-by: Michael Mann <mmann78@netscape.net>
Added a new relational test: 'x in {a b c}'. The only LHS entity
supported at this time is a field. The generated DFVM operations are
equivalent to an OR'ed series of =='s, but with the redundant existence
tests removed.
Change-Id: Iddc89b81cf7ad6319aef1a2a94f93314cb721a8a
Reviewed-on: https://code.wireshark.org/review/10246
Reviewed-by: Hadriel Kaplan <hadrielk@yahoo.com>
Petri-Dish: Hadriel Kaplan <hadrielk@yahoo.com>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Michael Mann <mmann78@netscape.net>
Have dfilter_compile() take an additional gchar ** argument, pointing to
a gchar * item that, on error, gets set to point to a g_malloc()ed error
string. That removes one bit of global state from the display filter
parser, and doesn't impose a fixed limit on the error message strings.
Have fvalue_from_string() and fvalue_from_unparsed() take a gchar **
argument, pointer to a gchar * item, rather than an error-reporting
function, and set the gchar * item to point to a g_malloc()ed error
string on an error.
Allow either gchar ** argument to be null; if the argument is null, no
error message is allocated or provided.
Change-Id: Ibd36b8aaa9bf4234aa6efa1e7fb95f7037493b4c
Reviewed-on: https://code.wireshark.org/review/6608
Reviewed-by: Guy Harris <guy@alum.mit.edu>
(Using sed : sed -i '/^ \* \$Id\$/,+1 d')
Fix manually some typo (in export_object_dicom.c and crc16-plain.c)
Change-Id: I4c1ae68d1c4afeace8cb195b53c715cf9e1227a8
Reviewed-on: https://code.wireshark.org/review/497
Reviewed-by: Anders Broman <a.broman58@gmail.com>
input() routine and thus don't need to have it generated - and as it
produces warnings of a routine defined but not used, we don't want to
have it generated.
Squelch a casting-const-away warning.
svn path=/trunk/; revision=47613
escape sequences; sscanf is a bit heavyweight, and using strtoul() also
squelches some "return value ignored" warnings.
svn path=/trunk/; revision=37007
so that the config.h definitions are available before we include
anything else; that way, for example, anything defined to enable
large-file support will be defined before we include any system header
files that might depend on it.
svn path=/trunk/; revision=36832