Always use the internal API to access "deprecated" and initialize
the data structure on demand. This fixes a null pointer dereference
introduced previously.
Use reference counting to share the array cleanly and avoid memory
leaks.
Keep the pointer in dfwork_t.
The lexical rules for fields and unparsed strings are ambiguous,
e.g. "fc" can be the protocol fibre channel or the byte 0xfc.
In general a name is determined to be a protocol field or not by
checking the registry.
Resolving the name in the parser gives more flexibility, for example
to use different semantic rules according to the relation between
LHS and RHS, and allows function names and protocol names to co-exist
without ambiguity.
Before:
Filter: tcp == 1
Constants:
00000 PUT_FVALUE 01 <FT_PROTOCOL> -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_EQ reg#0 == reg#1
00003 RETURN
Filter: tcp() == 1
dftest: Syntax error near "(".
After:
Filter: tcp == 1
Constants:
00000 PUT_FVALUE 01 <FT_PROTOCOL> -> reg#1
Instructions:
(same)
Filter: tcp() == 1
dftest: Function 'tcp' does not exist
It's also a goal to make it easier to modify the lexer rules.
Ping #12810.
Do the integer conversion for ranges in the parser. This is more
conventional, I think, and allows removing the unnecessary integer
syntax tree node type.
Try to minimize the number and complexity of lexical rules for
ranges. But it seems we need to keep different states for integer
and punctuation because of the need to disambiguate the ranges
[-n-n] and [-n--n].
A function is grammatically an identifier that is followed by '(' and ')'
according to some rules. We should avoid assuming a token is a function
just because it matches a registered function name.
Before:
Filter: foobar(http.user_agent) contains "UPDATE"
dftest: Syntax error near "(".
After:
Filter: foobar(http.user_agent) contains "UPDATE"
dftest: The function 'foobar' does not exist.
This has the problem that a function cannot have the same name
as a protocol but that limitation already existed before.
When parsing we save the token value to the syntax tree. This is
useful for better error reporting. Use it to report an invalid
entity for the slice operation. Before only the memory location
was reported, which is not a good error message.
Before:
% dftest '"01:02:03:04"[0:3] == foo'
Filter: ""01:02:03:04"[0:3] == foo"
dftest: Range is not supported for entity <0x7f6c84017740> of type STRING
After:
% dftest '"01:02:03:04"[0:3] == foo'
Filter: ""01:02:03:04"[0:3] == foo"
dftest: Range is not supported for entity 01:02:03:04 of type STRING
When creating a new node from an old one we need to copy the token
value. Simple tokens such as RBRACKET, COMMA and COLON are
not part of the AST and don't have an associated semantic value.
Pass the deprecated data struture to the scanner and insert the deprecated
tokens there. This avoids having to keep a dedicated syntax node field
for this.
Pass the deprecated argument in dfwork_t instead of in a separate
argument. This is less cumbersome than adding an extra argument
to every level of the semantic checker.
When byte escape sequences, that is hex \xhh or octal \0ddd,
are interpreted at the lexical level it is not possible to
use strings with embedded NUL bytes. The NUL byte is interpreted
as a C string terminator. As a consequence, for example, the
strings "AB" and "AB\x00CDE" compare as the same. This leads to
unexpected false matches and a poor user experience.
Disallow embedded NULs for regular strings (strings literals that
do not begin with 'r' or 'R') for this reason.
It is possible to use a raw string instead (eg: r"AB\x00C")
to match embedded NUL bytes, although that only works with regular
expressions. Normal escape rules would also work with regular
expressions (eg: "AB\\x00C"). This is the same string as the previous
one, written in an alternate form. What won't work is "AB\x00C", this
string is synctatically invalid.
So the expression: data matches r"AB\x00C"
will match the bytes {'A', 'B', '\0', '\C'}.
However the expression: data contains r"AB\x00C"
won't match the fvalue above. Because the "contains" operator
doesn't compile a regular expression it literally tries to
contains-match the bytes {'A', 'B', '\\', 'x', '0', '0', 'C'}.
Therefore raw strings are very convenient but it is still necessary
to be aware that the matches operator has an extra level of indirection
than other string operators (same as in Python).
Fixes#16156.
Add support for a literal string specification copied from Python
raw strings[1].
Raw string literals are enclosed with r"..." or R"...". Double quotes
can be include in the string but they must be escaped with backslash.
In escape sequences backslashes are preserved in the final result.
So for example the string "a\\\"b" is the same as r"a\"b".
r"\\\a" is the same as "\\\\\\a".
Raw strings should be used for convenience wherever a regular expression
is used in a display filter expression.
[1]https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
(It's llvm-gcc, and appears to be a version too old to support
_Pragma("gcc diagnostic ...").)
Change-Id: I220997890636d5a5c33889d2e6c640feee155deb
Reviewed-on: https://code.wireshark.org/review/29580
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Add support for aliasing one protocol name to another and for filtering
using aliased fields. Mark aliased fields as deprecated.
Rename the BOOTP dissector to DHCP and alias "bootp" to "dhcp". This
lets you use both "dhcp.type" and "bootp.type" as display filter fields
without having to duplicate all 500+ DHCP/BOOTP fields.
To do:
- Add checks to proto.c:check_valid_filter_name_or_fail?
- Transition SSL to TLS.
- Rename packet-bootp.c to packet-dhcp.c?
Change-Id: I29977859995e8347d80b8e83f1618db441b10279
Ping-Bug: 14922
Reviewed-on: https://code.wireshark.org/review/29327
Reviewed-by: Gerald Combs <gerald@wireshark.org>
Petri-Dish: Gerald Combs <gerald@wireshark.org>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
Previously a filter such as `http.request.method in {"GET"HEAD""}` would
be parsed as three strings (GET, HEAD and an empty string). As it seems
more likely that people make typos rather than intending to construct
such a filter, forbid this by always requiring a whitespace separator.
Change-Id: I77e531fd6be072f62dd06aac27f856106c8920c6
Reported-by: Stig Bjørlykke
Reviewed-on: https://code.wireshark.org/review/26989
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
For numeric values such as port numbers, "4430..4434" looks more
natural than "4430 .. 4434", so support that.
To make this possible, the display filter syntax needs to be restricted.
Assume that neither field names nor values can contain "..". The display
filter `data contains ..` will now be considered a syntax error and must
be written as `data contains ".."` instead. More generally, all values
that contain ".." must be quoted.
Other than the ".." restriction, the scanner deliberately accepts more
characters that can potentially form invalid input. This is to prevent
accidentally splitting input in multiple tokens. For example, "9.2." in
"frame.time_delta in {9.2.}" is currently parsed as one token and then
rejected because it cannot be parsed as time. If the scanner was made
stricter, it could treat it as two tokens (floats), "9." and "2." which
has different meaning for the set membership operator.
An unhandled edge case is "1....2" which is parsed as "1 .. .. 2" but
could have been parsed as "1. .. .2" instead. A float with trailing dots
followed by ".." seems sufficiently weird, so rejection is fine.
Ping-Bug: 14180
Change-Id: Ibad8e851b49346c9d470f09d5d6a54defa21bcb9
Reviewed-on: https://code.wireshark.org/review/26960
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
Allow "tcp.srcport in {1662 1663 1664}" to be abbreviated to
"tcp.srcport in {1662 .. 1664}". The range operator is supported for any
field value which supports the "<=" and "=>" operators and thus works
for integers, IP addresses, etc.
The naive mapping "tcp.srcport >= 1662 and tcp.srcport <= 1664" is not
used because it does not have the intended effect with fields that have
multiple occurrences (e.g. tcp.port). Each condition could be satisfied
by an other value. Therefore a new DVFM instruction (ANY_IN_RANGE) is
added to test the range condition against each individual field value.
Bug: 14180
Change-Id: I53c2d0f9bc9d4f0ffaabde9a83442122965c95f7
Reviewed-on: https://code.wireshark.org/review/26945
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
DIAG_OFF_FLEX turns off all warnings that we want to disable for
Flex-generated code due to some versions of Flex generating code that
triggers those warnings.
DIAG_ON_FLEX restores those warnings, so we do the checks for code that
*we* wrote.
Use them in .l files.
Change-Id: I613a20309a30cd4c61111a1edbe27a5d05fcbf59
Reviewed-on: https://code.wireshark.org/review/25815
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot
Reviewed-by: Guy Harris <guy@alum.mit.edu>
That way, if we #define anything for large file support, that's done
before we include any system header files that either depend on that
definition or that define it themselves if it's not already defined.
Change-Id: I9b07344151103be337899dead44d6960715d6813
Reviewed-on: https://code.wireshark.org/review/19035
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Add an FT_CHAR type, which is like FT_UINT8 except that the value is
displayed as a C-style character constant.
Allow use of C-style character constants in filter expressions; they can
be used in comparisons with all integral types, and in "contains"
operators.
Use that type for some fields that appear (based on the way they're
displayed, or on the use of C-style character constants in their
value_string tables) to be 1-byte characters rather than 8-bit numbers.
Change-Id: I39a9f0dda0bd7f4fa02a9ca8373216206f4d7135
Reviewed-on: https://code.wireshark.org/review/17787
Reviewed-by: Guy Harris <guy@alum.mit.edu>
master-branch libpcap now generates a reentrant Flex scanner and
Bison/Berkeley YACC parser for capture filter expressions, so it
requires versions of Flex and Bison/Berkeley YACC that support that.
We might as well do the same. For libwiretap, it means we could
actually have multiple K12 text or Ascend/Lucent text files open at the
same time. For libwireshark, it might not be as useful, as we only read
configuration files at startup (which should only happen once, in one
thread) or on demand (in which case, if we ever support multiple threads
running libwireshark, we'd need a mutex to ensure that only one file
reads it), but it's still the right thing to do.
We also require a version of Flex that can write out a header file, so
we change the runlex script to generate the header file ourselves. This
means we require a version of Flex new enough to support --header-file.
Clean up some other stuff encountered in the process.
Change-Id: Id23078c6acea549a52fc687779bb55d715b55c16
Reviewed-on: https://code.wireshark.org/review/14719
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Tweak lemonflex-tail.inc to fix an issue this reveals.
It appears that, at least on the buildbots, the Visual Studio compiler
no longer issues warnings for the code generated with %option noyywrap.
Change-Id: Id64d56f1ae8a79d0336488a4a50518da1f511497
Reviewed-on: https://code.wireshark.org/review/12433
Reviewed-by: Guy Harris <guy@alum.mit.edu>
We don't have any Flex scanners that support an interactive command-line
interface, so none of our scanners are, or need to be, interactive.
Mark text2pcap's scanner as not interactive.
That means none of our scanners should call isatty(), so they don't have
any need to include <io.h> on Windows; remove that include from the
Lucent/Ascent text capture scanner.
Update a comment to reflect that what matters isn't whether we can read
from a terminal or whether we actually do so, what matters is whether
they read *interactively* from a terminal (if you want to run text2pcap
reading from the standard input and type at it, be my guest).
Change-Id: I59979d1fdb37e1913125a400963ff7a3fa6b9bbd
Reviewed-on: https://code.wireshark.org/review/11587
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Guy Harris <guy@alum.mit.edu>
While IPv4 subnet masks are obviously related and similar to IPv4
addresses, they are distinct enough that they need to be treated
seperately in some aspects. For instance, there is no value in
attempting to resolve a subnet mask.
This change creates a new display type: BASE_NETMASK, which allows distinction from FT_IPv4
(and possible name resolution) where appropriate.
Change-Id: I99e19c9a58eb613f8e58d481af84c30e2e5e14d7
Reviewed-on: https://code.wireshark.org/review/10438
Petri-Dish: Michael Mann <mmann78@netscape.net>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Alexis La Goutte <alexis.lagoutte@gmail.com>
Reviewed-by: Michael Mann <mmann78@netscape.net>
Added a new relational test: 'x in {a b c}'. The only LHS entity
supported at this time is a field. The generated DFVM operations are
equivalent to an OR'ed series of =='s, but with the redundant existence
tests removed.
Change-Id: Iddc89b81cf7ad6319aef1a2a94f93314cb721a8a
Reviewed-on: https://code.wireshark.org/review/10246
Reviewed-by: Hadriel Kaplan <hadrielk@yahoo.com>
Petri-Dish: Hadriel Kaplan <hadrielk@yahoo.com>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Michael Mann <mmann78@netscape.net>
Have dfilter_compile() take an additional gchar ** argument, pointing to
a gchar * item that, on error, gets set to point to a g_malloc()ed error
string. That removes one bit of global state from the display filter
parser, and doesn't impose a fixed limit on the error message strings.
Have fvalue_from_string() and fvalue_from_unparsed() take a gchar **
argument, pointer to a gchar * item, rather than an error-reporting
function, and set the gchar * item to point to a g_malloc()ed error
string on an error.
Allow either gchar ** argument to be null; if the argument is null, no
error message is allocated or provided.
Change-Id: Ibd36b8aaa9bf4234aa6efa1e7fb95f7037493b4c
Reviewed-on: https://code.wireshark.org/review/6608
Reviewed-by: Guy Harris <guy@alum.mit.edu>
(Using sed : sed -i '/^ \* \$Id\$/,+1 d')
Fix manually some typo (in export_object_dicom.c and crc16-plain.c)
Change-Id: I4c1ae68d1c4afeace8cb195b53c715cf9e1227a8
Reviewed-on: https://code.wireshark.org/review/497
Reviewed-by: Anders Broman <a.broman58@gmail.com>
input() routine and thus don't need to have it generated - and as it
produces warnings of a routine defined but not used, we don't want to
have it generated.
Squelch a casting-const-away warning.
svn path=/trunk/; revision=47613
escape sequences; sscanf is a bit heavyweight, and using strtoul() also
squelches some "return value ignored" warnings.
svn path=/trunk/; revision=37007
so that the config.h definitions are available before we include
anything else; that way, for example, anything defined to enable
large-file support will be defined before we include any system header
files that might depend on it.
svn path=/trunk/; revision=36832
To prevent Windows compiler errors when using flex 2.5.35.
Fixes "missing unistd.h" and yywrap "mismatched parameter" warnings
[Upcoming Part 3: ignore 'signed /unsigned mismatch' errors]
svn path=/trunk/; revision=25173
such as the fact that Flex strips all but the last component of the "-o"
argument, and that it doesn't generate a header file to declare routines
the generated lexical analyzer defines. Use that script when building
lexical analyzers, and, for each lexical analyzer, include the generated
header file in the generated analyzer.
svn path=/trunk/; revision=22446
Move the %options to the beginning if they weren't already there, and
put them in the same order in all files.
Add "prefix=" options to .l files that don't already have them, so we
don't have to pass a "-P" option.
Add "never-interactive" and "noyywrap" options to our lexical analyzers,
to remove extra isatty() checks and to eliminate the need for yywrap()
from the Flex library.
Get rid of %option nostdinit - that's the default.
Add .l.c: rules to Makefile.am files, replacing the rules for specific
.l files. Have those rules all check that $(LEX) is set.
Update the address for the FSF.
svn path=/trunk/; revision=22424