For unparsed values on the RHS of a comparison try
to parse them first as a literal and only then as
a protocol. This is more complicated in code but
should be a use case a lot more common and useful in
practice.
It removes some annoying special cases and applies this
rule consistently to any expression. Consistency is
important otherwise the special cases and exceptions
make the language confusing and difficult to learn.
For values on the LHS the rule remains to first try a
protocol value, then a literal.
Related with issue #17731.
A literal value is a value that cannot be interpreted as a
registered protocol. An unparsed value can be a literal or
an identifier (protocol/field) according to context and the
current disambiguation rules.
Strictly literal here is to be understood to mean "numeric
literal, including numeric arrays, but not strings or character
constants".
The syntax for protocols and some literals like numbers
and bytes/addresses can be ambiguous. Some protocols can
be parsed as a literal, for example the protocol "fc"
(Fibre Channel) can be parsed as 0xFC.
If a numeric protocol is registered that will also take
precedence over any literal, according to the current
rules, thereby breaking numerical comparisons to that
number. The same for an hypothetical protocol named "true",
etc.
To allow the user to disambiguate this meaning introduce
new syntax.
Any value prefixed with ':' or enclosed in <,> will be treated
as a literal value only. The value :fc or <fc> will always
mean 0xFC, under any context. Never a protocol whose filter
name is "fc".
Likewise any value prefixed with a dot will always be parsed
as an identifier (protocol or protocol field) in the language.
Never any literal value parsed from the token "fc".
This allows the user to be explicit about the meaning,
and between the two explicit methods plus the ambiguous one
it doesn't completely break any one meaning.
The difference can be seen in the following two programs:
Filter: frame == fc
Constants:
Instructions:
00000 READ_TREE frame -> reg#0
00001 IF-FALSE-GOTO 5
00002 READ_TREE fc -> reg#1
00003 IF-FALSE-GOTO 5
00004 ANY_EQ reg#0 == reg#1
00005 RETURN
--------
Filter: frame == :fc
Constants:
00000 PUT_FVALUE fc <FT_PROTOCOL> -> reg#1
Instructions:
00000 READ_TREE frame -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_EQ reg#0 == reg#1
00003 RETURN
The filter "frame == fc" is the same as "filter == .fc",
according to the current heuristic, except the first form
will try to parse it as a literal if the name does not
correspond to any registered protocol.
By treating a leading dot as a name in the language we
necessarily disallow writing floats with a leading dot. We
will also disallow writing with an ending dot when using
unparsed values. This is a backward incompatibility but has
the happy side effect of making the expression {1...2}
unambiguous.
This could either mean "1 .. .2" or "1. .. 2". If we require
a leading and ending digit then the meaning is clear:
1.0..0.2 -> 1.0 .. 0.2
Fixes#17731.
Update the text2pcap tests to use the new tshark hexdump option
(see b5f89dbe2d ), which allow us to get a consistent roundtrip
of results instead of having to override the expected number of packets
and data size.
The text2pcap tests that use a capinfos->tshark->text2pcap->capinfos
cycle need to use the -a flag for identifying when the start of the
ASCII dump looks like hex, since the tshark -x output is a hex+ASCII
format. Adding the flag means that we can remove the override of the
expected data size for the dns_icmp.pcapng.gz file. (It also affects
the file with multiple data sources, but another issue remains there.)
Use the -F file type flag to have the text2pcap tests produce the
same file type as the input flag, which is a little superior when
the input flag is a nanosecond pcap.
Note that commit 5076aee044 means that
capinfos -M provides a machine-readable filetype that's easier to put
back into text2pcap.
Add the option to enter a filter with an absolute time
value in UTC. Otherwise the value is interpreted in
local time.
The syntax used is an "UTC" suffix, for example:
frame.time == "Dec 31, 2002 13:55:31.3 UTC"
This also changes the behavior of "Apply Selected as filter".
Fields using a local time display type will use local time
and fields using UTC display type will be applied using UTC.
Fixes#13268.
Remove the '-d' option from text2pcap, and move the two levels
of debug messages in text2pcap and text_import to either
LOG_LEVEL_DEBUG or LOG_LEVEL_NOISY as appropriate.
text2pcap now has support for fractional sections using the field
descriptor %f and doesn't support the old method, so change the format
string in the test. None of the existing tests depended on the
fractional seconds being correct.
To complete the set of equality operators add an "all equal"
operator that matches a frame if all fields match the condition.
The symbol chosen for "all_eq" is "===".
Repeated words were found with:
egrep "(\b[a-zA-Z]+) +\1\b" . -Ir
and then manually reviewed.
Non-displayed strings (e.g., in comments)
were also corrected, to ease future review.
The case_tshark_name_resolution_clopts test doesn't need live capture,
so switch to a capture file. This should fix the current failure on the
macOS Arm builder.
Use that for error messages, including any using test operators.
This allows to always use the same name as the user. It avoids
cases where the user write "a && b" and the message is "a and b"
is syntactically invalid.
It should also allow us to be more consistent with the use of
double quotes.
Add an UAT for configuring fake headers according to the server port, stream
id and direction of the long-lived stream that we start capturing packets
after it is established. That helps to parsing the DATAs captured subsequently.
A testcase also added.
close#17691
This reverts commit d635ff4933.
A charconst cannot be a value string, for that reason it is not
redundant with unparsed.
Maybe character constants should be parsed in the lexical scanner
instead.
Before:
Filter: ip.proto == '\g'
dftest: "'\g'" cannot be found among the possible values for ip.proto.
After:
Filter: ip.proto == '\g'
dftest: "'\g'" isn't a valid character constant.
For double quoted strings. This is consistent with single quote
character constants and the C standard. It also avoids common
mistakes where the superfluous backslash is silently suppressed.
A charconst uses the same semantic rules as unparsed so just
use the latter to avoid redundancies.
We keep the use of TOKEN_CHARCONST as an optimization to avoid
an unnecessary name resolution (lookup for a registered field with
the same name as the charconst).
Asterix data format is a complex family of asterix categories,
where each individual category exists in multiple editions.
As a result of many variants, the epan/dissectors/packet-asterix.c
is one of the largest dissectors.
So far, the asterix dissector had been maintained manually, where the
generic decoding routines and category/edition specific definitions
were entangled in the same file (packet-asterix.c).
This commit preserves the overall dissector structure, but makes
it easy to update the dissector with new categories or editions as
they become available (via the update script from this commit).
See tools/asterix/README.md file for dissector update procedure.
This commit includes:
- tools/asterix/packet-asterix-template.c
Extraction of generic asterix decoding routines and
common data structures.
- tools/asterix/update-specs.py
Update script, to render the template with up-to-date asterix
specs files. The asterix specs files themselves are maintained in
a separate repository.
- epan/dissectors/packet-asterix.c
Automatically generated dissector for asterix data format.
Although generated, this file needs to remain in the repository,
to be able to build the project in a reproducible way.
The generated asterix dissector was additionally tested with:
- ./tools/check_typed_item_calls.py --mask
- ./tools/fuzz-test.sh
Sync with asterix-specs #cef694825c
Revert to the original design of having a single pattern to catch
everything as unparsed and also try to be less hackish and fragile
parsing "..".
Strings like "80..90" are tricky because it can be parsed as
INTEGER DOTDOT INTEGER or FLOAT FLOAT.
Deprecate the usage of significant whitespace to separate set elements
(or anywhere else for that matter). This will make the implementation
simpler and cleaner and the language more expressive and user-friendly.
Wireshark defines the relation of equality A == B as
A any_eq B <=> An == Bn for at least one An, Bn.
More accurately I think this is (formally) an equivalence
relation, not true equality.
Whichever definition for "==" we choose we must keep the
definition of "!=" as !(A == B), otherwise it will
lead to logical contradictions like (A == B) AND (A != B)
being true.
Fix the '!=' relation to match the definition of equality:
A != B <=> !(A == B) <=> A all_ne B <=> An != Bn, for
every n.
This has been the recomended way to write "not equal" for a
long time in the documentation, even to the point where != was
deprecated, but it just wasn't implemented consistently in the
language, which has understandably been a persistent source
of confusion. Even a field that is normally well-behaved
with "!=" like "ip.src" or "ip.dst" will produce unexpected
results with encapsulations like IP-over-IP.
The opcode ALL_NE could have been implemented in the compiler
instead using NOT and ANY_EQ but I chose to implement it in
bytecode. It just seemed more elegant and efficient
but the difference was not very significant.
Keep around "~=" for any_ne relation, in case someone depends
on that, and because we don't have an operator for true equality:
A strict_equal B <=> A all_eq B <=> !(A any_ne B).
If there is only one value then any_ne and all_ne are the same
comparison operation.
Implementing this change did not require fixing any tests so it
is unlikely the relation "~=" (any_ne) will be very useful.
Note that the behaviour of the '<' (less than) comparison relation
is a separate, more subtle issue. In the general case the definition
of '<' that is used is only a partial order.
- Point all MSP related DATA frames to their MSP instead of
using wmem_tree_lookup32_array_le().
- Add test_grpc_streaming_mode_reassembly testcase for verifying
this feature.
close#17633
Matches is a special case that looks on the RHS and tries
to convert every unparsed value to a string, regardless
of the LHS type. This is not how types work in the display
filter. Require double-quotes to avoid ambiguity, because
matches doesn't follow normal Wireshark display filter
type rules. It doesn't need nor benefit from the flexibility
provided by unparsed strings in the syntax.
For matches the RHS is always a literal strings except
if the RHS is also a field name, then it complains of an
incompatible type. This is confusing. No type can be compatible
because no type rules are ever considered. Every unparsed value is
a text string except if it happens to coincide with a field
name it also requires double-quoting or it throws a syntax error,
just to be difficult. We could remove this odd quirk but requiring
double-quotes for regular expressions is a better, more elegant
fix.
Before:
Filter: tcp matches "udp"
Constants:
00000 PUT_PCRE udp -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
Filter: tcp matches udp
Constants:
00000 PUT_PCRE udp -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
Filter: tcp matches udp.srcport
dftest: tcp and udp.srcport are not of compatible types.
Filter: tcp matches udp.srcportt
Constants:
00000 PUT_PCRE udp.srcportt -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
After:
Filter: tcp matches "udp"
Constants:
00000 PUT_PCRE udp -> reg#1
Instructions:
00000 READ_TREE tcp -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_MATCHES reg#0 matches reg#1
00003 RETURN
Filter: tcp matches udp
dftest: "udp" was unexpected in this context.
Filter: tcp matches udp.srcport
dftest: "udp.srcport" was unexpected in this context.
Filter: tcp matches udp.srcportt
dftest: "udp.srcportt" was unexpected in this context.
The error message could still be improved.
It won't work with embedded null bytes so don't try. This is
not an additional restriction, it just removes a hidden failure
mode. To support matching embedded NUL bytes we would have
to use an internal string representation other than
null-terminated C strings (which doesn't seem very onerous with
GString).
Before:
Filter: http.user_agent == 41:42:00:43
Constants:
00000 PUT_FVALUE "AB" <FT_STRING> -> reg#1
Instructions:
00000 READ_TREE http.user_agent -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_EQ reg#0 == reg#1
00003 RETURN
After:
Filter: http.user_agent == 41:42:00:43
Constants:
00000 PUT_FVALUE "41:42:00:43" <FT_STRING> -> reg#1
Instructions:
00000 READ_TREE http.user_agent -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_EQ reg#0 == reg#1
00003 RETURN
FT_PROTOCOL and FT_BYTES are the same semantic type, but one is
backed by a GByteArray and the other by a TVBuff. Use the same
semantic rules to parse both. In particular unparsed strings
are not converted to literal strings for protocols.
Before:
Filter: frame contains 0x0000
Constants:
00000 PUT_FVALUE 30:78:30:30:30:30 <FT_PROTOCOL> -> reg#1
Instructions:
00000 READ_TREE frame -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_CONTAINS reg#0 contains reg#1
00003 RETURN
Filter: frame[5:] contains 0x0000
dftest: "0x0000" is not a valid byte string.
After:
Filter: frame contains 0x0000
dftest: "0x0000" is not a valid byte string.
Filter: frame[5:] contains 0x0000
dftest: "0x0000" is not a valid byte string.
Related to #17634.
If we have a STRING value in an expression and a numeric comparison
we must also check if it matches a value string before throwing
a type error.
Add appropriate tests to the test suite.
Fixes 4d2f469212.
Allow an entity in the grammar as range body. Perform a stronger
sanity check during semantic analysis everywhere a range is used.
This is both safer (unless we want to allow FIELD bodies only, but
functions are allowed too) and also provides better error messages.
Previously a range of range only compiled on the RHS. Now it can
appear on both sides of a relation.
This fixes a crash with STRING entities similar to #10690 for
UNPARSED.
This also adds back support for slicing functions that was removed
in f3f833ccec (by accident presumably).
Ping #10690
Add test/suite_external.py, which can dynamically generate tests from a
configuration file. This is intended to make happy-shark useful, but it
should make it easy to add simple TShark tests elsewhere.
The configuration file format must currently be JSON as described in the
Developer's Guide.
At least using MSYS2 python (that uses system() that uses CMD.EXE)
we must quote every command in a pipe, otherwise the "'C:' is not
recognized as an internal or external program" error occurs.
"Follow Stream" functionality assumes that all data in a single packet
belongs to the same stream. That is not true for HTTP2 and QUIC, where
we end up having data from unrelated streams.
Filter out the unwanted data directly in the protocol dissector code with
a custom `tap_handler` (as TCP already does).
Close#16093
Add git dissection test cases to existing testing suite for: finding git
packets, finding the Git Protocol version, finding the right amount of
Flush and Delimiter packets, not finding Malformed packets.
Part of #17093
Mostly functioning proof of concept for #14329. This work is intended to
allow Wireshark to support multiple packet comments per packet.
Uses and expands upon the `wtap_block` API in `wiretap/wtap_opttypes.h`.
It attaches a `wtap_block` structure to `wtap_rec` in place of its
current `opt_comment` and `packet_verdict` members to hold OPT_COMMENT
and OPT_PKT_VERDICT option values.
Create a tvbuff that covers the data portion of a block, and use that to
dissect all data in the block, including but not limited to the options.
Catch ReportedBoundsError exceptions and treat them as an indication
that the block length was too short - add an expert info to the block
length item indicating that.
Have separate routines for each block type that dissects the data in
that block type.
While we're at it, check whether the trailing block length is equal to
the header block length and, if not, report an error in the trailing
block length.
Fix the tests to match.
Add DissectorTable.try_heuristics(name, tvb, pinfo, tree). Previously,
there was no way for a Lua plugin to run an existing heuristic
dissector.
Based on Gerrit change 18718. Closes#17220.
It's not a valid field type, it's only a hack to support regular
expression matching in packet-matching expressions.
Instead, in the packet-matching code, have a separate syntax tree type
for Perl-compatible regular expressions, and a separate instruction to
load one into a register, and have the "matching" operator for field
types take a GRegex * as the second argument.
Support decrypting captures with Fast BSS Transition roaming present
by now also scanning (re)association frames for relevant information
elements and feeding it into the dot11decrypt engine.
Both (re)association request and response frames are scanned to allow
for potentially missing one frame and still be able to derive PTKs
needed for successful decryption.
Closes#17145
Change-Id: I08436582e4f83695dc606ddb92ff442d6258ef9b
Changes:
* Replaced large netperfmeter-dccp.pcapng.gz and netperfmeter.pcap.gz captures
by one common small netperfmeter.pcapng.gz for the suites follow_dccp and
netperfmeter.
* Updated test suites "follow_dccp" and "netperfmeter".
This pull request includes:
* The "Follow DCCP stream" feature.
* Updated docbook documentation for the "Follow DCCP stream" feature.
* Test for the feature.
* Corresponding packet trace for the test.
Fedora and RHEL/CentOS put libsofthsm2.so in a different location
than Debian/Ubuntu, so look there too. This causes test_tls_pkcs11
to pass instead of being skipped (if softhsm2 and the other
prerequisites are installed.)
The output of the "values" tshark glossary has over 1.3M lines. Writing
this to stdout with some test failures is problematic in a number of ways.
Also it's not helpful because stderr is written after stdout (not interleaved)
so there is no output context to the error message. The error/warning
message (from stderr, that triggered the test failure) needs to be
sufficient to provide a good understaning of the test failure.
The output is trimmed to first+last N lines. Some lines are kept as
informational and because it may be useful if the program aborts.
Fixes#17203.
Add partial support for decrypting captures with connections
established using FT-EAP. To support deriving keys for FT-EAP
the MSK is needed. This change adds MSK as a valid IEEE 802.11
protocol input key type preference as well.
Note that FT-EAP support comes with the following imitations:
- Keys can only be derived from the FT 4-way handshake messages.
- Roaming is not supported.
Add partial support for decrypting captures with connections
established using FT BSS Transition (IEEE 802.11r).
FT BSS Transition decryption comes with the following limitations:
- Only FT-PSK is supported.
- Keys can only be derived from the FT 4-way handshake messages.
- Roaming is not supported.
Some .proto files contain complex syntax that does not be described in protobuf official site
(https://developers.google.com/protocol-buffers/docs/reference/proto3-spec).
1. Update 'epan/protobuf_lang_parser.lemon' to:
1) Support complex option names format (EBNF):
optionName = ( ident | "(" fullIdent ")" ) { "." ( ident | "(" fullIdent ")" ) }
for example, "option (complex_opt2).(grault) = 654;".
2) Make enum body support 'reserved' section (EBNF):
enumBody = "{" { reserved | option | enumField | emptyStatement } "}"
3) Allow the value of field or enumValue option to be "{ ... }" other than constant:
enumValueOption = optionName "=" ( constant | customOptionValue ) ";"
fieldOption = optionName "=" ( constant | customOptionValue ) ";"
4) Allow 'group' section missing 'label' (for example, in 'oneof' section).
5) Make 'oneof' section support 'option' and 'group' sections (BNF):
oneof = "oneof" oneofName "{" { oneofField | option | group | emptyStatement } "}"
6) Ignore unused 'extend' section.
7) Fix the bug of one string being splitted into multi-lines.
2. Update 'epan/protobuf_lang_tree.c' to:
8) Fix the bug of parsing repeated option.
3. Update 'test/suite_dissection.py' to add test case for parsing complex syntax .proto files:
test/protobuf_lang_files/complex_proto_files/unittest_custom_options.proto
test/protobuf_lang_files/complex_proto_files/complex_syntax.proto
and dependency files:
test/protobuf_lang_files/well_know_types/google/protobuf/any.proto
test/protobuf_lang_files/well_know_types/google/protobuf/descriptor.proto
Refer to issue #17046
My initial fix caused several double-offset errors in TvbRange_raw()
because I was adjusting for the TvbRange's offset too early in the
process. The proper fix is to only adjust for it in the final call to
get the data.
I also simplified some of the bounds checks to be based on the values in
the TvbRange instead of calling `tvb_captured_length()` and the like,
because its bounds are already checked against the backing Tvb when it's
first taken.
Massively expanded the lua test suite to account for every combination
of passing offsets and lengths to a Tvb or TvbRange and to the
subsequent `:raw()` call.
Add case_dissect_protobuf and case_dissect_grpc in test/suite_dissection.py.
Add *.proto into the sub directories of test/protobuf_lang_files/.
Run command like 'pytest --program-path .\run\Debug\ -k "grpc or protobuf"'
in build directory (in windows) to test these cases only.