Comparisons require a field-like value on one of the sides,
or both. Change this to require on the LHS or both. There is
really no reason that I can see to allow the relation to commute,
and it allows removing a lot of unnecessary code and extra tests.
For unparsed values on the RHS of a comparison try
to parse them first as a literal and only then as
a protocol. This is more complicated in code but
should be a use case a lot more common and useful in
practice.
It removes some annoying special cases and applies this
rule consistently to any expression. Consistency is
important otherwise the special cases and exceptions
make the language confusing and difficult to learn.
For values on the LHS the rule remains to first try a
protocol value, then a literal.
Related to issue #17731.
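
The lookup order described above can be sketched as follows. This is
only an illustration in Python, not the dfilter implementation: the
REGISTERED set, parse_literal() and resolve() are hypothetical names,
and literals are reduced to hex byte strings for brevity.

```python
# Hypothetical registry of protocol and field names (illustration only).
REGISTERED = {"frame", "fc", "ip", "ip.src"}


def parse_literal(token):
    """Return the bytes for a hex literal such as 'fc' or '01:02', else None."""
    if not token:
        return None
    try:
        return bytes.fromhex(token.replace(":", ""))
    except ValueError:
        return None


def resolve(token, side):
    """Resolve an unparsed token on the 'lhs' or 'rhs' of a comparison."""
    field = ("field", token) if token in REGISTERED else None
    value = parse_literal(token)
    literal = ("literal", value) if value is not None else None
    # LHS: protocol/field first, then literal.
    # RHS: literal first, then protocol/field.
    order = (field, literal) if side == "lhs" else (literal, field)
    for candidate in order:
        if candidate is not None:
            return candidate
    raise ValueError(f'"{token}" is neither a field nor a literal')
```

With this sketch resolve("fc", "lhs") yields the protocol, while
resolve("fc", "rhs") yields the byte 0xFC.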
A literal value is a value that cannot be interpreted as a
registered protocol. An unparsed value can be a literal or
an identifier (protocol/field) according to context and the
current disambiguation rules.
Strictly literal here is to be understood to mean "numeric
literal, including numeric arrays, but not strings or character
constants".
The syntax for protocols and some literals like numbers
and bytes/addresses can be ambiguous. Some protocols can
be parsed as a literal, for example the protocol "fc"
(Fibre Channel) can be parsed as 0xFC.
If a numeric protocol is registered that will also take
precedence over any literal, according to the current
rules, thereby breaking numerical comparisons to that
number. The same goes for a hypothetical protocol named "true",
etc.
To allow the user to disambiguate this meaning introduce
new syntax.
Any value prefixed with ':' or enclosed in <,> will be treated
as a literal value only. The value :fc or <fc> will always
mean 0xFC, under any context. Never a protocol whose filter
name is "fc".
Likewise any value prefixed with a dot will always be parsed
as an identifier (protocol or protocol field) in the language.
Never any literal value parsed from the token "fc".
This allows the user to be explicit about the meaning,
and between the two explicit methods plus the ambiguous one
it doesn't completely break any one meaning.
The difference can be seen in the following two programs:
Filter: frame == fc
Constants:
Instructions:
00000 READ_TREE frame -> reg#0
00001 IF-FALSE-GOTO 5
00002 READ_TREE fc -> reg#1
00003 IF-FALSE-GOTO 5
00004 ANY_EQ reg#0 == reg#1
00005 RETURN
--------
Filter: frame == :fc
Constants:
00000 PUT_FVALUE fc <FT_PROTOCOL> -> reg#1
Instructions:
00000 READ_TREE frame -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_EQ reg#0 == reg#1
00003 RETURN
The filter "frame == fc" is the same as "frame == .fc",
according to the current heuristic, except the first form
will try to parse it as a literal if the name does not
correspond to any registered protocol.
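
The three spellings can be told apart before any name lookup happens.
A minimal sketch in Python, assuming a hypothetical classify() helper
that only inspects the prefix or the enclosing angle brackets (the real
scanner is not structured this way):

```python
def classify(token):
    """Classify a token as an explicit literal, explicit identifier or unparsed."""
    if token.startswith(":") and len(token) > 1:
        # ":fc" is always the literal value.
        return ("literal", token[1:])
    if token.startswith("<") and token.endswith(">") and len(token) > 2:
        # "<fc>" is the alternative literal spelling.
        return ("literal", token[1:-1])
    if token.startswith(".") and len(token) > 1:
        # ".fc" is always a protocol or protocol field name.
        return ("identifier", token[1:])
    # Anything else stays ambiguous and follows the usual heuristic.
    return ("unparsed", token)
```

Only the "unparsed" case needs the disambiguation heuristic; the other
two cases carry their meaning in the syntax.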
By treating a leading dot as a name in the language we
necessarily disallow writing floats with a leading dot. We
will also disallow writing with an ending dot when using
unparsed values. This is a backward incompatibility but has
the happy side effect of making the expression {1...2}
unambiguous.
This could either mean "1 .. .2" or "1. .. 2". If we require
a leading and ending digit then the meaning is clear:
1.0..0.2 -> 1.0 .. 0.2
Fixes #17731.
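
The effect of requiring a leading and an ending digit can be sketched
in Python. The NUMBER pattern and split_range() are assumptions made
for this illustration, not the grammar used by the scanner:

```python
import re

# Under the rule above a number needs a digit on both sides of any dot.
NUMBER = re.compile(r"\d+(\.\d+)?")


def split_range(text):
    """Return (low, high) if text is 'low..high' under the strict rule, else None."""
    start = text.find("..")
    while start != -1:
        low, high = text[:start], text[start + 2:]
        if NUMBER.fullmatch(low) and NUMBER.fullmatch(high):
            return low, high
        # Try the next possible position of the range operator.
        start = text.find("..", start + 1)
    return None
```

Here split_range("1.0..0.2") returns ("1.0", "0.2"), while
split_range("1...2") returns None because neither ".2" nor "1." is a
valid number under the strict rule.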
Update the text2pcap tests to use the new tshark hexdump option
(see b5f89dbe2d), which allows us to get a consistent roundtrip
of results instead of having to override the expected number of packets
and data size.
The text2pcap tests that use a capinfos->tshark->text2pcap->capinfos
cycle need to use the -a flag for identifying when the start of the
ASCII dump looks like hex, since the tshark -x output is a hex+ASCII
format. Adding the flag means that we can remove the override of the
expected data size for the dns_icmp.pcapng.gz file. (It also affects
the file with multiple data sources, but another issue remains there.)
Use the -F file type flag to have the text2pcap tests produce the
same file type as the input file, which is slightly better when
the input file is a nanosecond pcap.
Note that commit 5076aee044 means that
capinfos -M provides a machine-readable filetype that's easier to put
back into text2pcap.
Add the option to enter a filter with an absolute time
value in UTC. Otherwise the value is interpreted in
local time.
The syntax used is a "UTC" suffix, for example:
frame.time == "Dec 31, 2002 13:55:31.3 UTC"
This also changes the behavior of "Apply Selected as filter".
Fields using a local time display type will use local time
and fields using UTC display type will be applied using UTC.
Fixes #13268.
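
The suffix rule can be illustrated with a short Python sketch. The
format string and parse_abs_time() are assumptions for this example
only; the actual parser accepts more formats and is written in C.

```python
from datetime import datetime, timezone

# Assumed format for the example above; %b expects English month
# abbreviations in the default C locale.
FMT = "%b %d, %Y %H:%M:%S.%f"


def parse_abs_time(text):
    """Parse an absolute time, honouring an optional ' UTC' suffix."""
    if text.endswith(" UTC"):
        naive = datetime.strptime(text[: -len(" UTC")], FMT)
        return naive.replace(tzinfo=timezone.utc)
    # Without the suffix the value is interpreted in local time.
    naive = datetime.strptime(text, FMT)
    return naive.astimezone()
```

Both branches return a timezone-aware value, so two filters that
differ only in the suffix compare against different instants unless
the local zone happens to be UTC.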
Remove the '-d' option from text2pcap, and move the two levels
of debug messages in text2pcap and text_import to either
LOG_LEVEL_DEBUG or LOG_LEVEL_NOISY as appropriate.
text2pcap now has support for fractional seconds using the field
descriptor %f and doesn't support the old method, so change the format
string in the test. None of the existing tests depended on the
fractional seconds being correct.
To complete the set of equality operators add an "all equal"
operator that matches a frame if all fields match the condition.
The symbol chosen for "all_eq" is "===".
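
The difference between the existing "any" semantics and the new "all"
semantics can be sketched in Python. The function names and the use of
plain lists for field occurrences are assumptions for illustration;
the sketch also assumes that a comparison on an absent field is false.

```python
def any_eq(a_values, b_values):
    """True if at least one pair of occurrences is equal ("==")."""
    return any(a == b for a in a_values for b in b_values)


def all_eq(a_values, b_values):
    """True if both sides are present and every pair is equal ("===")."""
    if not a_values or not b_values:
        return False
    return all(a == b for a in a_values for b in b_values)


# A frame with two occurrences of the same field, e.g. two IP layers.
ip_src = ["10.0.0.1", "192.168.0.1"]
print(any_eq(ip_src, ["10.0.0.1"]))  # one occurrence matches
print(all_eq(ip_src, ["10.0.0.1"]))  # not every occurrence matches
```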
Repeated words were found with:
egrep "(\b[a-zA-Z]+) +\1\b" . -Ir
and then manually reviewed.
Non-displayed strings (e.g., in comments)
were also corrected, to ease future review.
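
For reference, the same search can be expressed in Python with an
equivalent regular expression. The repeated_words() helper is a
hypothetical name and, like the egrep command, it only finds
candidates that still need manual review.

```python
import re

# Same pattern as the egrep command: a word, spaces, then the same word.
REPEATED = re.compile(r"(\b[a-zA-Z]+) +\1\b")


def repeated_words(line):
    """Return each word that is immediately repeated in the line."""
    return [match.group(1) for match in REPEATED.finditer(line)]
```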
The case_tshark_name_resolution_clopts test doesn't need live capture,
so switch to a capture file. This should fix the current failure on the
macOS Arm builder.
Use that for error messages, including any using test operators.
This allows us to always use the same name as the user. It avoids
cases where the user writes "a && b" and the message is "a and b"
is syntactically invalid.
It should also allow us to be more consistent with the use of
double quotes.
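
The idea can be sketched in Python as keeping the user's spelling next
to the canonical operator. The Token class, its field names and the
exact message wording are assumptions made for this illustration:

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str  # canonical operator name used internally, e.g. "TEST_AND"
    text: str  # the exact characters the user typed, e.g. "&&" or "and"


def syntax_error(left, op, right):
    """Build a message that echoes the operator as the user wrote it."""
    return f'"{left} {op.text} {right}" is syntactically invalid.'
```

With this, syntax_error("a", Token("TEST_AND", "&&"), "b") reports
"a && b" rather than "a and b".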
Add a UAT for configuring fake headers according to the server port, stream
ID and direction of a long-lived stream whose packets we start capturing
after it is established. This helps with parsing the DATA frames captured
subsequently. A test case is also added.
close #17691
This reverts commit d635ff4933.
A charconst cannot be a value string, for that reason it is not
redundant with unparsed.
Maybe character constants should be parsed in the lexical scanner
instead.
Before:
Filter: ip.proto == '\g'
dftest: "'\g'" cannot be found among the possible values for ip.proto.
After:
Filter: ip.proto == '\g'
dftest: "'\g'" isn't a valid character constant.
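
The check behind the new message can be sketched in Python. This is a
simplified validator under stated assumptions (single-byte values,
C-style simple, octal and hex escapes); parse_charconst() is a
hypothetical name and not the dfilter function.

```python
import re

SIMPLE_ESCAPES = {
    "a": 0x07, "b": 0x08, "f": 0x0C, "n": 0x0A, "r": 0x0D, "t": 0x09,
    "v": 0x0B, "\\": 0x5C, "'": 0x27, '"': 0x22, "?": 0x3F,
}


def parse_charconst(text):
    """Return the integer value of a C-style character constant."""
    error = ValueError(f"\"{text}\" isn't a valid character constant.")
    if len(text) < 3 or text[0] != "'" or text[-1] != "'":
        raise error
    body = text[1:-1]
    # A single ordinary character stands for itself.
    if len(body) == 1 and body not in ("\\", "'"):
        return ord(body)
    if body[0] != "\\":
        raise error
    escape = body[1:]
    if escape in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[escape]
    # Octal escape limited to one byte, e.g. \101.
    if re.fullmatch(r"[0-3]?[0-7]{1,2}", escape):
        return int(escape, 8)
    # Hex escape limited to one byte, e.g. \x41.
    if re.fullmatch(r"x[0-9a-fA-F]{1,2}", escape):
        return int(escape[1:], 16)
    raise error
```

An unknown escape such as '\g' is rejected as an invalid character
constant before any value string lookup is attempted.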
For double quoted strings. This is consistent with single quote
character constants and the C standard. It also avoids common
mistakes where the superfluous backslash is silently suppressed.
A charconst uses the same semantic rules as unparsed so just
use the latter to avoid redundancies.
We keep the use of TOKEN_CHARCONST as an optimization to avoid
an unnecessary name resolution (lookup for a registered field with
the same name as the charconst).
Asterix data format is a complex family of asterix categories,
where each individual category exists in multiple editions.
As a result of many variants, the epan/dissectors/packet-asterix.c
is one of the largest dissectors.
So far, the asterix dissector had been maintained manually, where the
generic decoding routines and category/edition specific definitions
were entangled in the same file (packet-asterix.c).
This commit preserves the overall dissector structure, but makes
it easy to update the dissector with new categories or editions as
they become available (via the update script from this commit).
See tools/asterix/README.md file for dissector update procedure.
This commit includes:
- tools/asterix/packet-asterix-template.c
Extraction of generic asterix decoding routines and
common data structures.
- tools/asterix/update-specs.py
Update script, to render the template with up-to-date asterix
specs files. The asterix specs files themselves are maintained in
a separate repository.
- epan/dissectors/packet-asterix.c
Automatically generated dissector for asterix data format.
Although generated, this file needs to remain in the repository,
to be able to build the project in a reproducible way.
The generated asterix dissector was additionally tested with:
- ./tools/check_typed_item_calls.py --mask
- ./tools/fuzz-test.sh
Sync with asterix-specs #cef694825c
Revert to the original design of having a single pattern to catch
everything as unparsed and also try to be less hackish and fragile
parsing "..".
Strings like "80..90" are tricky because they can be parsed as
INTEGER DOTDOT INTEGER or FLOAT FLOAT.
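
The ambiguity can be reproduced with a small Python sketch that
enumerates both readings. It assumes a permissive float grammar that
allows a missing digit on one side of the dot; the patterns and the
readings() helper are hypothetical and not the scanner rules.

```python
import re

INTEGER = re.compile(r"\d+")
# Permissive float: digits are optional on one side of the dot.
LOOSE_FLOAT = re.compile(r"\d+\.\d*|\.\d+")


def readings(text):
    """List the token sequences that text can be split into."""
    found = []
    low, sep, high = text.partition("..")
    if sep and INTEGER.fullmatch(low) and INTEGER.fullmatch(high):
        found.append(("INTEGER", low, "DOTDOT", "INTEGER", high))
    # Try every cut point that leaves a loose float on both sides.
    for cut in range(1, len(text)):
        left, right = text[:cut], text[cut:]
        if LOOSE_FLOAT.fullmatch(left) and LOOSE_FLOAT.fullmatch(right):
            found.append(("FLOAT", left, "FLOAT", right))
    return found
```

For "80..90" this returns two readings, one as an integer range and
one as the two floats "80." and ".90".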
Deprecate the usage of significant whitespace to separate set elements
(or anywhere else for that matter). This will make the implementation
simpler and cleaner and the language more expressive and user-friendly.
Wireshark defines the relation of equality A == B as
A any_eq B <=> An == Bn for at least one An, Bn.
More accurately I think this is (formally) an equivalence
relation, not true equality.
Whichever definition for "==" we choose we must keep the
definition of "!=" as !(A == B), otherwise it will
lead to logical contradictions like (A == B) AND (A != B)
being true.
Fix the '!=' relation to match the definition of equality:
A != B <=> !(A == B) <=> A all_ne B <=> An != Bn, for
every n.
This has been the recommended way to write "not equal" for a
long time in the documentation, even to the point where != was
deprecated, but it just wasn't implemented consistently in the
language, which has understandably been a persistent source
of confusion. Even a field that is normally well-behaved
with "!=" like "ip.src" or "ip.dst" will produce unexpected
results with encapsulations like IP-over-IP.
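
The contradiction and its fix can be checked with a short Python
sketch. The helpers model a field as a non-empty list of occurrences;
their names mirror the relations above but are otherwise assumptions
made for illustration.

```python
def any_eq(a_values, b_values):
    """A == B: at least one pair of occurrences is equal."""
    return any(a == b for a in a_values for b in b_values)


def any_ne(a_values, b_values):
    """Old "!=" (now "~="): at least one pair of occurrences differs."""
    return any(a != b for a in a_values for b in b_values)


def all_ne(a_values, b_values):
    """New "!=": every pair of occurrences differs."""
    return all(a != b for a in a_values for b in b_values)


# IP-over-IP: the frame carries two ip.src values.
ip_src = ["10.0.0.1", "192.168.0.1"]
target = ["10.0.0.1"]

# With any_ne, (A == B) AND (A != B) are both true for this frame.
print(any_eq(ip_src, target), any_ne(ip_src, target))
# With all_ne, "!=" is exactly the negation of "==".
print(all_ne(ip_src, target) == (not any_eq(ip_src, target)))
```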
The opcode ALL_NE could have been implemented in the compiler
instead using NOT and ANY_EQ but I chose to implement it in
bytecode. It just seemed more elegant and efficient
but the difference was not very significant.
Keep around "~=" for any_ne relation, in case someone depends
on that, and because we don't have an operator for true equality:
A strict_equal B <=> A all_eq B <=> !(A any_ne B).
If there is only one value then any_ne and all_ne are the same
comparison operation.
Implementing this change did not require fixing any tests so it
is unlikely the relation "~=" (any_ne) will be very useful.
Note that the behaviour of the '<' (less than) comparison relation
is a separate, more subtle issue. In the general case the definition
of '<' that is used is only a partial order.
- Point all MSP related DATA frames to their MSP instead of
using wmem_tree_lookup32_array_le().
- Add test_grpc_streaming_mode_reassembly testcase for verifying
this feature.
close #17633