For an expression starting with a colon (a literal) try to parse
the value with and without colon. This avoids excluding some
valid representations like the IPv6 address "::1".
A literal value is a value that cannot be interpreted as a
registered protocol. An unparsed value can be a literal or
an identifier (protocol/field) according to context and the
current disambiguation rules.
Strictly literal here is to be understood to mean "numeric
literal, including numeric arrays, but not strings or character
constants".
This relaxes the display filter syntax to accept byte arrays without
separators. An expression such as the following becomes valid:
quic.dcid == b1f0b7cbe0897974
Previously it had to be written as:
quic.dcid == b1:f0:b7:cb:e0:89:79:74
Partially fixes#17818.
Require date/time separators when entering a time value, e,g:
2014-07-04 12:34:56.789+00:00
Separators in the timezone offset are an exception, they are
never mandatory.
This excludes ISO basic format to avoid inputs that could
be entirely numbers indistinguishable from Epoch time, in case
we want to support that in the future.
Add the option to enter a filter with an absolute time
value in UTC. Otherwise the value is interpreted in
local time.
The syntax used is an "UTC" suffix, for example:
frame.time == "Dec 31, 2002 13:55:31.3 UTC"
This also changes the behavior of "Apply Selected as filter".
Fields using a local time display type will use local time
and fields using UTC display type will be applied using UTC.
Fixes#13268.
This makes it easier to understand the code, avoids conflicts
and ugly and unnecessary casts.
The field display enum has evolved over time from integer types
to a type generic parameter.
Encapsulate the feature requirements for strptime() in a
portability wrapper.
Use _GNU_SOURCE to expose strptime. It should be enough on glibc
without the side-effect of selecting a particular SUS version,
which we don't need and might hide other definitions.
Split ws_regex_matches() into two functions with better semantics
and remove the WS_REGEX_ZERO_TERMINATED symbol.
ws_regex_matches() matches zero terminated strings.
ws_regex_matches_length() matches a string length in code units.
Replace:
g_snprintf() -> snprintf()
g_vsnprintf() -> vsnprintf()
g_strdup_printf() -> ws_strdup_printf()
g_strdup_vprintf() -> ws_strdup_vprintf()
This is more portable, user-friendly and faster on platforms
where GLib does not like the native I/O.
Adjust the format string to use macros from intypes.h.
The type ssize_t is not available on Windows. Because this is
used in the public API we must provide a definition for it.
To avoid having to add a header to fix this use a size_t in
the API instead, and assign SIZE_MAX to represent a null
terminated string.
Move epan_memmem() and epan_strcasestr() to wsutil/str_util.
Rename to ws_memmem() and ws_strcasestr(). Add compile time
check for a system implementation and use that if available.
We invoke those functions using a wrapper to avoid exposing
_GNU_SOURCE outside of the implementation.
Invalid character constants should be handled in the lexical scanner.
Todo: See if some code could be shared to parse double quoted strings.
It also fixes some unintuitive type coercions to string. Character
constants should be treated as characters, or maybe integers, or
maybe even throw an invalid comparison error, but coverting to a
literal string or byte array is surprising and not particularly
useful:
'\xFF' -> "'\xFF'" (equals)
'\xFF' -> "FF" (contains)
Before:
Filter: http.request.method contains "\x63"
Constants:
00000 PUT_FVALUE "c" <FT_STRING> -> reg#1
(...)
Filter: http.request.method contains '\x63'
Constants:
00000 PUT_FVALUE "63" <FT_STRING> -> reg#1
(...)
Filter: http.request.method == "\x63"
Constants:
00000 PUT_FVALUE "c" <FT_STRING> -> reg#1
(...)
Filter: http.request.method == '\x63'
Constants:
00000 PUT_FVALUE "'\\x63'" <FT_STRING> -> reg#1
(...)
After:
Filter: http.request.method contains '\x63'
Constants:
00000 PUT_FVALUE "c" <FT_STRING> -> reg#1
(...)
Filter: http.request.method == '\x63'
Constants:
00000 PUT_FVALUE "c" <FT_STRING> -> reg#1
(...)
This reverts commit d635ff4933.
A charconst cannot be a value string, for that reason it is not
redundant with unparsed.
Maybe character constants should be parsed in the lexical scanner
instead.
Before:
Filter: ip.proto == '\g'
dftest: "'\g'" cannot be found among the possible values for ip.proto.
After:
Filter: ip.proto == '\g'
dftest: "'\g'" isn't a valid character constant.
For double quoted strings. This is consistent with single quote
character constants and the C standard. It also avoids common
mistakes where the superfluous backslash is silently suppressed.
PCRE2 is mature, widely used and widely available. Supporting two
different RE implementations, one of which is unmaintained, is
unnecessary and counter-productive.
PCRE2 is the future of PCRE. The only advantage of GRegex is that
it comes bundled with GLib, which is not an advantage at all.
PCRE2 is widely available, the GRegex abstractions layer are not a
good fit and abstract things that don't need abstracting or that we
could handle better ourselves, there are open bugs (#12997) and
maintenance is spotty at best.
GRegex comes with many of the problems of bundled code, aggravated by
the fact that it completely falls outside of our control.
The header ftypes-int.h should not be used outside of epan/ftypes
because it is a private header.
The functions fvalue_free() and fvalue_cleanup() need not and should
not be macros either.
The implementation is pre-computing the length and using that
to allocate a buffer. This doesn't have any practical advantage
and is inefficient because the code is mostly doing the same work
twice. Remove the unnecessary length pre-computation step.
If an ftype can participate in equala assume it can also participate in
not equals. Use fvalue_can_eq() instead of fvalue_can_ne().
If it can participate in one order comparison it can participate in all.
Replace any comparison with fvalue_can_cmp().
A charconst uses the same semantic rules as unparsed so just
use the latter to avoid redundancies.
We keep the use of TOKEN_CHARCONST as an optimization to avoid
an unnecessary name resolution (lookup for a registered field with
the same name as the charconst).
It won't work with embedded null bytes so don't try. This is
not an additional restriction, it just removes a hidden failure
mode. To support matching embedded NUL bytes we would have
to use an internal string representation other than
null-terminated C strings (which doesn't seem very onerous with
GString).
Before:
Filter: http.user_agent == 41:42:00:43
Constants:
00000 PUT_FVALUE "AB" <FT_STRING> -> reg#1
Instructions:
00000 READ_TREE http.user_agent -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_EQ reg#0 == reg#1
00003 RETURN
After:
Filter: http.user_agent == 41:42:00:43
Constants:
00000 PUT_FVALUE "41:42:00:43" <FT_STRING> -> reg#1
Instructions:
00000 READ_TREE http.user_agent -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_EQ reg#0 == reg#1
00003 RETURN
FT_PROTOCOL and FT_BYTES are the same semantic type, but one is
backed by a GByteArray and the other by a TVBuff. Use the same
semantic rules to parse both. In particular unparsed strings
are not converted to literal strings for protocols.
Before:
Filter: frame contains 0x0000
Constants:
00000 PUT_FVALUE 30:78:30:30:30:30 <FT_PROTOCOL> -> reg#1
Instructions:
00000 READ_TREE frame -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_CONTAINS reg#0 contains reg#1
00003 RETURN
Filter: frame[5:] contains 0x0000
dftest: "0x0000" is not a valid byte string.
After:
Filter: frame contains 0x0000
dftest: "0x0000" is not a valid byte string.
Filter: frame[5:] contains 0x0000
dftest: "0x0000" is not a valid byte string.
Related to #17634.
For efficiency do the comparison in a single function call
instead of trying to preserving exactly the previous semantics.
Still I tried not to deviate much.
All the order operators can be defined in terms of 'lt'
and 'eq' so use that to reduce the number of required
methods from 6 to 2.
Further reduce to one by combining those two into a single
function that has memcmp semantics: negative return is
"less than", positive is "greater than" and zero is equal.
Octal escape sequences \NNN can have between 1 and 3 digits. If
the sequence had less than 3 digits the parser got out of sync
with an incorrect double increment of the pointer and errors out
parsing sequences like \0, \2 or \33.
Before:
Filter: ip.proto == '\33'
dftest: "'\33'" is too long to be a valid character constant.
After:
Filter: ip.proto == '\33'
Constants:
00000 PUT_FVALUE 27 <FT_UINT8> -> reg#1
Instructions:
00000 READ_TREE ip.proto -> reg#0
00001 IF-FALSE-GOTO 3
00002 ANY_EQ reg#0 == reg#1
00003 RETURN
Fixes#16525.