Since fvalue_to_string_repr does take the field base
as a parameter and that affects the representation,
an existing comment is no longer true, and we can
get rid of a large amount of duplicative special
handling for integer-based types.
Properly generate filter expressions for custom columns by
using proto_construct_match_selected_string on each value and
then joining them together later instead of trying to split
the column expression value.
This ensures that escaping is done properly for display filter
strings, that commas internal to field values are not confused
with commas between occurrences, that for multifield columns
we can distinguish which field each value matches, etc.
It's not entirely clear whether AND or OR logic is appropriate
for multiple occurrences; currently OR is used.
Bump glib requirement to 2.54 for g_ptr_array_find_with_equal_func
(this doesn't drop support for any major distribution that already
meets our other library requirements, like Qt.)
Fix#18001.
The ftype itself is encoding agnostic. In the case of literal
display filter strings it is possible and legal to contain
invalid UTF-8.
Maybe it shouldn't be but that requires a user-friendly diagnostic
message, not silently sanitizing the string as is done currently
(only a debug message is printed in that case).
Do the debug checks in proto_tree_set_string() instead. That
still detects dissector code that might need fixing, which was
the purpose for this check.
Improve documentation and add admonition for proto_tree_add_string().
Ping #18521.
The proto.h APIs expect valid UTF-8 so replace uses of format_text()
with a label copy function that just does formatting and does not
check for encoding errors. Avoid multiple levels of temporary
string allocations.
Make sure the copy does not truncate a multibyte character and
produce invalid strings. Add debug checks for UTF-8 encoding errors
instead.
We escape C0 and C1 control codes (because control codes)
and ASCII whitespace (and bell).
Overall the goal is to be more efficient and optimized and help
detect misuse of APIs by passing invalid UTF-8.
Add a unit test for ws_label_strcat.
This avoids having general-purpose decoding happening in
non-DLL-exported functions defined in a dissector for #18478,
and removes unused functions and avoids duplicate decoding.
This also removes unnecessary early exit conditions for #18145.
Unit test cases for varint decoding are added to verify this.
This field display type formats the representation string of
FT_STRING by replacing all space character with ' '.
Instead of "A line end\n" it will output "A line end ".
This allows cleaner code using proto_tree_add_item() and
avoids the problematic pattern
proto_tree_add_string(..., tvb_format_text_wsp(...));
because we only want to affect the way the string value is displayed,
not the actual field value stored.
Some older dissectors that predate Unicode and parse text protocols
are prone to generate invalid UTF-8 strings. This is a bug and can have
safety implications.
For example passing invalid UTF-8 to proto_tree_add_string() is a
common bug. There are safeguards in format_text() but this should
not be relied on as a general solution to the problem.
For one, as the name implies, it is only used with representation of a
field value, which is not the same as the value itself of an FT_STRING field.
Issue #18317 shows another reason why.
For now this compile flag only enables extra checks for string ftypes,
which covers a subset of proto.h APIs including
proto_tree_append_string(). Later is should be extended to other
interfaces.
This is also not expected to be disabled for release builds because
there are still many dissectors that do not correctly handle strings.
More work is needed to 1) identify them and 2) fix them.
Ping #18317
If UTF-8 validation fails, set the fvalue to a sanitized value so that calls
later to retrieve it don't null deference and crash. We could,
especially for a release, disable the assertion and just sanitize
bad strings.
Related to #18363
When a dissector directly adds a string value through
proto_tree_add_string[_format_value], validate that it is
UTF-8 so that only valid UTF-8 strings are used internally,
and written to output (whether text, JSON, or XML.)
(We were treating it as a UTF-8 string anyway, but not
validating it.)
If the string passed in is not UTF-8, that's a dissector bug
Dissectors that use API functions like tvb_get_string_enc
will always produce valid UTF-8, but some do their own
processing.
Fix#18317
The proto_item_XXX_text() routines and proto_tree_add_XXX_format[_value]
functions allow dissectors to alter the representation string for
a protocol tree item with data that may come from arbitrary packet data.
These values are displayed by tshark or wireshark, so they should made
into printable, valid UTF-8.
This means that dissectors no longer need to call format_text before using
those functions (though, if they want to produce some other kind of
printable string, such as with format_text_wsp, they still can.)
Also, mark when appending and prepending text truncates a string that
was not previously truncated (except for a small number of cases where
it is difficult to determine if it was truncated before.)
Part of #18317
It is correct to pass in the memory address immediately past
the end of our buffer, as g_utf8_prev_char() does not deference
it until after decrementing it once, and we want to find the final
UTF-8 character start. Starting one byte earlier truncates the string
more than necessary.
This effectively reverts 4b6224a673
which noted that Coverity flagged this as a memory access error,
although it is not. This is possibly because it was written as
&label_str[ITEM_LABEL_LENGTH]. All versions of the ISO C standard
starting with C99 have indicated (6.5.3.2) than in such a case
"neither the & operator nor the unary * that is implied by the [] is
evaluated and the result is as if the & operator were removed and the
[] operator were changed to a + operator" and (6.5.6) that referring
to the memory address one past the last element of an array object
"shall not produce an overflow" and is not undefined (so long as it
not deferenced.)
However, Coverity may not have been aware of this, so rewrite
the expression using the + operator in the hopes of avoiding
false positive Coverity errors.
Instead of a function call, instantiate the PROTO_REGISTRAR_GET_NTH
macro directly, which contains the subsequent DISSECTOR_ASSERT macro
to test the result anyway.
Remove the redundant BASE_FLOAT field display type. The name
BASE_FLOAT is meaningless and the value aliased to BASE_NONE.
Require BASE_NONE instead of BASE_FLOAT (corresponding to
the printf() %g format).
Add new float display types using BASE_DEC, BASE_HEX and BASE_EXP
corresponfing to %f, %a and %e respectively.
Add support for BASE_CUSTOM with floats.
A "proto_tree_add..._ret_..." routine *must* return the value through
the pointer, even if no protocol tree is being built, as there's no
guarantee that a protocol tree will be built under all circumstances
(for example, if the dissection is only being done to generate the
column values, no column is a custom column, there are no coloring
rules, etc., so that none of the named field values are of interest, and
the protocol tree isn't going to be displayed, no protocol tree will be
built).
Fixes#18203.
It's possible for a dissector to claim a frame without adding to
the tree or being added to frame.protocols (see !6669)
Log a debug message showing the pinfo layers and the dissector that
claimed the tvb (frame/packet).
When writing a custom column, some field types can't have a resolved
value, and just copy the label from the expression to the value.
Only copy information from the most recent field when doing so,
so that with multifield custom columns the entire unresolved value
doesn't get overwritten with the resolved value (if some fields
have resolved values and some don't.) This also reduces copying
from O(N^2) to O(N).
Fixes the display "unresolved" value for multifield custom columns
that are a mix of field types.
proto_strlcpy in normal situations returns the number of bytes
copied (because the return value of g_strlcpy is strlen of the
source buffer). It can copy no more than dest_size - 1, because
dest_size is the size of the buffer, including the null terminator.
(https://docs.gtk.org/glib/func.strlcpy.html)
Returning dest_size can cause offsets to get off by one and reach
the end of the buffer, and can cause subsequent calls to have
buffer overflows. (See #16905 for an example in the comments.)
This removes unparsed name resolution during the semantic
check because it feels like a hack to work around limitations
in the language syntax, that should be solved at the lexical
level instead.
We were interpreting unparsed differently on the LHS and RHS.
Now an unparsed value is always a field if it matches a
registered field name (this matches the implementation in 3.6
and before).
This requires tightening a bit the allowed filter names for
protocols to avoid some common and potentially weird conflicting
cases.
Incidentally this extends set grammar to accept all entities.
That is experimental and may be reverted in the future.
Field infos have a length property that was not stored with the
field value so when using a negative index the end was computed
from the captured length of the frame tvbuff, leading to incorrect
results. The documentation in wireshark-filter(5) describes how
this was supposed to work but as far as I can tell it never worked
properly.
We now store the length and use that (when it is different from -1)
to locate the end of the protocol data in the tvbuff. An extra wrinkle
is that sometimes the length is set after the field value is created.
This is the most common case as the majority of protocols have a
variable length and dissection generally proceeds with a TVB subset from
the current layer (with offset zero) through all remaining layers to the
end of the captured length. For that reason we must use an expedient to allow
changing the protocol length of an existing protocol fvalue, whenever
proto_item_set_len() is called.
Fixes#17772.
Respect BASE_SPECIAL_VALS when adding to the title item of an
item added with the proto_tree_add_bitmask* functions.
Note that the documentation for the BMT_NO_INT flag has always
said that "only boolean flags are added to the title" and that
no integer based items are added, but the actual behavior has been
to add integer items with custom format functions and value strings.
When integer fields are displayed in the bitmask header item in
proto_tree_add_bitmask_tree and hf->strings is set, only the string
from the value_string is used, not the integer value, to save space.
However, that means that BASE_UNIT_STRING fields have to be treated
differently from all the other fields with hf->strings set. If not,
then only the units are displayed instead of the number with the units.
Fields based on 32 bit integers were already being handled correctly.
Use that same logic for fields based on 64 bit integers.
(See commit 24d991dab4 for something similar.)
In proto_item_add_bitmask_tree, on the signed integer path, the
test for if the display uses a unit string is clearly reversed,
calling it only if BASE_UNIT_STRING is unset. Use the correct
test from the unsigned integer path.
Add support for BASE_SPECIAL_VALS to fill_label_bitfield[64], for
fields with a nonzero bitmask, using the same logic as
fill_label_number[64].
There's at least one dissector (packet-ipmi-se.c) that was trying
to use this already, but silently had no effect.
Packet info already contains the notion of layer depth for the
current protocol, among all the protocols in the frame. This
adds an extra layer number for the protocols that are the same
as the current one. Obviously this will only go above one if
the protocol is repeated in the stack, such as with IP tunneling.
Adds extra logic to track numbers for each protocol in the frame
and update them when calling a dissector.
The total layer number and protocol layer number are store in
the field info structure so they can be used after dissection,
namely by display filters.
If we're faking items, then proto_[item|tree]_get_parent[_nth] return
the parent of the faked item, which may not be what we want. We have
no way of knowing if the logical item meant was the faked item itself
or one of its children that share the same proto_item when faked.
Thus we don't know if we should return the proto_item itself or its
parent when called on a possibly faked item. Most of the time we will
be adding new items to what we return here, which means not faking items
that could be faked (since we might be returning the root node, which
doesn't have a field_info), hurting performance (see #8069).
It can also have some unusual effects on the protocol hierarchy stats,
particularly if we change things so that non-visible items can change
their length, which has a similar issue. (#17877)
Needed to format timestamp in #18038 - packet-cql.c
Mirrors changes made in !1924 - Add ENC_TIME_NSECS timestamp encoding
Documentation in README.dissector, proto.c, proto.h - could use
refresh in a different merge request.
This replaces the current macro reference system with
a completely different implementation. Instead of a macro a reference
is a syntax element. A reference is a constant that can be filled
in the dfilter code after compilation from an existing protocol tree.
It is best understood as a field value that can be read from a fixed
tree that is not the frame being filtered. Usually this fixed tree
is the currently selected frame when the filter is applied. This
allows comparing fields in the filtered frame with fields in the
selected frame.
Because the field reference syntax uses the same sigil notation
as a macro we have to use a heuristic to distinguish them:
if the name has a dot it is a field reference, otherwise
it is a macro name.
The reference is synctatically validated at compile time.
There are two main advantages to this implementation (and a couple of
minor ones):
The protocol tree for each selected frame is only walked if we have a
display filter and if the display filter uses references. Also only the
actual reference values are copied, intead of loading the entire tree
into a hash table (in textual form even).
The other advantage is that the reference is tested like a protocol
field against all the values in the selected frame (if there is more
than one).
Currently the reference fields are not "primed" during dissection, so
the entire tree is walked to find a particular reference (this is
similar to the previous implementation).
If the display filter contains a valid reference and the reference is
not loaded at the time the filter is run the result is the same as a
non existing field for a regular READ_TREE instruction.
Fixes#17599.
This adds a _ws.ftypes namespace with protocol fields with all
the existing field types.
Currently this is only useful to debug the display filter compiler,
without having to find a real protocol field with the desired type.
Later it may find other uses.
NTP Era 1 begins on 7 February 2036, 06:28:16 UTC, exactly when
the 64 bit fixed point timestamp rolls over. See RFC 4330/5905 (and
the correct comments later in get_time_value). Fix the comment where
the constant is defined (the value is already correct, however.)
Return NULL when an item with length zero or lower -1 is added to
the tree.
With this the calling dissector doesn't have to check the length and
there is no Dissector bug reported.
Related to #17890
Add comments to indicate what types of display information various field
types are allowed.
Make the error messages for fields that only allow some particular
display information types specific to those types, rather than saying
"no field information allowed". This also gets rid of some
fallthroughs, one of which allows BASE_PROTOCOL_INFO for floating-point
types, which makes no sense.
Add BASE_SHOW_UTF_8_PRINTABLE and related function tvb_utf_8_isprint
for supporting fields of bytes that are "maybe UTF-8" (default or
SHOULD be UTF-8 but could be something else, with no encoding indicator),
such as SSID fields in IEEE 802.11 (See #16208), certain OctetString
fields in Diameter or PFCP, and other places where
BASE_SHOW_ASCII_PRINTABLE is currently used. Fix#5307
Report all registration errors with REPORT_DISSECTOR_BUG().
In the workers for register_all_protocols() and
register_all_protocol_handlers(), use TRY/CATCH/ENDTRY to catch
DissectorError exceptions thrown by REPORT_DISSECTOR_BUG() when
registering dissectors. Return the error message from the main thread
routine and, when joining the worker thread, if there's an error message
returned, throw it in the current thread, so that it gets caught by the
main libwireshark initialization code.
Fixes the crash in #17856.
This makes it easier to understand the code, avoids conflicts
and ugly and unnecessary casts.
The field display enum has evolved over time from integer types
to a type generic parameter.