Add BASE_SHOW_UTF_8_PRINTABLE and related function tvb_utf_8_isprint
for supporting fields of bytes that are "maybe UTF-8" (default or
SHOULD be UTF-8 but could be something else, with no encoding indicator),
such as SSID fields in IEEE 802.11 (See #16208), certain OctetString
fields in Diameter or PFCP, and other places where
BASE_SHOW_ASCII_PRINTABLE is currently used. Fix #5307
This assert will notify the higher layers that the dissector needs
to be fixed. ieee1722 and zbee-zcl dissectors have been updated to
prevent such a call.
Ref: #17882.
Replace:
g_snprintf() -> snprintf()
g_vsnprintf() -> vsnprintf()
g_strdup_printf() -> ws_strdup_printf()
g_strdup_vprintf() -> ws_strdup_vprintf()
This is more portable, user-friendly and faster on platforms
where GLib does not like the native I/O.
Adjust the format string to use macros from inttypes.h.
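
As an illustration of the replacement above, here is a minimal sketch of plain snprintf() combined with an inttypes.h format macro. The helper name format_counter() is hypothetical and is not part of Wireshark.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 64-bit counter into a caller-supplied
 * buffer. Returns what snprintf() returns, i.e. the number of characters
 * that would have been written, excluding the terminating NUL. */
static int format_counter(char *buf, size_t size, uint64_t value)
{
    assert(buf != NULL && size > 0);
    /* PRIu64 expands to the correct conversion specifier for uint64_t
     * on the current platform, so no GLib format macro is needed. */
    return snprintf(buf, size, "count=%" PRIu64, value);
}
```

Calling format_counter(buf, sizeof buf, UINT64_MAX) with a 32-byte buffer should produce "count=18446744073709551615".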
Move epan_memmem() and epan_strcasestr() to wsutil/str_util.
Rename to ws_memmem() and ws_strcasestr(). Add compile time
check for a system implementation and use that if available.
We invoke those functions using a wrapper to avoid exposing
_GNU_SOURCE outside of the implementation.
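
For readers unfamiliar with memmem(), the following is a rough sketch of what a portable fallback might look like when no system implementation is found. The name sketch_memmem() is an assumption made for illustration; it is not the wsutil code, and the compile-time selection of the system version is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback: find the first occurrence of needle (nlen bytes)
 * inside haystack (hlen bytes). Unlike strstr(), embedded NUL bytes are
 * treated as ordinary data. Returns NULL when there is no match. */
static const void *sketch_memmem(const void *haystack, size_t hlen,
                                 const void *needle, size_t nlen)
{
    const unsigned char *h = haystack;
    const unsigned char *n = needle;

    assert(haystack != NULL && needle != NULL);
    if (nlen == 0)
        return haystack;        /* an empty needle matches at the start */
    if (nlen > hlen)
        return NULL;            /* needle cannot fit */
    for (size_t i = 0; i + nlen <= hlen; i++) {
        /* Cheap first-byte test before the full comparison. */
        if (h[i] == n[0] && memcmp(h + i, n, nlen) == 0)
            return h + i;
    }
    return NULL;
}
```

This is a naive O(hlen * nlen) search; a system memmem() is usually faster, which is one reason to prefer it when available.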
Have tvb_get_string_time use iso8601_to_nstime for
ENC_ISO_8601_DATE_TIME (which seems to be the only time in a string
encoding any built-in dissector actually uses, in syslog). It is
strictly superior; among other things it handles fractional seconds.
Also, tvbuff.c does not use strptime, so remove that include.
All fragment errors are bounds errors that go past the contained length,
but they do not necessarily involve going past the reported length,
so the checks for FragmentBoundsError should reflect that.
With some forms of reassembly, like IP fragmentation, we don't know how
big the PDU/reassembled packet is until reassembly is complete, so we
probably use tvb_new_subset_remaining() to create fragments and the tvb's
reported length is equal to its contained length. In these cases
ReportedBoundsError would otherwise be thrown, except when the existing
checks for FragmentBoundsError intervene.
However, with other forms of reassembly, like various PDUs carried over TCP,
we know the total PDU length, so we use tvb_new_subset_length[_caplen](),
setting the proper reported length, but not changing the contained
length when reassembly is not performed. In those cases, a bounds error
that occurs due to lack of reassembly is otherwise a ContainedBoundsError,
not a ReportedBoundsError.
In both cases, a bounds error caused by an unreassembled fragment should
be a FragmentBoundsError for the existing reasons. It is not necessarily
a malformed packet (to the extent reassembly is not performed because of a
malformed error elsewhere, that should be reported separately) and can
likely be avoided by changing preferences (e.g., turning reassembly
preferences on, turning off checksum verification, etc.) Otherwise it
is probably a dissector bug.
Implement little endian support for tvb_get_bits family of functions.
The big/little endian refers to bit numbering within an octet. In big
endian, the most significant bit is considered bit 0, while in little
endian the least significant bit is considered bit 0.
Add encoding parameters to proto tree bits format family functions.
Specify ENC_BIG_ENDIAN in all dissectors using these functions except in
USB HID that requires ENC_LITTLE_ENDIAN to work correctly.
When formatting bits values, always display most significant bit on the
leftmost position regardless of the encoding. This results in no gaps
between octets and makes the displayed value comprehensible.
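
The two bit-numbering conventions can be sketched as follows. This is a simplified standalone illustration, not the tvb_get_bits implementation; the function name and the bit-at-a-time loop are assumptions chosen for clarity rather than speed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: read nbits starting at bit_offset from buf.
 * Big endian:    bit 0 of each octet is its most significant bit, and
 *                the first bit read becomes the MSB of the result.
 * Little endian: bit 0 of each octet is its least significant bit, and
 *                the first bit read becomes the LSB of the result. */
static uint64_t sketch_get_bits(const uint8_t *buf, size_t buf_len,
                                size_t bit_offset, unsigned nbits,
                                bool little_endian)
{
    uint64_t value = 0;

    assert(buf != NULL && nbits <= 64);
    assert(bit_offset + nbits <= buf_len * 8);
    for (unsigned i = 0; i < nbits; i++) {
        size_t pos = bit_offset + i;
        uint8_t octet = buf[pos / 8];
        unsigned in_octet = (unsigned)(pos % 8);

        if (little_endian) {
            uint64_t bit = (uint64_t)((octet >> in_octet) & 1u);
            value |= bit << i;
        } else {
            uint64_t bit = (uint64_t)((octet >> (7 - in_octet)) & 1u);
            value = (value << 1) | bit;
        }
    }
    return value;
}
```

With the octets 0xA5 0x0F, reading 4 bits at offset 0 gives 0xA in big endian and 0x5 in little endian, since the two conventions start from opposite ends of the first octet.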
Close #4478
Fix #17014
A few of them just needed scratch memory, so allocate and free it
manually after doing any exception-raising checks.
A few others were returning memory, and needed conversion to accept a
wmem scope argument.
It has been added since its length is signed, while the underlying
bytes_to_str uses a size_t, causing an unwanted cast. Basically
passing a len < 0 is pointless.
It is to tvb_reported_length_remaining() as
tvb_ensure_captured_length_remaining() is to
tvb_captured_length_remaining() - it throws an exception if the offset
is out of range.
(Note that an offset that's just past the end of the {reported,
captured} data is *not* out of range, it just means that there is no
data remaining. Anything *past* that is out of range and thus invalid.)
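
The range rule in the note above can be written down as a small sketch. The function below is hypothetical and works on plain lengths rather than a tvbuff; it reports failure with a return value where the real routine would throw an exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: compute how much data remains after offset.
 * Returns false only when offset is strictly past the end. An offset
 * equal to length is valid and simply leaves 0 bytes remaining. */
static bool sketch_length_remaining(size_t length, size_t offset,
                                    size_t *remaining)
{
    assert(remaining != NULL);
    if (offset > length)
        return false;               /* out of range, hence invalid */
    *remaining = length - offset;   /* 0 when offset == length */
    return true;
}
```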
Process the characters entirely ourselves; that way, we don't have to
worry about tvb_get_string_enc(..., ENC_ASCII) mangling label length
values, can convert non-ASCII characters in labels to the Unicode
REPLACEMENT CHARACTER, and can do bounds checks.
Add internal support for using iconv (always present with glib) to convert
strings from various encodings to UTF-8 (using REPLACEMENT CHARACTER as
recommended), and use that to support GB 18030 and EUC-KR. Replace the direct
call to iconv in ANSI 637 for EUC-KR with the new API. Update comments
and documentation around character encodings. It is possible to replace
the calls to iconv with an internal decoder later. Tested on Linux and
on Windows (including with illegal characters). Closes #16630.
Implement the Unicode Standard "best practices" for replacing ill-formed
sequences with the Unicode REPLACEMENT CHARACTER. Add wmem_strbuf_append_len
for appending strings with embedded null characters. Clarify why
wmem_strbuf_grow() doesn't always ensure that there's enough room for
a new string, and short-circuit some tests there. Related to #14948
Add an encoding for "unpacked" 3GPP TS 23.038 7-bit strings, in which
each code position is in a byte of its own, rather than with the code
positions packed into 7 bits. Rename the packed encoding to explicitly
indicate that it's packed.
Add an encoding for ETSI TS 102 221 Annex A strings.
Use the new encodings.
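
To make the packed/unpacked distinction concrete, here is a minimal sketch that turns packed 7-bit septets into the unpacked form, one code position per byte. It assumes the usual least-significant-bit-first septet packing; the function name is hypothetical, and mapping code positions to Unicode characters is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: unpack n_septets 7-bit code positions from in
 * into out (one code position per output byte). Returns the number of
 * septets written, or 0 if the input is too short to hold them. */
static size_t sketch_unpack_septets(const uint8_t *in, size_t in_len,
                                    size_t n_septets, uint8_t *out)
{
    assert(in != NULL && out != NULL);
    if (n_septets * 7 > in_len * 8)
        return 0;
    for (size_t i = 0; i < n_septets; i++) {
        size_t bitpos = i * 7;
        size_t byte = bitpos / 8;
        unsigned shift = (unsigned)(bitpos % 8);
        unsigned v = (unsigned)in[byte] >> shift;

        /* With shift > 1 the septet spills into the next octet; the
         * length check above guarantees that octet exists. */
        if (shift > 1)
            v |= (unsigned)in[byte + 1] << (8 - shift);
        out[i] = (uint8_t)(v & 0x7F);
    }
    return n_septets;
}
```

For example, the five packed octets E8 32 9B FD 06 unpack to the code positions 68 65 6C 6C 6F, which in the default alphabet spell "hello".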
Change-Id: I2fad824ca417dcd089fabfdf06f28529c7ee9e87
Signed-off-by: Filipe Laíns <lains@archlinux.org>
Reviewed-on: https://code.wireshark.org/review/37949
Petri-Dish: Anders Broman <a.broman58@gmail.com>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
Group them by the data types for which they're used, starting with the
byte-order definitions which (with the inclusion of ENC_NA) are used
with all types.
Put all the ones used for strings together, starting with the character
encodings, with the Zigbee flag and the flags for "this is a string but
we're going to interpret it as a byte array or time stamp".
Make ENC_CHARENCODING_MASK equal to ENC_STR_MASK; no, there's no reason
for ENC_STR_MASK to replace ENC_CHARENCODING_MASK - the opposite should
happen, as ENC_CHARENCODING_MASK at least specifies what the bits set in
it are used for, namely character encodings. If all #defines for
strings should have _STR_ in them, start with the character encodings.
Change-Id: I072420f313086153b4ea4034911fc293453dea00
Reviewed-on: https://code.wireshark.org/review/36962
Petri-Dish: Guy Harris <gharris@sonic.net>
Tested-by: Petri Dish Buildbot
Reviewed-by: Guy Harris <gharris@sonic.net>
Add some ENC_ values for various flavors of packed BCD, and use that
instead of explicitly calling tvb_bcd_dig_to_wmem_packet_str() and
adding the result.
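
As background for the "flavors" of packed BCD, the sketch below decodes two digits per octet in either nibble order and stops at a 0xF filler nibble. It is a standalone illustration with hypothetical names; the actual ENC_ values and the digit sets Wireshark supports are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: convert packed BCD to a NUL-terminated digit
 * string. low_nibble_first selects which nibble of each octet holds the
 * earlier digit. Decoding stops at a 0xF filler nibble or when out is
 * full. Nibbles 0xA-0xE are shown as '?'. Returns the digit count. */
static size_t sketch_bcd_to_str(const uint8_t *in, size_t in_len,
                                bool low_nibble_first,
                                char *out, size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && out_size > 0);
    for (size_t i = 0; i < in_len; i++) {
        uint8_t nib[2];

        nib[0] = low_nibble_first ? (in[i] & 0x0F) : (in[i] >> 4);
        nib[1] = low_nibble_first ? (in[i] >> 4) : (in[i] & 0x0F);
        for (int k = 0; k < 2; k++) {
            if (nib[k] == 0x0F || n + 1 >= out_size)
                goto done;
            out[n++] = (nib[k] <= 9) ? (char)('0' + nib[k]) : '?';
        }
    }
done:
    out[n] = '\0';
    return n;
}
```

With low nibble first, the octets 21 43 F5 decode to "12345"; with high nibble first, 12 3F decodes to "123".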
Change-Id: I07511d9d09c9231b610c121cd6ffb3b16fb017a9
Reviewed-on: https://code.wireshark.org/review/36952
Reviewed-by: Guy Harris <gharris@sonic.net>
tvb_find_line_end(), unlike a tvb_find_guint8() looking for an LF,
returns a length that *doesn't* include the line ending, *regardless* of
whether the line ends with CR-LF or just LF, so the query string we
extract is just the query, without any of the line ending.
Update some comments while we're at it to note that the "next_offset"
pointer argument to tvb_find_line_end() and tvb_find_line_end_unquoted()
can be NULL, in which case the offset *past* the line ending isn't
returned. (We pass tvb_find_line_end() NULL in the aforementioned call,
because, in that particular case, we don't care about the next line.)
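
The behavior described above can be sketched on a plain buffer. This is a simplified stand-in with a hypothetical name; it handles only LF and CR-LF endings and does not reproduce the other behaviors of tvb_find_line_end().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: return the length of the line starting at offset,
 * excluding the line ending, whether that ending is LF or CR-LF. If
 * next_offset is non-NULL it receives the offset just past the line
 * ending (or len when no line ending was found). */
static size_t sketch_find_line_end(const char *buf, size_t len,
                                   size_t offset, size_t *next_offset)
{
    size_t i;
    size_t linelen;

    assert(buf != NULL && offset <= len);
    for (i = offset; i < len && buf[i] != '\n'; i++)
        ;
    linelen = i - offset;
    if (i < len) {
        /* Found an LF; drop a preceding CR from the line length. */
        if (linelen > 0 && buf[i - 1] == '\r')
            linelen--;
        if (next_offset != NULL)
            *next_offset = i + 1;
    } else if (next_offset != NULL) {
        *next_offset = len;
    }
    return linelen;
}
```

Passing NULL for next_offset, as in the call mentioned above, simply skips reporting where the next line starts.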
Change-Id: I1c9746e32c61a79f8cb636d577a2e14a07ecab17
Reviewed-on: https://code.wireshark.org/review/35566
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Add "native" support for the "zig-zag" version of a varint in proto.[ch] and
tvbuff.[ch]. Convert the use of varint in the KAFKA dissector to use the (new)
"native" API.
Ping-Bug: 15988
Change-Id: Ia83569203877df8c780f4f182916ed6327d0ec6c
Reviewed-on: https://code.wireshark.org/review/34386
Petri-Dish: Alexis La Goutte <alexis.lagoutte@gmail.com>
Tested-by: Petri Dish Buildbot
Reviewed-by: Alexis La Goutte <alexis.lagoutte@gmail.com>
Reviewed-by: Anders Broman <a.broman58@gmail.com>
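
For the zig-zag varint support added in the change above, the decoding can be sketched in two steps: read a base-128 varint, then undo the zig-zag mapping. The names below are hypothetical and the code is independent of the proto.[ch] and tvbuff.[ch] API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: decode a base-128 varint (7 data bits per octet,
 * least significant group first, high bit set on all but the last
 * octet). Returns the number of octets consumed, or 0 if the input is
 * truncated or longer than 10 octets. */
static size_t sketch_get_varint(const uint8_t *buf, size_t len,
                                uint64_t *value)
{
    uint64_t v = 0;

    assert(buf != NULL && value != NULL);
    for (size_t i = 0; i < len && i < 10; i++) {
        v |= (uint64_t)(buf[i] & 0x7F) << (7 * i);
        if ((buf[i] & 0x80) == 0) {
            *value = v;
            return i + 1;
        }
    }
    return 0;
}

/* Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...; this reverses it. */
static int64_t sketch_zigzag_decode(uint64_t n)
{
    return (int64_t)(n >> 1) ^ -(int64_t)(n & 1);
}
```

For example, the octets AC 02 decode to the unsigned value 300, which zig-zag decodes to 150, while the unsigned value 3 decodes to -2.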
Change all wireshark.org URLs to use https.
Fix some broken links while we're at it.
Change-Id: I161bf8eeca43b8027605acea666032da86f5ea1c
Reviewed-on: https://code.wireshark.org/review/34089
Reviewed-by: Guy Harris <guy@alum.mit.edu>
That's what the remaining calls to tvb_get_nstringz() and
tvb_get_nstringz0() are being used to do, even though those routines
were not intended for that purpose - the calls are extracting from a
text protocol, meaning that the strings are *not* null-terminated in the
packet.
Strings - even null-terminated ones - should, in almost all cases, be
extracted by tvb_get_string_enc() or routines that call it, so that an
encoding is specified. In the few cases where we're fetching strings
only to be compared to ASCII constants, or to parse as numbers, we can
get away with this.
Change-Id: I29f0532902c4ade2207de7f06db69c32eafd4132
Reviewed-on: https://code.wireshark.org/review/34072
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot
Reviewed-by: Guy Harris <guy@alum.mit.edu>
The "Basic code table" in ISO 646 is mostly ASCII, but some code points
either 1) have more than one glyph that can be assigned to them or 2)
have no glyph assigned to them. National versions choose one of the two
glyphs for the code points in group 1) and assign specific glyphs to the
code points in group 2); the International Reference Version assigns the
same glyphs to those code points as does ASCII.
For the "Basic code table" encoding, we map the code points in groups 1)
and 2) to a REPLACEMENT CHARACTER; additional encodings can be added for
the national versions.
Add ENC_ISO_646_IRV (International Reference Version) as an alias for
ENC_ASCII.
Expand some comments, and add some comments, while we're at it.
Change-Id: I4f1b5e426ec193775e919731c5cae1224dc65115
Reviewed-on: https://code.wireshark.org/review/33941
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Clean up some comments while we're at it.
Change-Id: I0cd014bf1d1e7dc740eac1721d5466377938655f
Reviewed-on: https://code.wireshark.org/review/33939
Reviewed-by: Guy Harris <guy@alum.mit.edu>
While we're at it, add the Euro to code page 1251, expand the comments
for 1250 and 1251 and some DOS code pages, and add support for code page
1251 to tvb_get_stringz_enc().
Change-Id: I053d58f87cac26ad7c109e2f1cd8807ffec0622d
Reviewed-on: https://code.wireshark.org/review/33342
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Avoid relying on strptime to parse the day of week (%a) and month name
(%b) since these are locale-dependent. Fixes test suite failures with
tvb.lua and LC_ALL=nl_NL.UTF-8.
Additionally, it will now reject four-digit years when using ENC_RFC_822
as that requires two-digit years. The only user of this API seems to be
the Lua tests though, so this should not make much of a difference.
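
A locale-independent month lookup of the kind this change relies on could look roughly like the sketch below. The function name is hypothetical, and the match is case-sensitive for simplicity.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: map a three-letter English month abbreviation to
 * 0..11 without consulting the current locale. Returns -1 if the first
 * three characters of s are not a known abbreviation. */
static int sketch_month_from_abbrev(const char *s)
{
    static const char *const months[12] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    assert(s != NULL);
    for (int i = 0; i < 12; i++) {
        if (strncmp(s, months[i], 3) == 0)
            return i;
    }
    return -1;
}
```

Because the table is fixed, the result is the same under LC_ALL=nl_NL.UTF-8 as under the C locale, which is the property strptime's %b does not provide.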
Bug: 15437
Change-Id: I75436b93faab23869794d9756b9c3ce6128dd1f4
Reviewed-on: https://code.wireshark.org/review/31698
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Peter Wu <peter@lekensteyn.nl>
This is intended to be a replacement for get_token_len (from strutil.h) when it's used on a tvb. It should be a little safer and remove the need for a dissector to use tvb_get_ptr.
Change-Id: Ib2d4a79718b6fba4eb9acc0129b13be6c8199a43
Reviewed-on: https://code.wireshark.org/review/30892
Petri-Dish: Michael Mann <mmann78@netscape.net>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
This allows dissectors to check if a portion of the tvb is an ASCII string while hiding the use of tvb_get_ptr.
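
The check itself is simple; a minimal sketch on a plain byte buffer follows. The function name is hypothetical, and "printable" is taken here to mean the range 0x20 through 0x7E, which is an assumption about the intended definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: return true if every byte in buf[0..len) is a
 * printable ASCII character (0x20 to 0x7E inclusive). An empty range
 * counts as printable. */
static bool sketch_ascii_isprint(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] < 0x20 || buf[i] > 0x7E)
            return false;
    }
    return true;
}
```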
Change-Id: Iaec7559dcfdefb8a5ae23e099ced45e90e611f8f
Reviewed-on: https://code.wireshark.org/review/30291
Petri-Dish: Anders Broman <a.broman58@gmail.com>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
proto_tree_add_item with a zero length argument could end up calling
tvb_get_ptr to retrieve the (empty) backing buffer. This empty tvb was
possibly the result of bad reassembly, but let's gracefully handle it to
avoid a dissector exception.
Call trace for the original exception (only present on the first pass):
proto_report_dissector_bug (format=0x7ffffffecea0 "") at epan/proto.c:1368
ensure_contiguous_no_exception (tvb=0x6060001a5460, offset=0, length=0, pexception=0x7ffffffed060) at epan/tvbuff.c:775
ensure_contiguous (tvb=0x6060001a5460, offset=0, length=0) at epan/tvbuff.c:785
tvb_get_ptr (tvb=0x6060001a5460, offset=0, length=0) at epan/tvbuff.c:906
subset_get_ptr (tvb=0x607000194b90, abs_offset=0, abs_length=0) at epan/tvbuff_subset.c:58
ensure_contiguous_no_exception (tvb=0x607000194b90, offset=0, length=0, pexception=0x7ffffffed3c0) at epan/tvbuff.c:773
ensure_contiguous (tvb=0x607000194b90, offset=0, length=0) at epan/tvbuff.c:785
tvb_get_ptr (tvb=0x607000194b90, offset=0, length=0) at epan/tvbuff.c:906
proto_tree_set_bytes_tvb (fi=0x608000535ca0, tvb=0x607000194b90, offset=0, length=0) at epan/proto.c:3862
proto_tree_new_item (new_fi=0x608000535ca0, tree=0x604000543150, tvb=0x607000194b90, start=0, length=0, encoding=0) at epan/proto.c:2318
proto_tree_add_item_new (tree=0x604000543150, hfinfo=0x7ffff30e91f8, tvb=0x607000194b90, start=0, length=0, encoding=0) at epan/proto.c:3381
proto_tree_add_item (tree=0x604000543150, hfindex=65120, tvb=0x607000194b90, start=0, length=0, encoding=0) at epan/proto.c:3391
dissect_body_data (tree=0x604000543150, pinfo=0x614000000a58, tvb=0x607000194b90, start=0, length=0, encoding=0) at epan/dissectors/packet-http2.c:1974
Change-Id: Icfae83d61ddcc9e26f16eab7f6e0e84e2f0d73ac
Reviewed-on: https://code.wireshark.org/review/29851
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
tvb_generic_clone_offset_len uses tvb_bytes_exist to check that the
requested tvb data is actually available. It did not expect negative
values, which would result in an overly large memory allocation.
Bug: 14678
Change-Id: Ie80095a381e55ca5dbbd5c9d835243549d0b212e
Link: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=7179
Reviewed-on: https://code.wireshark.org/review/27526
Petri-Dish: Peter Wu <peter@lekensteyn.nl>
Tested-by: Petri Dish Buildbot
Reviewed-by: Peter Wu <peter@lekensteyn.nl>
Reviewed-by: Anders Broman <a.broman58@gmail.com>
../epan/tvbuff.c: In function 'tvb_new_octet_aligned':
../epan/tvbuff.c:274:26: error: 'abs_offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
*rem_len = tvb->length - *offset_ptr;
^
../epan/tvbuff.c:486:8: note: 'abs_offset' was declared here
guint abs_offset, rem_length;
^
../epan/tvbuff.c: In function 'tvb_find_line_end':
../epan/tvbuff.c:274:26: error: 'abs_offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
*rem_len = tvb->length - *offset_ptr;
^
../epan/tvbuff.c:486:8: note: 'abs_offset' was declared here
guint abs_offset, rem_length;
^
../epan/tvbuff.c: In function 'tvb_find_line_end_unquoted':
../epan/tvbuff.c:274:26: error: 'abs_offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
*rem_len = tvb->length - *offset_ptr;
^
../epan/tvbuff.c:486:8: note: 'abs_offset' was declared here
guint abs_offset, rem_length;
Change-Id: Iba9fe31ac5fcf604d65bbf3bceef0c09004c1b6c
Reviewed-on: https://code.wireshark.org/review/27050
Reviewed-by: Anders Broman <a.broman58@gmail.com>
Add a "contained length" to tvbuffs. For non-subset tvbuffs, that's the
same as the reported length. For a subset tvbuff, that's the amount of
the reported data that was actually present in the "contained data" of
the parent tvbuff.
This is unaffected by the *captured* length of any tvbuff; that differs
from the contained length only if the capture was cut short by a
snapshot length.
If a reference is within the reported data, but not within the contained
data, a ContainedBoundsError exception is thrown. This exception
represents a protocol error, rather than a reference past the captured
data in the packet; we treat it as such.
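
How the three lengths relate can be sketched as a simple classification. The enum and function names below are hypothetical, the fragment case is ignored, and return codes stand in for the exceptions that the real code throws.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes standing in for the thrown exceptions. */
typedef enum {
    SK_OK,
    SK_BOUNDS_ERROR,            /* past captured data (snapshot length) */
    SK_CONTAINED_BOUNDS_ERROR,  /* past data contained in the parent */
    SK_REPORTED_BOUNDS_ERROR    /* past the reported length */
} sk_result;

/* Hypothetical sketch: classify a reference that ends at end_offset,
 * assuming captured <= contained <= reported. */
static sk_result sketch_check_end(size_t captured, size_t contained,
                                  size_t reported, size_t end_offset)
{
    assert(captured <= contained && contained <= reported);
    if (end_offset <= captured)
        return SK_OK;
    if (end_offset <= contained)
        return SK_BOUNDS_ERROR;
    if (end_offset <= reported)
        return SK_CONTAINED_BOUNDS_ERROR;
    return SK_REPORTED_BOUNDS_ERROR;
}
```

With captured = 60, contained = 80 and reported = 100, a reference ending at 70 was merely cut off by the snapshot length, one ending at 90 is a protocol error within the reported data, and one ending at 110 runs past the reported length.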
Change-Id: Ide87f81238eaeb89b3093f54a87bf7f715485af5
Reviewed-on: https://code.wireshark.org/review/27039
Reviewed-by: Guy Harris <guy@alum.mit.edu>
Add 8-bit, 16-bit, 24-bit, and 32-bit "fetch signed value" routines, and
use them rather than casting the result of the 8/16/24/32-bit "fetch
unsigned value" routines to a signed type (which, BTW, isn't sufficient
for 24-bit values, so this appears to fix a bug
in epan/dissectors/packet-zbee-zcl.c).
Use numbers rather than sizeof()s in various tvb_get_ routines.
Change-Id: I0e48a57fac9f70fe42de815c3fa915f1592548bd
Reviewed-on: https://code.wireshark.org/review/26844
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>