GitLab CI builds RPMs in a different directory for each pipeline
($CI_PROJECT_DIR/build/packaging/rpm/BUILD/wireshark-<version>), so set
base_dir to the build directory and enable absolute_paths_in_stderr.
Fix our cache directory max sizes as well.
The proto.h APIs expect valid UTF-8 so replace uses of format_text()
with a label copy function that just does formatting and does not
check for encoding errors. Avoid multiple levels of temporary
string allocations.
Make sure the copy does not truncate a multibyte character and
produce invalid strings. Add debug checks for UTF-8 encoding errors
instead.
We escape C0 and C1 control codes (because control codes)
and ASCII whitespace (and bell).
Overall the goal is to be more efficient and optimized and help
detect misuse of APIs by passing invalid UTF-8.
Add a unit test for ws_label_strcat.
Use the setup_frame_number to look for and create conversations
with srtcp_add_address, the same way as done in srtp_add_address.
This ensures that RTP and RTCP find the same conversation when
called back to back (as when handling them multiplexed on the
same conversation.
Related to #18460.
As far as I can tell, get_unicode_or_ascii_string() always
nul-terminates string (as it should), so remove g_strlcpy()
copy that can truncate string and produce invalid UTF-8.
This avoids having general-purpose decoding happening in
non-DLL-exported functions defined in a dissector for #18478,
and removes unused functions and avoids duplicate decoding.
This also removes unnecessary early exit conditions for #18145.
Unit test cases for varint decoding are added to verify this.
If a character is not a valid Unicode codepoint, i.e. one of
the code points reserved for surrogate pairs or a code point
above 0x10FFFF, don't add it to a wmem_strbuf when converting
from other encodings but add a replacement character instead, by
using a new wmem_strbuf_append_unichar_validated() function.
Now we produce valid UTF-8 in various situations where UCS-2 or UTF-32
can encode unpaired surrogate codepoints. Consolidate some related
checks that are now redundant.
Also add a replacement character to the end of invalid UCS-2 strings
with an odd number of bytes, as done with UTF-16 and UTF-32.
Fix#18508
QFontMetrics::leading() was zero for Consolas on Windows in Qt5, but is
nonzero in Qt6. This revealed that we were inconsistently using height()
and leading() to calculate our line height. Just use lineSpacing()
instead.
Fixes#18438.
Rename tvb_get_nstringz0() to tvb_get_raw_bytes_as_stringz()
to reflect the fact that this function does not return
a string (UTF-8 internal text string).
Remove tvb_get_stringz() because it is unused and just seems
dangerous.
In PacketCable MTA capabilities, the length of the capability
is store as hex digits in ASCII. If bogus, the incorrect value
is added as an expert info. Ensure that it's formatted as UTF-8
and for display when added to the tree.
Fix#18437
3GPP came up with a special encoding of TDMA frame number, which reduces
the amount of bits needed to carry it from 32 to 16. This encoding is
not only employed on the radio interface (GSM RR), but also on the
A-bis/RSL interface which is used between BTS and BSC nodes.
From the user perspective, parsed RFN value is a lot more meaningful
than the T1/T2/T3 variables used on the wire. The GSM RR dissector
does show parsed RFN value together with these variables, while the RSL
dissector does not. Let's show it in the RSL dissector too.
In SMB2, the length of the buffer than contained a UTF-16
unicode string is not necessarily the length of the converted
UTF-8 string, and in some cases can even be shorter than the
length of the UTF-8 string, if the string has many 2 octet
UTF-16 characters that are 3 or 4 octets in UTF-8.
Use wmem_strdup and wmem_strdup_printf instead of wmem_alloc
and sprintf, which is a safer pattern anyway as it reduces
the chance of these errors.
Fix#18482
By using strlcat later, we don't need to update pname_ret again,
since we only need the total size of the buffer. Elminates a
clang analyzer warning about writing a value that is never used
related to commit 9891a79137
Add UNICODE_REPLACEMENT_CHARACTER as a #define for the Unicode
REPLACEMENT CHARACTER code point (0x00FFFD), and use that instead of
0xfffd/0xFFFD/0x00FFFD in cases where that value refers to REPLACEMENT
CHARACTER.
Ensure that FTP doesn't add invalid strings to the tree or columns.
Also allow UTF-8 pathnames to work.
According to RFC 2640, FTP supports UTF-8 for pathnames (and it
MUST be supported even if the other side does not advertise support
for UTF-8, unless a different character set has been explicitly
configured, which is out of scope of the RFCs, and we don't have
such a preference.) So in general interpret strings as UTF-8, not
ASCII.
Reduce the use of tvb_get_ptr by using functions directly on the
original tvb and offset. This also happens to be more compliant
with RFC 2640 when getting the token lengths. (RFC 2640 states
that implementations MUST assume that there is only one space between
a command and the pathname, and treat additional spaces as part of
the pathname instead of skipping them. tvb_get_token_len() does not
skip trailing spaces, but get_token_len() does.)
The only place that still uses tvb_get_ptr is when processing a PWD
command, because it has to deal with the double quote escaping as
a custom encoding.
Add a tvb_ascii_isdigit function.
Fix#18439.