138 Commits

Author SHA1 Message Date
John Thacker 4c3ebe73d3 epan: ensure that the result of ws_label_strcpy is terminated
Unless there is no available space, ensure that the label_str
passed into ws_label_strcpy is null terminated, in the cases
where the string to copy is the empty string, or begins with
invalid UTF-8.

Fix #18560. Fix #18551.
2022-10-27 18:37:17 -04:00
João Valverde c1cede8d7c epan: Format column string input for display.
Format the input for display, by escaping some non printable characters,
using ws_label_strcpy().

In some cases with vsnprintf() this requires using a temporary buffer.

Add some debug checks for invalid UTF-8 errors.

The intention here is to pass dissection data directly to the column
API, and the column functions are responsible for formatting that
data for display. This avoids having to call format_text() before
adding a string to a column and separates the concerns better.
Display formatting is a UI concern.
2022-10-26 13:28:19 +01:00
João Valverde 92e1357bb4 Rename ws_label_strcat() to ws_label_strcpy()
The semantics of ws_label_strcat() are closer to g_strlcpy() so
rename the function to reflect that.
2022-10-26 13:12:35 +01:00
João Valverde f55cb116a0 Remove memset() from ws_label_str()
In the interests of efficiency with multiple small writes avoid
doing a memset on the whole remaining length.
2022-10-26 13:12:31 +01:00
John Thacker d7c993d4af epan: Fix the end offsets for hex string items
hex_str_to_bytes_encoding() consumes pairs of hex digits (and
optional separator) to turn into bytes. It can return a pointer
to the character after the last digit consumed. Don't advance
the end pointer after a single unpaired digit that is not consumed
as part of the hex string returned.

tvb_get_string_bytes() can pass back the end offset. If conversion
fails, return the initial offset instead of zero to make repeated
calls easier in cases where the full length is not decoded due to
errors.

Relatedly, no dissector currently uses this return value, because
it's not useful currently.
2022-10-21 01:11:53 +00:00
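The end-pointer rule described above can be sketched as follows. This is a minimal illustration, not the code of hex_str_to_bytes_encoding(): the names `nibble` and `hex_pairs_to_bytes` are hypothetical, and the sketch supports only a single optional separator character.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit. */
static int nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical sketch: convert pairs of hex digits, optionally separated by
 * `sep` (pass '\0' for no separator), into bytes.
 * Returns the number of bytes written and sets *endptr to the character
 * after the last digit consumed. A trailing unpaired digit is not consumed,
 * so *endptr is not advanced past it.
 */
static size_t hex_pairs_to_bytes(const char *s, char sep,
                                 uint8_t *out, size_t out_len,
                                 const char **endptr)
{
    const char *p = s;
    const char *end = s;   /* stays at s if nothing is converted */
    size_t n = 0;

    while (n < out_len) {
        int hi = nibble(p[0]);
        if (hi < 0)
            break;
        int lo = nibble(p[1]);  /* p[1] is at worst the terminating NUL */
        if (lo < 0)
            break;              /* single unpaired digit: leave it alone */
        out[n++] = (uint8_t)((hi << 4) | lo);
        p += 2;
        end = p;                /* only advance after a complete pair */
        if (sep != '\0' && *p == sep)
            p++;
    }
    *endptr = end;
    return n;
}
```

Returning the start of the input on failure, rather than a null or zero position, lets a caller resume scanning from `*endptr` without special-casing the error path.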
João Valverde 603354203b epan/proto: Replace format_text()
The proto.h APIs expect valid UTF-8 so replace uses of format_text()
with a label copy function that just does formatting and does not
check for encoding errors. Avoid multiple levels of temporary
string allocations.

Make sure the copy does not truncate a multibyte character and
produce invalid strings. Add debug checks for UTF-8 encoding errors
instead.

We escape C0 and C1 control codes (because control codes)
and ASCII whitespace (and bell).

Overall the goal is to be more efficient and optimized and help
detect misuse of APIs by passing invalid UTF-8.

Add a unit test for ws_label_strcat.
2022-10-20 20:05:15 +01:00
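The requirement that a bounded copy must not cut a multibyte character in half can be sketched as below. This is an assumption-laden illustration, not the label copy function from the commit: `copy_utf8_bounded` is a hypothetical name, it does no escaping, and it assumes the input is already valid UTF-8 (as the proto.h APIs expect).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: copy src into dst (capacity dst_size bytes, including
 * the terminating NUL) without splitting a UTF-8 sequence.
 * Returns the number of bytes copied, not counting the NUL.
 */
static size_t copy_utf8_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t n;

    if (dst_size == 0)
        return 0;               /* no space at all, nothing can be written */

    n = strlen(src);
    if (n > dst_size - 1) {
        n = dst_size - 1;
        /*
         * If the first byte left out is a continuation byte (10xxxxxx),
         * the cut falls inside a character: back up to its lead byte so
         * that the whole character is dropped.
         */
        while (n > 0 && ((unsigned char)src[n] & 0xC0) == 0x80)
            n--;
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With a 3-byte buffer and the input "aéb", only "a" is copied, because the two bytes of "é" do not both fit alongside the terminator.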
João Valverde 15634c0b46 Move format_text() to libwsutil and add unit tests 2022-09-28 21:44:27 +00:00
João Valverde 9345bcdae5 epan: Change signature of format_text()
Replace "const guchar *" with "const char *".
2022-09-28 19:28:28 +01:00
João Valverde 32befe119d Add a log domain for encoding errors and lower the log level
Using a warning is probably too exalted for the current state
of the code, where UTF-8 errors are somewhat expected from
dissectors that are lax about input validation.

Use a debug level with its own "UTF-8" domain instead.

Using a dedicated domain allows filtering on encoding errors and,
with some enhancements to the logging subsystem, making them fatal
for tracking and debugging purposes.

Using a dedicated domain might have other drawbacks but for now
it seems like the best approach.
2022-09-28 14:57:51 +01:00
João Valverde 621257f472 epan: Add a warning for invalid UTF-8 with format_text() 2022-09-27 17:04:44 +00:00
John Thacker 73d8bb1bc3 XML: Do escape ASCII control characters
XML 1.0 allows valid UTF-8 characters, except for the ASCII control
characters other than tab, carriage return, and line feed.
(It does not allow form feed and vertical tab, so the allowed group is
not the same as the standard ctype.h isspace category. It also
allows but discourages DEL (\x7F).)

The characters cannot be included as character references of the
form &#xx; either; there is technically no way to include them.
Escape them as done prior to 89e96c1e77
but continue to leave bytes with the high bit set alone so that
UTF-8 printable characters are not escaped.

Fix #10445
2022-09-21 23:46:35 +00:00
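The rule in the message above (escape ASCII control characters except tab, line feed and carriage return; leave DEL and high-bit bytes alone) can be written as a small predicate. This is only a sketch: `xml_must_escape` is a hypothetical name, and it covers the control-character rule alone, not markup characters such as `&` or `<`, which need escaping for other reasons.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: true if byte c is an ASCII control character that
 * XML 1.0 does not allow, following the rule described in the commit message.
 */
static bool xml_must_escape(unsigned char c)
{
    /* Tab, line feed and carriage return are the only C0 controls allowed. */
    if (c == '\t' || c == '\n' || c == '\r')
        return false;
    /*
     * Every other byte below 0x20 (including form feed and vertical tab)
     * must be escaped. DEL (0x7F) is allowed but discouraged, and bytes with
     * the high bit set are left alone so UTF-8 sequences pass through.
     */
    return c < 0x20;
}
```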
John Thacker 95b45b2555 Qt: Add percent-encoding to Show Packet Bytes
Add Percent-encoding to the list of encoding types that Show
Packet Bytes can handle.

There's a function added to glib 2.66 to handle this for arbitrary
bytes that might have internal nulls (and which allows the result
to be non UTF-8), but we don't require that version yet, so extend
the existing function.

Related to #1084
2022-09-03 17:25:28 +00:00
Moshe Kaplan 69d54d6f8e Corrects repeated words throughout the code.
Repeated words were found with:
egrep "(\b[a-zA-Z]+) +\1\b" . -Ir
and then manually reviewed.
Non-displayed strings (e.g., in comments)
were also corrected, to ease future review.
2021-12-22 11:01:11 +00:00
João Valverde 76186f16fb epan: Rewrite format_text_chr() using standard APIs 2021-12-03 10:18:37 +00:00
João Valverde a9c36dfb75 epan: Remove unused format_uri() function
Used with the GTK GUI, not used for a long time.
2021-11-30 22:07:09 +00:00
João Valverde 1e0cc18ae8 epan: Remove duplication in format_text_wsp()
This function and format_text() are very similar so use a common
implementation for both.
2021-11-30 21:34:57 +00:00
João Valverde c18e44f563 epan: Fix UTF-8 bitmask for 2-byte codepoint
Fixes format_text_wsp(), use the correct bitmask from format_text().
2021-11-30 21:34:57 +00:00
João Valverde 13783fae6b Add comment with rationale for having format_text_chr().
From fa1027a004.
2021-11-30 19:04:37 +00:00
João Valverde 37f2a86207 Move string_or_null() to wsutil 2021-11-29 18:37:03 +00:00
João Valverde dcbd79584d epan/str_util: Remove unused functions
Remove ws_strdup_escape_char(). I don't think it is generic enough to keep,
and it does not seem very efficient either.

Remove string_replace(). This function was used in the GTK GUI.
2021-11-29 18:37:03 +00:00
João Valverde 44121e2c3b Move escape_string() to wsutil
Move this utility function to wsutil. Rename to
ws_escape_string().

Also add tests.
2021-11-29 17:47:53 +00:00
João Valverde ef8125e3ae Move two functions from epan to wsutil/str_util
Move epan_memmem() and epan_strcasestr() to wsutil/str_util.
Rename to ws_memmem() and ws_strcasestr(). Add compile time
check for a system implementation and use that if available.

We invoke those functions using a wrapper to avoid exposing
_GNU_SOURCE outside of the implementation.
2021-11-28 12:32:51 +00:00
João Valverde 1fc621e38d epan: Fix crash with upper-case protocol filter names
Registering a preference module for a protocol filter name with
upper case letters aborts the program. Relax this restriction to
conform with the rules for protocols. The recommendation is still
to use all lower-case letters.

Fixes 070aeddf76.
2021-11-04 16:29:34 +00:00
João Valverde 925e01b23f Remove duplicate format_size() function
We have two format_size()s, with and without wmem scoped memory.
Move the wmem version to wsutil and add a convenience macro to
use g_malloc()ed memory.
2021-07-26 14:56:11 +00:00
John Thacker 770746cca8 epan: Fix format_text treatment of Greek, Arabic, etc.
format_text uses the wrong bitmask when checking for two-byte UTF-8
characters, resulting in rejecting half the possible two-byte characters,
including all of Arabic and Greek, and substituting REPLACEMENT CHARACTER
for them. Fixes #17070, and adds some comments about the current behavior
that doesn't match existing comments.
2020-12-09 12:51:19 +00:00
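To illustrate why a wrong mask rejects half of the two-byte characters: a two-byte UTF-8 lead byte has the bit pattern 110xxxxx, so only the top three bits should be tested. The sketch below contrasts that test with a mask that tests one bit too many. The function names are hypothetical, and the over-wide mask value 0xE8 is an assumed example for illustration; the commit message does not state which mask the old code used.

```c
#include <stdbool.h>

/*
 * Lead byte of a two-byte UTF-8 sequence: 110xxxxx.
 * (This does not reject the overlong lead bytes 0xC0 and 0xC1.)
 */
static bool is_utf8_2byte_lead(unsigned char c)
{
    return (c & 0xE0) == 0xC0;
}

/*
 * Illustration only (assumed mask): also testing bit 3 requires 110x0xxx,
 * which rejects every lead byte with that bit set, i.e. half of them.
 */
static bool is_utf8_2byte_lead_overmasked(unsigned char c)
{
    return (c & 0xE8) == 0xC0;
}
```

For example, 0xCE (the lead byte of Greek alpha, U+03B1) and 0xD8 (the lead byte of Arabic alef, U+0627) pass the first test but fail the second, while 0xC3 (the lead byte of "é") passes both.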
Guy Harris 35418a73f7 Add format_text_string(), which gets the length with strlen().
format_text(alloc, string, strlen(string)) is a common idiom; provide
format_text_string(), which does the strlen(string) for you.  (Any
string used in a %s to set the text of a protocol tree item, if it was
directly extracted from the packet, should be run through a format_text
routine, to ensure that it's valid UTF-8 and that control characters are
handled correctly.)

Update comments while we're at it.

Change-Id: Ia8549efa1c96510ffce97178ed4ff7be4b02eb6e
Reviewed-on: https://code.wireshark.org/review/38202
Petri-Dish: Guy Harris <gharris@sonic.net>
Tested-by: Petri Dish Buildbot
Reviewed-by: Guy Harris <gharris@sonic.net>
2020-08-20 07:24:32 +00:00
Michael Mann f509a83381 Add format_size_wmem
It's a "wmem version" of format_size (from wsutil/str_util.h).

Also improved the flexibility in formatting of format_size() to handle future
needs of format_size_wmem

Ping-Bug: 15360
Change-Id: Id9977bbd7ec29375bbac955f685d46e75b0cef2c
Reviewed-on: https://code.wireshark.org/review/31233
Petri-Dish: Michael Mann <mmann78@netscape.net>
Tested-by: Petri Dish Buildbot
Reviewed-by: Peter Wu <peter@lekensteyn.nl>
Reviewed-by: Anders Broman <a.broman58@gmail.com>
2019-12-02 05:01:16 +00:00
Guy Harris 20800366dd HTTPS (almost) everywhere.
Change all wireshark.org URLs to use https.

Fix some broken links while we're at it.

Change-Id: I161bf8eeca43b8027605acea666032da86f5ea1c
Reviewed-on: https://code.wireshark.org/review/34089
Reviewed-by: Guy Harris <guy@alum.mit.edu>
2019-07-26 18:44:40 +00:00
Guy Harris edd5eaa57e Don't format printable non-ASCII Unicode characters as escape sequences.
Note that even strings fetched with ENC_ASCII may contain them - bytes
with the 8th bit set get mapped to REPLACEMENT CHARACTER.

This means we can format STR_UNICODE fields with format_text(); do so.

Bug: 1372
Change-Id: Ia32c3a92d220ac5174ecd25f33e2d1f85cfb8cb8
Reviewed-on: https://code.wireshark.org/review/34080
Reviewed-by: Guy Harris <guy@alum.mit.edu>
2019-07-25 14:50:40 +00:00
Guy Harris a409987eea Fix format_uri().
It was using the same index into the input and output strings, which
means that if it escaped any character, it would skip the next two
characters in the input string.

It was also not clearing is_reserved before testing whether a character
was reserved, so once it saw a character that needed to be escaped, it
would escape all subsequent characters.

It was only used in get_key_string(), which was never used, so it was
dead code, but let's at least fix it, even if we end up removing that
code, so that if we bring it back, we bring back a non-broken version,
and so that if anybody *else* uses it, it's not broken.

Change-Id: I36588efad36908e012023bcfbd813c749a6a254f
Reviewed-on: https://code.wireshark.org/review/33287
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot
Reviewed-by: Guy Harris <guy@alum.mit.edu>
2019-05-21 08:30:12 +00:00
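Both bugs described above are avoided by keeping separate input and output indices and by recomputing the "reserved" flag for every character. The following is a minimal sketch of that pattern, not the format_uri() code: `percent_escape`, its parameters, and the choice to escape everything outside 0x21..0x7E are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: percent-encode `in` into `out` (capacity out_size,
 * including the NUL). Bytes outside 0x21..0x7E and bytes listed in
 * `reserved` are written as %XX. Returns false if `out` is too small.
 */
static bool percent_escape(const char *in, const char *reserved,
                           char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t i;       /* input index */
    size_t o = 0;   /* output index, advances by 1 or 3 per input byte */

    for (i = 0; in[i] != '\0'; i++) {
        unsigned char c = (unsigned char)in[i];
        /* Recomputed for every character, so one escaped character does
         * not cause all later characters to be escaped. */
        bool is_reserved = (c < 0x21 || c > 0x7E ||
                            strchr(reserved, c) != NULL);
        size_t need = is_reserved ? 3 : 1;

        if (o + need >= out_size)
            return false;   /* keep room for the terminating NUL */
        if (is_reserved) {
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        } else {
            out[o++] = (char)c;
        }
    }
    if (o >= out_size)
        return false;       /* only reachable when out_size is 0 */
    out[o] = '\0';
    return true;
}
```

Called with the input "a b?c" and the reserved set "?%", this produces "a%20b%3Fc": the characters after each escape are neither skipped nor escaped.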
Dario Lombardo 9aa63d2406 epan: remove return from functions returning void.
Found by clang-tidy.

Change-Id: Ibedfec5e5d3eca7c2e65319b7ecb4dcbe974b88b
Reviewed-on: https://code.wireshark.org/review/31337
Petri-Dish: Dario Lombardo <lomato@gmail.com>
Petri-Dish: Guy Harris <guy@alum.mit.edu>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
2019-01-04 05:07:58 +00:00
Dario Lombardo 55c68ee69c epan: use SPDX identifiers.
Skipping dissectors dir for now.

Change-Id: I717b66bfbc7cc81b83f8c2cbc011fcad643796aa
Reviewed-on: https://code.wireshark.org/review/25694
Petri-Dish: Dario Lombardo <lomato@gmail.com>
Tested-by: Petri Dish Buildbot
Reviewed-by: Anders Broman <a.broman58@gmail.com>
2018-02-08 19:29:45 +00:00
Michael Mann 148fb1acf4 Add wmem allocator parameter to format_uri
Change-Id: Ic6de84a37b501e9c62a7d37071b2b081a1a1dd50
Reviewed-on: https://code.wireshark.org/review/19885
Petri-Dish: Michael Mann <mmann78@netscape.net>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Michael Mann <mmann78@netscape.net>
2017-01-31 17:08:54 +00:00
Michael Mann 51a3014225 format_text_wmem -> format_text
All cases of the "original" format_text have been handled to add the
proper wmem allocator scope.  Remove the "original" format_text
and replace it with one that has a wmem allocator as a parameter.

Change-Id: I278b93bcb4a17ff396413b75cd332f5fc2666719
Reviewed-on: https://code.wireshark.org/review/19884
Petri-Dish: Michael Mann <mmann78@netscape.net>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Michael Mann <mmann78@netscape.net>
2017-01-31 17:08:47 +00:00
Michael Mann d802b5b0ec Add format_text_wmem.
This allows for a wmem_allocator for users of format_text who want
it (dissectors for wmem_packet_scope()).  This lessens the role of
current format_text functionality in hopes that it will eventually
be replaced.

Change-Id: I970557a65e32aa79634a3fcc654ab641b871178e
Reviewed-on: https://code.wireshark.org/review/19855
Reviewed-by: Michael Mann <mmann78@netscape.net>
2017-01-31 02:26:35 +00:00
Michael Mann f789c91a5e Have format_text_wsp use wmem allocated memory.
format_text_wsp is fed into by tvb_format_text_wsp and tvb_format_stringzpad_wsp
so those functions need to add a wmem allocated parameter as well.
Most of the changes came from tvb_format_text_wsp and tvb_format_stringzpad_wsp
being changed more so than format_text_wsp.

Change-Id: I52214ca107016f0e96371a9a8430aa89336f91d7
Reviewed-on: https://code.wireshark.org/review/19851
Petri-Dish: Michael Mann <mmann78@netscape.net>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Michael Mann <mmann78@netscape.net>
2017-01-30 02:25:45 +00:00
Michael Mann c44c8f9e6c Have format_text_chr use wmem allocated memory.
Change-Id: Idcea59f6fc84238f04d9ffc11a0088ef97beec0c
Reviewed-on: https://code.wireshark.org/review/19844
Petri-Dish: Michael Mann <mmann78@netscape.net>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: Michael Mann <mmann78@netscape.net>
2017-01-30 00:05:39 +00:00
João Valverde 4c330cc0e4 Fix constness
Change-Id: I29723dae83373768edd254c60e48a717abf20694
Reviewed-on: https://code.wireshark.org/review/13436
Petri-Dish: João Valverde <j@v6e.pt>
Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org>
Reviewed-by: João Valverde <j@v6e.pt>
2016-01-20 16:12:07 +00:00
Guy Harris 44e7ce54ff Remove some apparently-unnecessary includes of emem.h.
Change-Id: Ib7d1b587b439ff21ec6b7f1756ce6ccf25b66f80
Reviewed-on: https://code.wireshark.org/review/6635
Reviewed-by: Guy Harris <guy@alum.mit.edu>
2015-01-18 20:19:05 +00:00
Guy Harris 033f096ee9 Don't use ctype.h routines.
That avoids locale dependency and handles possibly-signed chars (which
we weren't always doing before).

Change-Id: Ieceb93029252f646397b6488f2df8a57c6d2a23d
Reviewed-on: https://code.wireshark.org/review/4794
Reviewed-by: Guy Harris <guy@alum.mit.edu>
2014-10-17 23:11:18 +00:00
Guy Harris 71a42e0fbc Oops, I missed one "cast a char to int and use it as a subscript" case.
Casting a signed char with a negative value to int will preserve the
value, so it'll still be a negative subscript.  Cast to guchar instead,
to make sure 0x80 through 0xFF are treated as 128 to 255, not -128 to
-1.

Change-Id: I1f0b33ba3686e963d45317b45465ff335431d17f
Reviewed-on: https://code.wireshark.org/review/4742
Reviewed-by: Guy Harris <guy@alum.mit.edu>
2014-10-16 20:04:28 +00:00
Guy Harris 50add40a2d Fix some more "char is unsigned" issues, and a possible "char is signed" one.
C neither guarantees that char is signed nor that it's unsigned.  Make
the str_to_nibble tables arrays of gint8, to make sure they can hold
numbers between 0 and 15 as well as -1.  Cast gchar to guchar, not int,
when using it as a subscript into that array, so that the subscripts are
in the range 0 to 255, not -128 to 127.

Change-Id: Ib85de5aa4e83ae9efd808c78ce3f86f45b4a3f2a
Reviewed-on: https://code.wireshark.org/review/4734
Reviewed-by: Guy Harris <guy@alum.mit.edu>
2014-10-16 18:16:58 +00:00
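The table-lookup rule from this commit and the follow-up above it can be sketched as below. This is an illustration under assumptions, not the str_to_nibble code: the names are hypothetical, the standard `int8_t` stands in for GLib's gint8, and the table is filled at run time assuming an ASCII-compatible character set.

```c
#include <stdint.h>

/* Signed 8-bit entries so the table can hold 0..15 as well as -1,
 * whether or not plain char is signed on this platform. */
static int8_t nibble_table[256];
static int nibble_table_ready;

static void nibble_table_init(void)
{
    int i;

    for (i = 0; i < 256; i++)
        nibble_table[i] = -1;               /* not a hex digit */
    for (i = 0; i < 10; i++)
        nibble_table['0' + i] = (int8_t)i;
    for (i = 0; i < 6; i++) {
        nibble_table['a' + i] = (int8_t)(10 + i);
        nibble_table['A' + i] = (int8_t)(10 + i);
    }
    nibble_table_ready = 1;
}

/* Hypothetical lookup: hex digit value of c, or -1. */
static int char_to_nibble(char c)
{
    if (!nibble_table_ready)
        nibble_table_init();
    /*
     * Cast to unsigned char, not int: where char is signed, bytes 0x80..0xFF
     * are negative, and casting to int would keep them negative and index
     * before the start of the table.
     */
    return nibble_table[(unsigned char)c];
}
```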
Gerald Combs 6397ad43c2 Revert "Qt: Try to fix a Visual C++ encoding warning."
Revert gafa8c02 since it didn't work on Windows. Use a pragma to squelch
Visual C++ instead.

Qt's rich text renderer doesn't handle "&apos;". Replace it with "&#x27;".
Remove a QDebug include.

Change-Id: I0e6308efda74a4bc0e67ce841a50a0a9b68f4a8b
Reviewed-on: https://code.wireshark.org/review/4511
Reviewed-by: Gerald Combs <gerald@wireshark.org>
2014-10-06 23:34:56 +00:00
Jeff Morriss 511e1fbf3e Fix up some formatting.
Change-Id: Ib38561ad5cf0f532e43ae3e10bbb857bb24ab9b6
Reviewed-on: https://code.wireshark.org/review/3980
Reviewed-by: Jeff Morriss <jeff.morriss.ws@gmail.com>
2014-09-04 01:40:37 +00:00
Hadriel Kaplan f52626cc83 Add tvb_get and proto_tree_add for string-encoded byte arrays
This commit adds tvb_get_string_bytes and proto_tree_add_bytes_item routines for
getting GByteArrays fields from the tvb when they are encoded in ASCII hex string form.

The proto_tree_add_bytes_item routine is also usable for normal
binary encoded byte arrays, and has the advantage of retrieving
the array values even if there's no proto tree.

It also exposes the routines to Lua, both so that a Lua script can take
advantage of this, but also so I can write a testsuite to test the functions.

Change-Id: I112a038653df6482a5d0ebe7c95708f207319e20
Reviewed-on: https://code.wireshark.org/review/1158
Reviewed-by: Hadriel Kaplan <hadrielk@yahoo.com>
Reviewed-by: Anders Broman <a.broman58@gmail.com>
2014-04-17 14:04:19 +00:00
Alexis La Goutte 296591399f Remove all $Id$ from top of file
(Using sed : sed -i '/^ \* \$Id\$/,+1 d')

Manually fix some typos (in export_object_dicom.c and crc16-plain.c)

Change-Id: I4c1ae68d1c4afeace8cb195b53c715cf9e1227a8
Reviewed-on: https://code.wireshark.org/review/497
Reviewed-by: Anders Broman <a.broman58@gmail.com>
2014-03-04 14:27:33 +00:00
Bill Meier 11b5c15fdb Remove trailing whitespace
Change-Id: I8116f63ff88687c8db3fd6e8e23b22ab2f759af0
Reviewed-on: https://code.wireshark.org/review/385
Reviewed-by: Bill Meier <wmeier@newsguy.com>
Tested-by: Bill Meier <wmeier@newsguy.com>
2014-02-25 20:46:49 +00:00
Jakub Zawadzki d28084d183 Move UAT xton() to wsutil library
Use ws_xton() in few more places.

svn path=/trunk/; revision=54642
2014-01-08 00:28:13 +00:00
Jakub Zawadzki 33ef0c2600 isascii(x) && isprint(x) -> g_ascii_isprint(x)
svn path=/trunk/; revision=54328
2013-12-21 15:12:11 +00:00
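The replacement above folds two tests into one locale-independent check. A minimal sketch of the same idea without GLib follows; `ascii_isprint` is a hypothetical stand-in written for illustration, not the GLib implementation.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in: true only for printable ASCII, 0x20 (space)
 * through 0x7E ('~'), regardless of the current locale.
 */
static bool ascii_isprint(char c)
{
    /* Convert first so bytes 0x80..0xFF are never seen as negative. */
    unsigned char u = (unsigned char)c;

    return u >= 0x20 && u <= 0x7E;
}
```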
Jakub Zawadzki 746ee39329 Drop isprint.h; use g_ascii_isprint() where this include hack was enabled.
svn path=/trunk/; revision=54327
2013-12-21 15:01:45 +00:00