wireshark

Commit Graph

Author	SHA1	Message	Date
John Thacker	4c3ebe73d3	epan: ensure that the result of ws_label_strcpy is terminated Unless there is no available space, ensure that the label_str passed into ws_label_strcpy is null terminated, in the cases where the string to copy is the empty string, or begins with invalid UTF-8. Fix #18560. Fix #18551.	2022-10-27 18:37:17 -04:00
João Valverde	c1cede8d7c	epan: Format column string input for display. Format the input for display, by escaping some non printable characters, using ws_label_strcpy(). In some cases with vsnprintf() this requires using a temporary buffer. Add some debug checks for invalid UTF-8 errors. The intention here is to pass dissection data directly to the column API, and the column functions are responsible for formatting that data for display. This avoids having to call format_text() before adding a string to a column and separates the concerns better. Display formatting is an UI concern.	2022-10-26 13:28:19 +01:00
João Valverde	92e1357bb4	Rename ws_label_strcat() to ws_label_strcpy() The semantics of ws_label_strcat() are closer to g_strlcpy() so rename the function to reflect that.	2022-10-26 13:12:35 +01:00
João Valverde	f55cb116a0	Remove memset() from ws_label_str() In the interests of efficiency with multiple small writes avoid doing a memset on the whole remaining length.	2022-10-26 13:12:31 +01:00
John Thacker	d7c993d4af	epan: Fix the end offsets for hex string items hex_str_to_bytes_encoding() consumes pairs of hex digits (and optional separator) to turn into bytes. It can return a pointer to the character after the last digit consumed. Don't advance the end pointer after a single unpaired digit that is not consumed as part of the hex string returned. tvb_get_string_bytes() can pass back the end offset. If conversion fails, return the initial offset instead of zero to make repeated calls easier in cases where the full length is not decoded due to errors. Relatedly, no dissector currently uses this return value, because it's not useful currently.	2022-10-21 01:11:53 +00:00
João Valverde	603354203b	epan/proto: Replace format_text() The proto.h APIs expect valid UTF-8 so replace uses of format_text() with a label copy function that just does formatting and does not check for encoding errors. Avoid multiple levels of temporary string allocations. Make sure the copy does not truncate a multibyte character and produce invalid strings. Add debug checks for UTF-8 encoding errors instead. We escape C0 and C1 control codes (because control codes) and ASCII whitespace (and bell). Overall the goal is to be more efficient and optimized and help detect misuse of APIs by passing invalid UTF-8. Add a unit test for ws_label_strcat.	2022-10-20 20:05:15 +01:00
João Valverde	15634c0b46	Move format_text() to libwsutil and add unit tests	2022-09-28 21:44:27 +00:00
João Valverde	9345bcdae5	epan: Change signature of format_text() Replace "const guchar " with "const char ".	2022-09-28 19:28:28 +01:00
João Valverde	32befe119d	Add a log domain for encoding errors and lower the log level Using a warning is probably too exalted for the current state of the code, where UTF-8 errors are somewhat expected from dissectors that are lax about input validation. Use a debug level with its own "UTF-8" domain instead. Using a dedicated domain allows to filter on encoding errors and with some enhancements to the logging subsystem make them fatal for tracking and debugging purposes. Using a dedicated domain might have other drawbacks but for now it seems like the best approach.	2022-09-28 14:57:51 +01:00
João Valverde	621257f472	epan: Add a warning for invalid UTF-8 with format_text()	2022-09-27 17:04:44 +00:00
John Thacker	73d8bb1bc3	XML: Do escape ASCII control characters XML 1.0 allows valid UTF-8 characters, except for the ASCII control characters other than tab, carriage return, and line feed. (It does not allow form feed and vertical tab, so the allowed group is not the same as the standard ctype.h isspace category. It also allows but discourages DEL (\x7F).) The characters cannot be included as character references of the form &#xx; either; there is technically no way to include them. Escape them as done prior to `89e96c1e77` but continue to leave bytes with the high bit set alone so that UTF-8 printable characters are not escaped. Fix #10445	2022-09-21 23:46:35 +00:00
John Thacker	95b45b2555	Qt: Add percent-encoding to Show Packet Bytes Add Percent-encoding to the list of encoding types that Show Packet Bytes can handle. There's a function added to glib 2.66 to handle this for arbitrary bytes that might have internal nulls (and which allows the result to be non UTF-8), but we don't require that version yet, so extend the existing function. Related to #1084	2022-09-03 17:25:28 +00:00
Moshe Kaplan	69d54d6f8e	Corrects repeated words throughout the code. Repeated words were found with: egrep "(\b[a-zA-Z]+) +\1\b" . -Ir and then manually reviewed. Non-displayed strings (e.g., in comments) were also corrected, to ease future review.	2021-12-22 11:01:11 +00:00
João Valverde	76186f16fb	epan: Rewrite format_text_chr() using standard APIs	2021-12-03 10:18:37 +00:00
João Valverde	a9c36dfb75	epan: Remove unused format_uri() function Used with the GTK GUI, not used for a long time.	2021-11-30 22:07:09 +00:00
João Valverde	1e0cc18ae8	epan: Remove duplication in format_text_wsp() This function and format_text() are very similar so use a common implementation for both.	2021-11-30 21:34:57 +00:00
João Valverde	c18e44f563	epan: Fix UTF-8 bitmask for 2-byte codepoint Fixes format_text_wsp(), use the correct bitmask from format_text().	2021-11-30 21:34:57 +00:00
João Valverde	13783fae6b	Add comment with rationale for having format_text_chr(). From `fa1027a004`.	2021-11-30 19:04:37 +00:00
João Valverde	37f2a86207	Move string_or_null() to wsutil	2021-11-29 18:37:03 +00:00
João Valverde	dcbd79584d	epan/str_util: Remove unused functions Remove ws_strdup_escape_char(). I don't think it is generic enough to keep, and it does not seem very efficient either. Remove string_replace(). This function was used in the GTK GUI.	2021-11-29 18:37:03 +00:00
João Valverde	44121e2c3b	Move escape_string() to wsutil Move this utility function to wsutil. Rename to ws_escape_string(). Also add tests.	2021-11-29 17:47:53 +00:00
João Valverde	ef8125e3ae	Move two functions from epan to wsutil/str_util Move epan_memmem() and epan_strcasestr() to wsutil/str_util. Rename to ws_memmem() and ws_strcasestr(). Add compile time check for a system implementation and use that if available. We invoke those functions using a wrapper to avoid exposing _GNU_SOURCE outside of the implementation.	2021-11-28 12:32:51 +00:00
João Valverde	1fc621e38d	epan: Fix crash with upper-case protocol filter names Registering a preference module for a protocol filter name with upper case letters aborts the program. Relax this restriction to conform with the rules for protocols. The recommendation is still to use all lower-case letters. Fixes `070aeddf76`.	2021-11-04 16:29:34 +00:00
João Valverde	925e01b23f	Remove duplicate format_size() function We have two format_size()s, with and without wmem scoped memory. Move the wmem version to wsutil and add a convenience macro to use g_malloc()ed memory.	2021-07-26 14:56:11 +00:00
John Thacker	770746cca8	epan: Fix format_text treatment of Greek, Arabic, etc. format_text uses the wrong bitmask when checking for two-byte UTF-8 characters, resulting in rejecting half the possible two-byte characters, including all of Arabic and Greek, and substituting REPLACEMENT CHARACTER for them. Fixes #17070, and add some comments about the current behavior that doesn't match existing comments.	2020-12-09 12:51:19 +00:00
Guy Harris	35418a73f7	Add format_text_string(), which gets the length with strlen(). format_text(alloc, string, strlen(string)) is a common idiom; provide format_text_string(), which does the strlen(string) for you. (Any string used in a %s to set the text of a protocol tree item, if it was directly extracted from the packet, should be run through a format_text routine, to ensure that it's valid UTF-8 and that control characters are handled correctly.) Update comments while we're at it. Change-Id: Ia8549efa1c96510ffce97178ed4ff7be4b02eb6e Reviewed-on: https://code.wireshark.org/review/38202 Petri-Dish: Guy Harris <gharris@sonic.net> Tested-by: Petri Dish Buildbot Reviewed-by: Guy Harris <gharris@sonic.net>	2020-08-20 07:24:32 +00:00
Michael Mann	f509a83381	Add format_size_wmem It's a "wmem version" of format_size (from wsutil/str_util.h). Also improved the flexibility in formatting of format_size() to handle future needs of format_size_wmem Ping-Bug: 15360 Change-Id: Id9977bbd7ec29375bbac955f685d46e75b0cef2c Reviewed-on: https://code.wireshark.org/review/31233 Petri-Dish: Michael Mann <mmann78@netscape.net> Tested-by: Petri Dish Buildbot Reviewed-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: Anders Broman <a.broman58@gmail.com>	2019-12-02 05:01:16 +00:00
Guy Harris	20800366dd	HTTPS (almost) everywhere. Change all wireshark.org URLs to use https. Fix some broken links while we're at it. Change-Id: I161bf8eeca43b8027605acea666032da86f5ea1c Reviewed-on: https://code.wireshark.org/review/34089 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2019-07-26 18:44:40 +00:00
Guy Harris	edd5eaa57e	Don't format printable non-ASCII Unicode characters as escape sequences. Note that even strings fetched with ENC_ASCII may contain them - bytes with the 8th bit set get mapped to REPLACEMENT CHARACTER. This means we can format STR_UNICODE fields with format_text(); do so. Bug: 1372 Change-Id: Ia32c3a92d220ac5174ecd25f33e2d1f85cfb8cb8 Reviewed-on: https://code.wireshark.org/review/34080 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2019-07-25 14:50:40 +00:00
Guy Harris	a409987eea	Fix format_uri(). It was using the same index into the input and output strings, which means that if it escaped any character, it would skip the next two characters in the input string. It was also not clearing is_reserved before testing whether a character was reserved, so once it saw a character that needed to be escaped, it would escape all subsequent characters. It was only used in get_key_string(), which was never used, so it was dead code, but let's at least fix it, even if we end up removing that code, so that if we bring it back, we bring back a non-broken version, and so that if anybody else uses it, it's not broken. Change-Id: I36588efad36908e012023bcfbd813c749a6a254f Reviewed-on: https://code.wireshark.org/review/33287 Petri-Dish: Guy Harris <guy@alum.mit.edu> Tested-by: Petri Dish Buildbot Reviewed-by: Guy Harris <guy@alum.mit.edu>	2019-05-21 08:30:12 +00:00
Dario Lombardo	9aa63d2406	epan: remove return from functions returning void. Found by clang-tidy. Change-Id: Ibedfec5e5d3eca7c2e65319b7ecb4dcbe974b88b Reviewed-on: https://code.wireshark.org/review/31337 Petri-Dish: Dario Lombardo <lomato@gmail.com> Petri-Dish: Guy Harris <guy@alum.mit.edu> Tested-by: Petri Dish Buildbot Reviewed-by: Anders Broman <a.broman58@gmail.com>	2019-01-04 05:07:58 +00:00
Dario Lombardo	55c68ee69c	epan: use SPDX identifiers. Skipping dissectors dir for now. Change-Id: I717b66bfbc7cc81b83f8c2cbc011fcad643796aa Reviewed-on: https://code.wireshark.org/review/25694 Petri-Dish: Dario Lombardo <lomato@gmail.com> Tested-by: Petri Dish Buildbot Reviewed-by: Anders Broman <a.broman58@gmail.com>	2018-02-08 19:29:45 +00:00
Michael Mann	148fb1acf4	Add wmem allocator parameter to format_uri Change-Id: Ic6de84a37b501e9c62a7d37071b2b081a1a1dd50 Reviewed-on: https://code.wireshark.org/review/19885 Petri-Dish: Michael Mann <mmann78@netscape.net> Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org> Reviewed-by: Michael Mann <mmann78@netscape.net>	2017-01-31 17:08:54 +00:00
Michael Mann	51a3014225	format_text_wmem -> format_text All cases of the "original" format_text have been handled to add the proper wmem allocator scope. Remove the "original" format_text and replace it with one that has a wmem allocator as a parameter. Change-Id: I278b93bcb4a17ff396413b75cd332f5fc2666719 Reviewed-on: https://code.wireshark.org/review/19884 Petri-Dish: Michael Mann <mmann78@netscape.net> Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org> Reviewed-by: Michael Mann <mmann78@netscape.net>	2017-01-31 17:08:47 +00:00
Michael Mann	d802b5b0ec	Add format_text_wmem. This allows for a wmem_allocator for users of format_text who want it (dissectors for wmem_packet_scope()). This lessens the role of current format_text functionality in hopes that it will eventually be replaced. Change-Id: I970557a65e32aa79634a3fcc654ab641b871178e Reviewed-on: https://code.wireshark.org/review/19855 Reviewed-by: Michael Mann <mmann78@netscape.net>	2017-01-31 02:26:35 +00:00
Michael Mann	f789c91a5e	Have format_text_wsp use wmem allocated memory. format_text_wsp is fed into by tvb_format_text_wsp and tvb_format_stringzpad_wsp so those functions need to add a wmem allocated parameter as well. Most of the changes came from tvb_format_text_wsp and tvb_format_stringzpad_wsp being changed more so than format_text_wsp. Change-Id: I52214ca107016f0e96371a9a8430aa89336f91d7 Reviewed-on: https://code.wireshark.org/review/19851 Petri-Dish: Michael Mann <mmann78@netscape.net> Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org> Reviewed-by: Michael Mann <mmann78@netscape.net>	2017-01-30 02:25:45 +00:00
Michael Mann	c44c8f9e6c	Have format_text_chr use wmem allocated memory. Change-Id: Idcea59f6fc84238f04d9ffc11a0088ef97beec0c Reviewed-on: https://code.wireshark.org/review/19844 Petri-Dish: Michael Mann <mmann78@netscape.net> Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org> Reviewed-by: Michael Mann <mmann78@netscape.net>	2017-01-30 00:05:39 +00:00
João Valverde	4c330cc0e4	Fix constness Change-Id: I29723dae83373768edd254c60e48a717abf20694 Reviewed-on: https://code.wireshark.org/review/13436 Petri-Dish: João Valverde <j@v6e.pt> Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org> Reviewed-by: João Valverde <j@v6e.pt>	2016-01-20 16:12:07 +00:00
Guy Harris	44e7ce54ff	Remove some apparently-unnecessary includes of emem.h. Change-Id: Ib7d1b587b439ff21ec6b7f1756ce6ccf25b66f80 Reviewed-on: https://code.wireshark.org/review/6635 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2015-01-18 20:19:05 +00:00
Guy Harris	033f096ee9	Don't use ctype.h routines. That avoids locale dependency and handles possibly-signed chars (which we weren't always doing before). Change-Id: Ieceb93029252f646397b6488f2df8a57c6d2a23d Reviewed-on: https://code.wireshark.org/review/4794 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2014-10-17 23:11:18 +00:00
Guy Harris	71a42e0fbc	Oops, I missed one "cast a char to int and use it as a subscript" case. Casting a signed char with a negative value to int will preserve the value, so it'll still be a negative subscript. Cast to guchar instead, to make sure 0x80 through 0xFF are treated as 128 to 255, not -128 to -1. Change-Id: I1f0b33ba3686e963d45317b45465ff335431d17f Reviewed-on: https://code.wireshark.org/review/4742 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2014-10-16 20:04:28 +00:00
Guy Harris	50add40a2d	Fix some more "char is unsigned" issues, and a possible "char is signed" one. C neither guarantees that char is signed nor that it's unsigned. Make the str_to_nibble tables arrays of gint8, to make sure they can hold numbers between 0 and 15 as well as -1. Cast gchar to guchar, not int, when using it as a subscript into that array, so that the subscripts are in the range 0 to 255, not -128 to 127. Change-Id: Ib85de5aa4e83ae9efd808c78ce3f86f45b4a3f2a Reviewed-on: https://code.wireshark.org/review/4734 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2014-10-16 18:16:58 +00:00
Gerald Combs	6397ad43c2	Revert "Qt: Try to fix a Visual C++ encoding warning." Revert gafa8c02 since it didn't work on Windows. Use a pragma to squelch Visual C++ instead. Qt's rich text renderer doesn't handle "&apos;". Replace it with "&#39;". Remove a QDebug include. Change-Id: I0e6308efda74a4bc0e67ce841a50a0a9b68f4a8b Reviewed-on: https://code.wireshark.org/review/4511 Reviewed-by: Gerald Combs <gerald@wireshark.org>	2014-10-06 23:34:56 +00:00
Jeff Morriss	511e1fbf3e	Fix up some formatting. Change-Id: Ib38561ad5cf0f532e43ae3e10bbb857bb24ab9b6 Reviewed-on: https://code.wireshark.org/review/3980 Reviewed-by: Jeff Morriss <jeff.morriss.ws@gmail.com>	2014-09-04 01:40:37 +00:00
Hadriel Kaplan	f52626cc83	Add tvb_get and proto_tree_add for string-encoded byte arrays This commit adds tvb_get_string_bytes and proto_tree_add_bytes_item routines for getting GByteArrays fields from the tvb when they are encoded in ASCII hex string form. The proto_tree_add_bytes_item routine is also usable for normal binary encoded byte arrays, and has the advantage of retrieving the array values even if there's no proto tree. It also exposes the routines to Lua, both so that a Lua script can take advantage of this, but also so I can write a testsuite to test the functions. Change-Id: I112a038653df6482a5d0ebe7c95708f207319e20 Reviewed-on: https://code.wireshark.org/review/1158 Reviewed-by: Hadriel Kaplan <hadrielk@yahoo.com> Reviewed-by: Anders Broman <a.broman58@gmail.com>	2014-04-17 14:04:19 +00:00
Alexis La Goutte	296591399f	Remove all $Id$ from top of file (Using sed : sed -i '/^ \* \$Id\$/,+1 d') Fix manually some typo (in export_object_dicom.c and crc16-plain.c) Change-Id: I4c1ae68d1c4afeace8cb195b53c715cf9e1227a8 Reviewed-on: https://code.wireshark.org/review/497 Reviewed-by: Anders Broman <a.broman58@gmail.com>	2014-03-04 14:27:33 +00:00
Bill Meier	11b5c15fdb	Remove trailing whitespace Change-Id: I8116f63ff88687c8db3fd6e8e23b22ab2f759af0 Reviewed-on: https://code.wireshark.org/review/385 Reviewed-by: Bill Meier <wmeier@newsguy.com> Tested-by: Bill Meier <wmeier@newsguy.com>	2014-02-25 20:46:49 +00:00
Jakub Zawadzki	d28084d183	Move UAT xton() to wsutil library Use ws_xton() in few more places. svn path=/trunk/; revision=54642	2014-01-08 00:28:13 +00:00
Jakub Zawadzki	33ef0c2600	isascii(x) && isprint(x) -> g_ascii_isprint(x) svn path=/trunk/; revision=54328	2013-12-21 15:12:11 +00:00
Jakub Zawadzki	746ee39329	Drop isprint.h use g_ascii_isprint() when this include hack was enabled. svn path=/trunk/; revision=54327	2013-12-21 15:01:45 +00:00

138 Commits
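
Commits 50add40a2d and 71a42e0fbc above describe casting a char to guchar, rather than int, before using it as a table subscript, and storing the table entries in a signed 8-bit type so that -1 can mark a non-digit. The following is only a minimal sketch of that idea, not the Wireshark code: the names nibble_table, init_nibble_table and hex_nibble are hypothetical, standard C types stand in for the GLib gint8/guchar types, and an ASCII-compatible character set is assumed.

```c
#include <stdint.h>

/* Hypothetical table: value of each byte as a hex digit, or -1 if the
 * byte is not a hex digit. int8_t is used so that -1 is representable
 * whether plain char is signed or unsigned on the platform. */
static int8_t nibble_table[256];
static int table_ready = 0;

/* Fill the table once; assumes 'a'..'f' and 'A'..'F' are contiguous,
 * which holds for ASCII. */
static void init_nibble_table(void)
{
    for (int i = 0; i < 256; i++)
        nibble_table[i] = -1;
    for (int i = 0; i < 10; i++)
        nibble_table['0' + i] = (int8_t)i;
    for (int i = 0; i < 6; i++) {
        nibble_table['a' + i] = (int8_t)(10 + i);
        nibble_table['A' + i] = (int8_t)(10 + i);
    }
    table_ready = 1;
}

/* Return 0..15 for a hex digit, -1 otherwise. */
int hex_nibble(char c)
{
    if (!table_ready)
        init_nibble_table();
    /* Cast to unsigned char, not int: where char is signed, a byte such
     * as 0xE9 converts to a negative int and would index before the
     * start of the table. The unsigned cast keeps the index in 0..255. */
    return nibble_table[(unsigned char)c];
}
```

With this sketch, hex_nibble('f') and hex_nibble('F') both yield 15, while a byte with the high bit set, such as (char)0xE9, yields -1 instead of reading outside the table.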