wireshark

Commit Graph

Author	SHA1	Message	Date
Alexis La Goutte	b78abaf1be	windows-common: fix Clang Value stored to 'len' is never read	2020-11-19 19:01:53 +00:00
John Thacker	5df3f5d05d	Encodings: Fix missing pointer increment in 3GPP TS 23.038 unpacked. The pointer isn't incremented in get_ts_23_038_7bits_string_unpacked, so it just decodes the first octet length times.	2020-11-14 23:39:18 -05:00
John Thacker	e20bd408de	Use iconv to support GB 18030 and EUC-KR, allow future encodings. Add support internally for using iconv (always present with glib) to convert strings from various encodings to UTF-8 (using REPLACEMENT CHARACTER as recommended), and use that to support GB 18030 and EUC-KR. Replace the direct call to iconv in ANSI 637 for EUC-KR with the new API. Update comments and documentation around character encodings. It is possible to replace the calls to iconv with an internal decoder later. Tested on Linux and on Windows (including with illegal characters). Closes #16630.	2020-10-21 11:26:23 +00:00
John Thacker	91b792c6dc	Replace ill-formed UTF-8 byte sequences with replacement character. Implement the Unicode Standard "best practices" for replacing ill-formed sequences with the Unicode REPLACEMENT CHARACTER. Add wmem_strbuf_append_len for appending strings with embedded null characters. Clarify why wmem_strbuf_grow() doesn't always ensure that there's enough room for a new string, and short-circuit some tests there. Related to #14948	2020-10-15 21:48:28 +00:00
Guy Harris	3502d53ffb	Remove leftover cruft from previous comment.	2020-09-29 04:39:51 +00:00
Guy Harris	c597927da8	Add some more string encodings. Add an encoding for "unpacked" 3GPP TS 23.038 7-bit strings, in which each code position is in a byte of its own, rather than with the code positions packed into 7 bits. Rename the packed encoding to explicitly indicate that it's packed. Add an encoding for ETSI TS 102 221 Annex A strings. Use the new encodings.	2020-09-28 22:30:35 +00:00
Guy Harris	245086eb83	HTTPS In Still More Places, update more URLs. Microsoft reshuffled their documentation - almost all of it moved from msdn.microsoft.com to docs.microsoft.com. Some blogs moved to devblogs.microsoft.com; the comments didn't move, so in one case we go to the Wayback Machine - the link isn't dead, but it formats horribly, at least on my browser, but the archived version formats OK. Use the Wayback Machine for some URLs, and update others. Update the sections for MS-ADTS. Point to the HTML versions of some RFCs and I-Ds. Change-Id: I344b20f880de63f1ae2a4e3f9ff98af78a7fe139 Reviewed-on: https://code.wireshark.org/review/34101 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2019-07-27 22:56:35 +00:00
Guy Harris	20800366dd	HTTPS (almost) everywhere. Change all wireshark.org URLs to use https. Fix some broken links while we're at it. Change-Id: I161bf8eeca43b8027605acea666032da86f5ea1c Reviewed-on: https://code.wireshark.org/review/34089 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2019-07-26 18:44:40 +00:00
Guy Harris	f26b7cbd22	Squelch a -Wpointer-sign warning. Change-Id: I193ff3b2faf37930128bdc02b4da36e32e306b4a Reviewed-on: https://code.wireshark.org/review/34067 Petri-Dish: Guy Harris <guy@alum.mit.edu> Tested-by: Petri Dish Buildbot Reviewed-by: Guy Harris <guy@alum.mit.edu>	2019-07-24 09:18:00 +00:00
Guy Harris	c8933e48f2	Insert REPLACEMENT CHARACTER for various UTF-16 errors. Change-Id: I2f62a409548b2c743864ca8da5733f7a73872b3c Reviewed-on: https://code.wireshark.org/review/34066 Petri-Dish: Guy Harris <guy@alum.mit.edu> Tested-by: Petri Dish Buildbot Reviewed-by: Guy Harris <guy@alum.mit.edu>	2019-07-24 08:44:06 +00:00
Guy Harris	e26e0b4de0	Add support for the ISO 646 "Basic code table" encoding. The "Basic code table" in ISO 646 is mostly ASCII, but some code points either 1) have more than one glyph that can be assigned to them or 2) have no glyph assigned to them. National versions choose one of the two glyphs for the code points in group 1) and assign specific glyphs to the code points in group 2); the International Reference Version assigns the same glyphs to those code points as does ASCII. For the "Basic code table" encoding, we map the code points in groups 1) and 2) to a REPLACEMENT CHARACTER; additional encodings can be added for the national versions. Add ENC_ISO_646_IRV (International Reference Version) as an alias for ENC_ASCII. Expand some comments, and add some comments, while we're at it. Change-Id: I4f1b5e426ec193775e919731c5cae1224dc65115 Reviewed-on: https://code.wireshark.org/review/33941 Petri-Dish: Guy Harris <guy@alum.mit.edu> Tested-by: Petri Dish Buildbot Reviewed-by: Guy Harris <guy@alum.mit.edu>	2019-07-15 07:50:30 +00:00
Guy Harris	03c5da8d89	Add Windows code page 1252. While we're at it, add the Euro to code page 1251, expand the comments for 1250 and 1251 and some DOS code pages, and add support for code page 1251 to tvb_get_stringz_enc(). Change-Id: I053d58f87cac26ad7c109e2f1cd8807ffec0622d Reviewed-on: https://code.wireshark.org/review/33342 Petri-Dish: Guy Harris <guy@alum.mit.edu> Tested-by: Petri Dish Buildbot Reviewed-by: Guy Harris <guy@alum.mit.edu>	2019-05-25 01:07:36 +00:00
Gerald Combs	3aad1ef236	epan: Add a boundary check to get_t61_string. Add a boundary check to make sure we don't go past the end of "ptr". Bug: 15373 Change-Id: I85394e8e6e477b47919362af146051cc8911254b Reviewed-on: https://code.wireshark.org/review/31437 Petri-Dish: Gerald Combs <gerald@wireshark.org> Tested-by: Petri Dish Buildbot Reviewed-by: Gerald Combs <gerald@wireshark.org>	2019-01-08 00:32:45 +00:00
kanidef	5fa9257704	add encoding windows 1251, cp855, cp866 Change-Id: I0e8507cf63d89942167ca579ef304bc3d679346e Reviewed-on: https://code.wireshark.org/review/31316 Petri-Dish: Peter Wu <peter@lekensteyn.nl> Tested-by: Petri Dish Buildbot Reviewed-by: Guy Harris <guy@alum.mit.edu>	2019-01-04 23:37:17 +00:00
Dario Lombardo	55c68ee69c	epan: use SPDX identifiers. Skipping dissectors dir for now. Change-Id: I717b66bfbc7cc81b83f8c2cbc011fcad643796aa Reviewed-on: https://code.wireshark.org/review/25694 Petri-Dish: Dario Lombardo <lomato@gmail.com> Tested-by: Petri Dish Buildbot Reviewed-by: Anders Broman <a.broman58@gmail.com>	2018-02-08 19:29:45 +00:00
Guy Harris	eec528cc70	Make a pointer const that has no need not to be const. Change-Id: I32c86988823fcea96239b199bf21b98ee3ec8a5e Reviewed-on: https://code.wireshark.org/review/25359 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2018-01-18 05:48:46 +00:00
Guy Harris	435c68cd2b	Fix SURROGATE_VALUE() to match what RFC 2781 says. While we're at it, note in the comment for get_utf_16_string() the "decoding UTF-16" algorithm in RFC 2781. Change-Id: I5d7dc5c09af0474c055796e49e0c7b94fa87d2ad Reviewed-on: https://code.wireshark.org/review/22171 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2017-06-16 18:41:00 +00:00
Guy Harris	b604fff136	Rename non-EBCDIC-specific routines. Those routines can handle any single-byte character set whose characters map to characters in the Basic Multilingual Plane; it could be used for extended ASCII, but we have another routine for that, mapping only characters with code points > 0x7f, so we just say "nonascii" rather than "ebcdic". Change-Id: I3d55b5d58e3e7ab08f3dfbfdb57a0301a30e71d4 Reviewed-on: https://code.wireshark.org/review/19214 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2016-12-12 08:20:22 +00:00
Guy Harris	4d47c9a841	Fix handling of EBCDIC string fields. Have a routine that takes a 256-element translation table and uses it to map various flavors of EBCDIC to Unicode. Have separate translation tables for "common" EBCDIC (everything that's the same in all EBCDIC code pages that include the original EBCDIC characters) and EBCDIC code page 037. Add ENC_EBCDIC_CP037 for code page 037. Change-Id: Ia882b3c0abef9e30eb54cd47396e6fa0d6342044 Reviewed-on: https://code.wireshark.org/review/19212 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2016-12-12 05:49:50 +00:00
Pascal Quantin	321b756dc4	Add T.61 character set support Bug: 13032 Change-Id: I6bf2cc2c43a6262d899a304df6576d9831115966 Reviewed-on: https://code.wireshark.org/review/18350 Petri-Dish: Michael Mann <mmann78@netscape.net> Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org> Reviewed-by: Michael Mann <mmann78@netscape.net>	2016-10-22 03:16:11 +00:00
Pascal Quantin	a1840c20f0	Fix dissection of 7 bits ASCII/GSM strings when the bit offset is not byte aligned Bug: 10491 Change-Id: Ib55d83b7739050ba5afd84e8184af3c4608d5776 Reviewed-on: https://code.wireshark.org/review/4228 Tested-by: Pascal Quantin <pascal.quantin@gmail.com> Petri-Dish: Pascal Quantin <pascal.quantin@gmail.com> Tested-by: Petri Dish Buildbot <buildbot-no-reply@wireshark.org> Reviewed-by: Pascal Quantin <pascal.quantin@gmail.com>	2014-09-21 20:51:08 +00:00
Guy Harris	afbb1e78e9	Use 4-space indentation consistently in epan/charsets.c. Make the EBCDIC <-> ASCII translation tables const, while we're at it. Change-Id: I15a08f7329fd32f758cf36898fe4214ae8540462 Reviewed-on: https://code.wireshark.org/review/1343 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2014-04-25 09:36:11 +00:00
Guy Harris	29eba5308f	Add a get_ebcdic_string() routine, similar to other get_XXX_string() routines. Use it in epan/tvbuff.c. Do some other cleanups while we're at it. Change-Id: I7aed37a568373b896aacfd23f986d445b58b77b7 Reviewed-on: https://code.wireshark.org/review/1342 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2014-04-25 09:30:14 +00:00
Guy Harris	6a9c924460	Move the XXX-to-UTF-8 loops to routines in epan/charsets.c. This moves a bunch of character set knowledge into epan/charsets.c. Change-Id: Ieb79dcaac9753c77703af756b666ad2ca9385d9e Reviewed-on: https://code.wireshark.org/review/1339 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2014-04-25 08:32:06 +00:00
Jakub Zawadzki	4bd8336017	Move GSM guint8 to Unicode conversion functions to charsets.c. charsets.c is already the place with a huge number of conversion tables. Also make gsm_default_alphabet gunichar2; all values fit in 2 bytes. Change-Id: Ia5ab6c176b4fec21ec76b06513c1d00794ba10ef Reviewed-on: https://code.wireshark.org/review/1328 Reviewed-by: Anders Broman <a.broman58@gmail.com>	2014-04-25 04:17:58 +00:00
Guy Harris	ae127f23fa	Add Mac Roman and DOS CP437. Change-Id: Ib96f2cf4ea71cd0cc2c703d58b9d254bf4c1248a Reviewed-on: https://code.wireshark.org/review/1077 Reviewed-by: Guy Harris <guy@alum.mit.edu>	2014-04-12 08:54:06 +00:00
Alexis La Goutte	296591399f	Remove all $Id$ from top of file (Using sed : sed -i '/^ \* \$Id\$/,+1 d') Fix manually some typo (in export_object_dicom.c and crc16-plain.c) Change-Id: I4c1ae68d1c4afeace8cb195b53c715cf9e1227a8 Reviewed-on: https://code.wireshark.org/review/497 Reviewed-by: Anders Broman <a.broman58@gmail.com>	2014-03-04 14:27:33 +00:00
Guy Harris	f231a273f2	Add the rest of ISO-8859-n, thanks to Jakub's "generate a mapping table" program. Put the character-encoding cases in order. svn path=/trunk/; revision=54344	2013-12-21 21:55:46 +00:00
Jakub Zawadzki	099294dd16	Add charset table for ISO/IEC 8859-9 (ENC_ISO_8859_9) svn path=/trunk/; revision=54239	2013-12-18 23:32:06 +00:00
Martin Kaiser	a07c0ff146	add support for ISO 8859-5 svn path=/trunk/; revision=54132	2013-12-15 19:13:31 +00:00
Martin Kaiser	db1b70f168	as requested, move the functions/defines for DVB character tables to separate files svn path=/trunk/; revision=54113	2013-12-15 12:05:50 +00:00
Martin Kaiser	5422134e86	TABs -> spaces; add editor modelines svn path=/trunk/; revision=53888	2013-12-09 20:52:39 +00:00
Martin Kaiser	cb1cb946d3	From Jakub: support DVB-SI character tables (EN 300 468) in a generic way. From me: move things to charsets.c/.h; distinguish between single and multi byte encoding for some tables (so that the highlighted bytes match the displayed value); no character table byte -> length 0, use default table. svn path=/trunk/; revision=53886	2013-12-09 20:46:27 +00:00
Jakub Zawadzki	b61dd3c68d	Encoding table for ISO/IEC 8859-2: make code points in the range 0x80-0x9F map to 0x80-0x9F (Guy Harris). svn path=/trunk/; revision=53865	2013-12-08 19:55:46 +00:00
Guy Harris	562348fbb8	Add ENC_ISO_8859_1. Move the Wikipedia links for the code page layouts in front of the tables whose contents reflect the code page layouts. svn path=/trunk/; revision=53837	2013-12-08 01:05:35 +00:00
Guy Harris	3c2bd00ccf	Note what the two new character encoding tables in charsets.c are. svn path=/trunk/; revision=53833	2013-12-07 22:45:37 +00:00
Jakub Zawadzki	0e5bc8a49c	Add string encoding for ISO/IEC 8859-2 (ENC_ISO_8859_2) svn path=/trunk/; revision=53826	2013-12-07 15:02:55 +00:00
Jakub Zawadzki	113b078a4d	Add new string proto encoding for windows-1250 (ENC_WINDOWS_1250) - Move windows-1250 to unicode encoding table to charset.c - Add tvb_get_string_unichar2, tvb_get_stringz_unichar2 functions which recode tvb-string to UTF-8. svn path=/trunk/; revision=53819	2013-12-07 10:10:03 +00:00
Jeff Morriss	3729335973	We always HAVE_CONFIG_H so don't bother checking whether we have it or not. svn path=/trunk/; revision=45016	2012-09-20 01:48:30 +00:00
Jakub Zawadzki	bf81b42e1e	Update Free Software Foundation address. (COPYING will be updated in next commit) svn path=/trunk/; revision=43536	2012-06-28 22:56:06 +00:00
Guy Harris	2aff1db0d0	Add a bunch of URLs for character encoding information. svn path=/trunk/; revision=37986	2011-07-12 00:14:37 +00:00
Ronnie Sahlberg	89f022b12b	name change svn path=/trunk/; revision=18197	2006-05-21 05:12:17 +00:00
Guy Harris	ac982aa7a5	Move the stuff to handle ASCII <-> EBCDIC conversions to "epan/charsets.c"; other character set translation code should perhaps go there as well. svn path=/trunk/; revision=11958	2004-09-10 22:59:37 +00:00

43 Commits