Add ENC_BOM to the list of bitflag modifiers, and use it with
UTF-16, UCS-2, and UCS-4 (UTF-32). If set, the first 2 (or 4)
octets, if present, are checked to see if they are a BYTE ORDER
MARK (U+FEFF, "ZERO WIDTH NO-BREAK SPACE") in either byte order. If so,
those octets are skipped and the encoding is set to Little-Endian
or Big-Endian depending on the endianness of the BOM.
If the BOM is absent, the passed-in endianness flag is used as usual.
Related to #17991
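The BOM check described above can be sketched roughly as follows. This is a minimal illustration, not the actual Wireshark code: the SKETCH_* flag values and the sketch_check_bom() helper are hypothetical names invented for this example, and the real ENC_* constants and string-extraction routines may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values for this sketch only; NOT Wireshark's ENC_* values. */
#define SKETCH_BIG_ENDIAN    0x0u
#define SKETCH_LITTLE_ENDIAN 0x1u
#define SKETCH_BOM           0x2u

/*
 * Hypothetical helper: look for a BOM at the start of a buffer whose code
 * units are unit_size octets wide (2 for UTF-16/UCS-2, 4 for UCS-4).
 * Returns the number of octets to skip (0, 2, or 4) and stores the
 * endianness to use in *endian_out.
 */
static size_t
sketch_check_bom(const uint8_t *buf, size_t len, size_t unit_size,
                 unsigned flags, unsigned *endian_out)
{
    /* Default: the endianness the caller passed in. */
    *endian_out = flags & SKETCH_LITTLE_ENDIAN;

    /* No BOM handling requested, or the leading octets are not present. */
    if (!(flags & SKETCH_BOM) || len < unit_size)
        return 0;

    if (unit_size == 2) {
        if (buf[0] == 0xFE && buf[1] == 0xFF) {
            *endian_out = SKETCH_BIG_ENDIAN;
            return 2;
        }
        if (buf[0] == 0xFF && buf[1] == 0xFE) {
            *endian_out = SKETCH_LITTLE_ENDIAN;
            return 2;
        }
    } else if (unit_size == 4) {
        if (buf[0] == 0x00 && buf[1] == 0x00 &&
            buf[2] == 0xFE && buf[3] == 0xFF) {
            *endian_out = SKETCH_BIG_ENDIAN;
            return 4;
        }
        if (buf[0] == 0xFF && buf[1] == 0xFE &&
            buf[2] == 0x00 && buf[3] == 0x00) {
            *endian_out = SKETCH_LITTLE_ENDIAN;
            return 4;
        }
    }

    /* BOM absent: keep the caller's endianness and skip nothing. */
    return 0;
}
```

A caller would advance its offset by the returned count and decode the rest of the string with the endianness stored in *endian_out.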
EBCDIC Code Page 500 has exactly the same repertoire as CP 037,
covering all of ISO-8859-1, but has 7 bytes permuted. It is
the default code page for DRDA; use it there.
Redefined the format of the basic macro used to create the IANA charset
enumeration type and the enum_val_t or value_string arrays. The macro also
maintains the mapping between IANA charsets and Wireshark string encodings.
Introduced a way to create subsets of the value_string or enum_val_t arrays from
the big IANA charsets table. ws_supported_mibenum_vals_character_sets_ev_array
is one such enum_val_t subset; it is used so that the xml dissector's 'Default
character encoding' enum preference displays well in the right-click popup menu.
The mibenum_charset_to_encoding() function is changed so that its mapping code
(switch/case) is generated automatically from the basic macro.
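The single-macro approach described above is commonly implemented with the "X-macro" pattern; the following is a minimal sketch of that pattern, assuming it resembles what the change does. Every name here (SKETCH_CHARSET_LIST, sketch_value_string, the SKETCH_ENC_* constants, sketch_mibenum_to_encoding()) is hypothetical and stands in for the real Wireshark definitions. The three MIBenum values are the ones in the IANA registry.

```c
#include <stddef.h>

/* Hypothetical stand-in for Wireshark's value_string type. */
typedef struct {
    unsigned    value;
    const char *strptr;
} sketch_value_string;

/* Hypothetical stand-ins for Wireshark string encoding constants. */
enum {
    SKETCH_ENC_NA = 0,
    SKETCH_ENC_ASCII,
    SKETCH_ENC_ISO_8859_1,
    SKETCH_ENC_UTF_8
};

/* One master table: X(enum name, MIBenum, IANA name, encoding). */
#define SKETCH_CHARSET_LIST(X) \
    X(SKETCH_CS_US_ASCII,   3,   "US-ASCII",   SKETCH_ENC_ASCII)      \
    X(SKETCH_CS_ISO_8859_1, 4,   "ISO-8859-1", SKETCH_ENC_ISO_8859_1) \
    X(SKETCH_CS_UTF_8,      106, "UTF-8",      SKETCH_ENC_UTF_8)

/* Expansion 1: the enumeration type. */
#define SKETCH_AS_ENUM(name, mib, str, enc) name = mib,
enum sketch_charset { SKETCH_CHARSET_LIST(SKETCH_AS_ENUM) };

/* Expansion 2: a NULL-terminated value_string-style array. */
#define SKETCH_AS_VS(name, mib, str, enc) { mib, str },
static const sketch_value_string sketch_charset_vals[] = {
    SKETCH_CHARSET_LIST(SKETCH_AS_VS)
    { 0, NULL }
};

/* Expansion 3: the MIBenum -> encoding switch/case. */
#define SKETCH_AS_CASE(name, mib, str, enc) case name: return enc;
static unsigned
sketch_mibenum_to_encoding(unsigned mibenum)
{
    switch (mibenum) {
    SKETCH_CHARSET_LIST(SKETCH_AS_CASE)
    default:
        return SKETCH_ENC_NA; /* unknown charset: no specific encoding */
    }
}
```

Because all three expansions read the same list, adding a charset means adding one line to the list. A subset array can be built the same way from a second, shorter list macro.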
MIBenum values are from an IANA registry, not a WAP specification; add
<epan/iana_charsets.h> to declare the MIBenum -> Wireshark encoding
mapper routine and the value_string_ext for MIBenum values, and
epan/iana_charsets.c to define them.
Change-Id: I6d9c82cd011bd5211c688322e6423de38e161f41
Reviewed-on: https://code.wireshark.org/review/15298
Reviewed-by: Guy Harris <guy@alum.mit.edu>