doc: Update text2pcap and Import from Hexdump doc

Update the text2pcap man page and the Import from Hexdump WSUG page to clarify usage, fix grammar, and remove a few things that are no longer relevant (e.g., it's no longer the case that files without an EOL don't work). Fixes #15563, #15564.
2022-02-22 07:32:01 -05:00 · 2022-02-22 07:32:01 -05:00 · 1d84a092cf
parent 0e427ac837
commit 1d84a092cf
2 changed files with 88 additions and 73 deletions
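
The offset rules that the updated text describes can be illustrated with a minimal Python sketch. This is not text2pcap's implementation: the function name `parse_hexdump` and both regular expressions are assumptions made for illustration. It handles only hex offsets, ignores direction indicators and timestamps, does not skip text before the offset, and simply discards a packet whose offset is unexpected.

```python
import re

# Assumed patterns: an offset is more than two hex digits followed by a
# colon, space, or tab; a byte is exactly two hex digits.
OFFSET_RE = re.compile(r"^\s*([0-9A-Fa-f]{3,})[: \t]")
BYTE_RE = re.compile(r"^[0-9A-Fa-f]{2}$")


def parse_hexdump(text):
    """Return a list of packets (bytes objects) found in `text`."""
    packets = []
    current = None  # bytearray of the packet being built, or None

    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # comment line
        match = OFFSET_RE.match(line)
        if match is None:
            continue  # preamble or other text: ignored in this sketch
        offset = int(match.group(1), 16)
        if offset == 0:
            # Offset zero starts a new packet; flush the previous one.
            if current:
                packets.append(bytes(current))
            current = bytearray()
        elif current is None or offset != len(current):
            # Unexpected offset: drop this packet, wait for the next zero.
            current = None
            continue
        for token in line[match.end():].split():
            if not BYTE_RE.match(token):
                break  # trailing text such as an ASCII dump
            current.append(int(token, 16))

    if current:
        packets.append(bytes(current))
    return packets
```

Calling `parse_hexdump` on the sample dump from the man page should yield a single packet. Because the sketch stops at the first token that is not a two-digit hex value, an ASCII dump that happens to contain such tokens would be misread, which is the situation the *-a* option addresses in the real tool.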
--- a/doc/text2pcap.adoc
+++ b/doc/text2pcap.adoc
@@ -44,7 +44,7 @@ text2pcap - Generate a capture file from an ASCII hexdump of packets
 *Text2pcap* is a program that reads in an ASCII hex dump and writes the
 data described into a capture file.  *text2pcap* can read hexdumps with
 multiple packets in them, and build a capture file of multiple packets.
-*Text2pcap* is also capable of generating dummy Ethernet, IP and UDP, TCP,
+*Text2pcap* is also capable of generating dummy Ethernet, IP, and UDP, TCP,
 or SCTP headers, in order to build fully processable packet dumps from
 hexdumps of application-level data only.

@@ -56,56 +56,58 @@ file format.

 *Text2pcap* understands a hexdump of the form generated by __od -Ax
 -tx1 -v__.  In other words, each byte is individually displayed, with
-spaces separating the bytes from each other.  Each line begins with an offset
-describing the position in the packet, each new packet starts with an offset
-of 0 and there is a space separating the offset from the following bytes.
-The offset is a hex number (can also be octal or decimal - see *-o*),
-of more than two hex digits.
+spaces separating the bytes from each other.  Hex digits can be uppercase
+or lowercase.

-Here is a sample dump that *text2pcap* can recognize:
+In normal operation, each line must begin with an offset describing the
+position in the packet, followed by a colon, space, or tab separating it from
+the bytes.  There is no limit on the width or number of bytes per line, but
+lines with only hex bytes without a leading offset are ignored (in other words,
+line breaks should not be inserted in long lines that wrap).  Offsets are more
+than two digits; they are in hex by default, but can also be in octal or
+decimal - see *-o*.  Each packet must begin with offset zero, and an offset of
+zero indicates the beginning of a new packet.  Offset values must be correct;
+an unexpected value causes the current packet to be aborted and the start of
+the next packet to be awaited.  There is also a single-packet mode with no
+offsets; see *-o*.

-    000000 00 0e b6 00 00 02 00 0e b6 00 00 01 08 00 45 00
-    000010 00 28 00 00 00 00 ff 01 37 d1 c0 00 02 01 c0 00
-    000020 02 02 08 00 a6 2f 00 01 00 01 48 65 6c 6c 6f 20
-    000030 57 6f 72 6c 64 21
-    000036
+Packets may be preceded by a direction indicator ('I' or 'O') and/or a
+timestamp if indicated by the command line (see *-D* and *-t*).  If both are
+present, the direction indicator precedes the timestamp.  The format of the
+timestamps is specified as a mandatory parameter to *-t*.  If no timestamp is
+parsed, in the case of the first packet the current system time is used, while
+subsequent packets are written with timestamps one microsecond later than that
+of the previous packet.

-Note the last byte must either be followed by the expected next offset value
-as in the example above or a space or a line-end character(s).
+Other text in the input data is ignored. Any text before the offset is
+ignored, including email forwarding characters '>'. Any text on a line
+after the bytes is ignored, e.g. an ASCII character dump (but see *-a* to
+ensure that hex digits in the character dump are ignored).  Any line where
+the first non-whitespace character is a '#' will be ignored as a comment.
+Any lines of text between the bytestring lines are considered preamble;
+the beginning of the preamble is scanned for the direction indicator and
+timestamp as mentioned above and otherwise ignored.

-There is no limit on the width or number of bytes per line. Also the
-text dump at the end of the line is ignored. Bytes/hex numbers can be
-uppercase or lowercase. Any text before the offset is ignored,
-including email forwarding characters '>'. Any lines of text between
-the bytestring lines is ignored. The offsets are used to track the
-bytes, so offsets must be correct. Any line which has only bytes
-without a leading offset is ignored. An offset is recognized as being
-a hex number longer than two characters. Any text after the bytes is
-ignored (e.g. the character dump). Any hex numbers in this text are
-also ignored. An offset of zero is indicative of starting a new
-packet, so a single text file with a series of hexdumps can be
-converted into a packet capture with multiple packets.
-
-Packets may be preceded by a direction indicator and a timestamp if
-indicated by the command line (see *-D* and *-t*). The format of the
-timestamps is specified as a mandatory parameter to *-t*. If timestamp
-parsing is not enabled or failed, the first packet is timestamped
-with the current time the conversion takes place. Multiple packets
-are written with timestamps differing by one microsecond each.
+Any line beginning with #TEXT2PCAP is a directive and options
+can be inserted after this command to be processed by *text2pcap*.
+Currently there are no directives implemented; in the future, these may
+be used to give finer-grained control over the dump and the way it
+should be processed, e.g. timestamps, encapsulation type, etc.

 In general, short of these restrictions, *text2pcap* is pretty liberal
 about reading in hexdumps and has been tested with a variety of
 mangled outputs (including being forwarded through email multiple
 times, with limited line wrap etc.)

-There are a couple of other special features to note. Any line where
-the first non-whitespace character is '#' will be ignored as a
-comment. Any line beginning with #TEXT2PCAP is a directive and options
-can be inserted after this command to be processed by
-*text2pcap*. Currently there are no directives implemented; in the
-future, these may be used to give more fine grained control on the
-dump and the way it should be processed e.g. timestamps, encapsulation
-type etc.
+Here is a sample dump that *text2pcap* can recognize, with an optional
+direction indicator and timestamp:
+
+    I 2019-05-14T19:04:57Z
+    000000 00 0e b6 00 00 02 00 0e b6 00 00 01 08 00 45 00
+    000010 00 28 00 00 00 00 ff 01 37 d1 c0 00 02 01 c0 00
+    000020 02 02 08 00 a6 2f 00 01 00 01 48 65 6c 6c 6f 20
+    000030 57 6f 72 6c 64 21
+    000036

 *Text2pcap* is also capable of scanning a text input file using a custom Perl
 compatible regular expression that matches a single packet. *text2pcap*
--- a/docbook/wsug_src/WSUG_chapter_io.adoc
+++ b/docbook/wsug_src/WSUG_chapter_io.adoc
@@ -436,15 +436,53 @@ Two methods for converting the input are supported:

 ==== Standard ASCII Hexdumps

-Wireshark understands a hexdump of the form generated by `od -Ax -tx1 -v`. In
-other words, each byte is individually displayed and surrounded with a space.
-Each line begins with an offset describing the position in the packet, each
-new packet starts with an offset of 0 and there is a space separating the
-offset from the following bytes. The offset is a hex number (can also be octal
-or decimal), of more than two hex digits.
-Here is a sample dump that can be imported:
+Wireshark understands a hexdump of the form generated by `od -Ax -tx1 -v`.
+In other words, each byte is individually displayed, with spaces separating
+the bytes from each other.  Hex digits can be uppercase or lowercase.
+
+In normal operation, each line must begin with an offset describing the
+position in the packet, followed by a colon, space, or tab separating it from
+the bytes.  There is no limit on the width or number of bytes per line, but
+lines with only hex bytes without a leading offset are ignored (i.e.,
+line breaks should not be inserted in long lines that wrap).  Offsets are more
+than two digits; they are in hex by default, but can also be in octal or
+decimal.  Each packet must begin with offset zero, and an offset of
+zero indicates the beginning of a new packet.  Offset values must be correct;
+an unexpected value causes the current packet to be aborted and the start of
+the next packet to be awaited.  There is also a single-packet mode with no offsets.
+
+Packets may be preceded by a direction indicator ('I' or 'O') and/or a
+timestamp if indicated.  If both are present, the direction indicator precedes
+the timestamp.  The format of the timestamps must be specified.  If no timestamp
+is parsed, in the case of the first packet the current system time is used,
+while subsequent packets are written with timestamps one microsecond later than
+that of the previous packet.
+
+Other text in the input data is ignored. Any text before the offset is
+ignored, including email forwarding characters '>'. Any text on a line
+after the bytes is ignored, e.g. an ASCII character dump (but see *-a* to
+ensure that hex digits in the character dump are ignored).  Any line where
+the first non-whitespace character is a '#' will be ignored as a comment.
+Any lines of text between the bytestring lines are considered preamble;
+the beginning of the preamble is scanned for the direction indicator and
+timestamp as mentioned above and otherwise ignored.
+
+Any line beginning with #TEXT2PCAP is a directive and options
+can be inserted after this command to be processed by Wireshark.
+Currently there are no directives implemented; in the future, these may
+be used to give finer-grained control over the dump and the way it
+should be processed, e.g. timestamps, encapsulation type, etc.
+
+In general, short of these restrictions, Wireshark is pretty liberal
+about reading in hexdumps and has been tested with a variety of
+mangled outputs (including being forwarded through email multiple
+times, with limited line wrap etc.)
+
+Here is a sample dump that can be imported, including an optional
+direction indicator and timestamp:

 ----
+I 2019-05-14T19:04:57Z
 000000 00 e0 1e a7 05 6f 00 10 ........
 000008 5a a0 b9 12 08 00 46 00 ........
 000010 03 68 00 00 00 00 0a 2e ........
@@ -454,31 +492,6 @@ Here is a sample dump that can be imported:
 000030 01 01 0f 19 03 80 11 01 ........
 ----

-There is no limit on the width or number of bytes per line. Also the text dump
-at the end of the line is ignored. Byte and hex numbers can be uppercase or
-lowercase. Any text before the offset is ignored, including email forwarding
-characters _>_. Any lines of text between the bytestring lines are ignored.
-The offsets are used to track the bytes, so offsets must be correct. Any line
-which has only bytes without a leading offset is ignored. An offset is
-recognized as being a hex number longer than two characters. Any text after the
-bytes is ignored (e.g. the character dump). Any hex numbers in this text are
-also ignored. An offset of zero is indicative of starting a new packet, so a
-single text file with a series of hexdumps can be converted into a packet
-capture with multiple packets. Packets may be preceded by a timestamp. These are
-interpreted according to the format given. If not the first packet is
-timestamped with the current time the import takes place. Multiple packets are
-written with timestamps differing by one nanosecond each. In general, short of
-these restrictions, Wireshark is pretty liberal about reading in hexdumps and
-has been tested with a variety of mangled outputs (including being forwarded
-through email multiple times, with limited line wrap etc.)
-
-There are a couple of other special features to note. Any line where the first
-non-whitespace character is `#` will be ignored as a comment. Any line beginning
-with `#TEXT2PCAP` is a directive and options can be inserted after this command to
-be processed by Wireshark. Currently there are no directives implemented. In the
-future these may be used to give more fine grained control on the dump and the
-way it should be processed e.g. timestamps, encapsulation type etc.
-
 ==== Regular Text Dumps

 Wireshark is also capable of scanning the input using a custom perl regular