Regex based import: documentation and release notes

Added documentation on the Regular Expression import mode
Added documentation for the associated ui-fields
Updated the screenshot for the import-from-hexdump dialog
Added a screenshot of the Regular expression mode tab
Updated the documentation for the updated Timestamp format
Added an entry in the release notes about this new/updated feature
Paul Weiß 2021-02-24 01:52:24 +01:00 committed by Wireshark GitLab Utility
parent 8c1b29a597
commit afd1bb0381
4 changed files with 156 additions and 24 deletions

@ -50,6 +50,12 @@ They previously shipped with Npcap 1.10.
* Wireshark now supports dissecting the rtp packet with OPUS payload.
* Importing captures from text files is now also possible based on regular expressions. By specifying a regex that captures a single
packet, including capturing groups for the relevant fields, a text file can be converted to a libpcap capture file. Supported data
encodings are plain hexadecimal, octal, binary and base64.
The timestamp format now also allows the fractions of a second to be placed anywhere in the timestamp, and timestamps are stored with
nanosecond instead of microsecond precision.
// === Removed Features and Support
//=== Removed Dissectors

@ -424,11 +424,17 @@ This is the Qt file open dialog with additional Wireshark extensions.
=== Import Hex Dump
Wireshark can read in an ASCII hex dump and write the data described into a
Wireshark can read in a hex dump and write the data described into a
temporary libpcap capture file. It can read hex dumps with multiple packets in
them, and build a capture file of multiple packets. It is also capable of
generating dummy Ethernet, IP and UDP, TCP, or SCTP headers, in order to build
fully processable packet dumps from hexdumps of application-level data only.
Alternatively a Dummy PDU header can be added to specify a dissector the data
should be passed to initially.
Two methods for converting the input are supported:
==== Standard ASCII Hexdumps
Wireshark understands a hexdump of the form generated by `od -Ax -tx1 -v`. In
other words, each byte is individually displayed and surrounded with a space.
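
As a rough illustration of that layout, the following Python sketch renders raw bytes in a similar style: a hexadecimal offset, then each byte as two hex digits separated by spaces, and a final line holding only the end offset. The function name `od_style_dump` and the six-digit offset width are assumptions made for this example and are not taken from Wireshark.

```python
def od_style_dump(data, width=16):
    """Render bytes roughly like `od -Ax -tx1 -v` (offset width is an assumption)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Hex offset first, then every byte shown individually as two hex digits.
        lines.append(f"{offset:06x} " + " ".join(f"{byte:02x}" for byte in chunk))
    # A final line carrying only the offset marks the end of the data.
    lines.append(f"{len(data):06x}")
    return "\n".join(lines) + "\n"


print(od_style_dump(bytes.fromhex("a130368b000000080060")), end="")
```
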
@ -461,7 +467,7 @@ single text file with a series of hexdumps can be converted into a packet
capture with multiple packets. Packets may be preceded by a timestamp. These are
interpreted according to the format given. If not, the first packet is
timestamped with the time at which the import takes place. Multiple packets are
written with timestamps differing by one microsecond each. In general, short of
written with timestamps differing by one nanosecond each. In general, short of
these restrictions, Wireshark is pretty liberal about reading in hexdumps and
has been tested with a variety of mangled outputs (including being forwarded
through email multiple times, with limited line wrap etc.)
@ -471,15 +477,85 @@ non-whitespace character is `#` will be ignored as a comment. Any line beginning
with `#TEXT2PCAP` is a directive and options can be inserted after this command to
be processed by Wireshark. Currently there are no directives implemented. In the
future these may be used to give more fine grained control on the dump and the
way it should be processed e.g. timestamps, encapsulation type etc. Wireshark
also allows the user to read in dumps of application-level data, by inserting
dummy L2, L3 and L4 headers before each packet. The user can elect to insert
Ethernet headers, Ethernet and IP, or Ethernet, IP and UDP/TCP/SCTP headers
before each packet. This allows Wireshark or any other full-packet decoder to
handle these dumps.
way it should be processed e.g. timestamps, encapsulation type etc.
==== Regular Text Dumps
Wireshark is also capable of scanning the input using a custom Perl-compatible regular
expression as specified by GLib's https://developer.gnome.org/glib/stable/glib-regex-syntax.html[GRegex].
Using a regex that captures a single packet in the given file,
Wireshark will search the given file from the start to the second-to-last character
(the last character has to be `\n` and is ignored)
for non-overlapping (and non-empty) strings matching the given regex and then
identify the fields to import using named capturing subgroups. Using the provided
format information for each field, they are then decoded and translated into a
standard libpcap file, retaining packet order.
Note that each named capturing subgroup has to match _exactly_ once per packet,
but they may be present multiple times in the regex.
For example the following dump:
----
> 0:00:00.265620 a130368b000000080060
> 0:00:00.280836 a1216c8b00000000000089086b0b82020407
< 0:00:00.295459 a2010800000000000000000800000000
> 0:00:00.296982 a1303c8b00000008007088286b0bc1ffcbf0f9ff
> 0:00:00.305644 a121718b0000000000008ba86a0b8008
< 0:00:00.319061 a2010900000000000000001000600000
> 0:00:00.330937 a130428b00000008007589186b0bb9ffd9f0fdfa3eb4295e99f3aaffd2f005
> 0:00:00.356037 a121788b0000000000008a18
----
could be imported using these settings:
----
regex: ^(?<dir>[<>])\s(?<time>\d+:\d\d:\d\d\.\d+)\s(?<data>[0-9a-fA-F]+)$
timestamp: %H:%M:%S.%f
dir: in: < out: >
encoding: HEX
----
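
To try such a pattern outside Wireshark, a comparable search can be sketched in Python. This is only an approximation: Python's `re` module writes named groups as `(?P<name>...)` rather than the `(?<name>...)` form accepted by GRegex, the dot before the fraction is escaped here, and the helper `parse_dump` and its result layout are made up for this example.

```python
import re

DUMP = """\
> 0:00:00.265620 a130368b000000080060
> 0:00:00.280836 a1216c8b00000000000089086b0b82020407
< 0:00:00.295459 a2010800000000000000000800000000
"""

# re.MULTILINE lets ^ and $ match at line boundaries, one packet per line.
PACKET_RE = re.compile(
    r"^(?P<dir>[<>])\s(?P<time>\d+:\d\d:\d\d\.\d+)\s(?P<data>[0-9a-fA-F]+)$",
    re.MULTILINE,
)


def parse_dump(text):
    """Return one dict per non-overlapping match, in file order."""
    packets = []
    for match in PACKET_RE.finditer(text):
        packets.append({
            "direction": "in" if match["dir"] == "<" else "out",
            "time": match["time"],
            "data": bytes.fromhex(match["data"]),
        })
    return packets


packets = parse_dump(DUMP)
for packet in packets:
    print(packet["direction"], packet["time"], len(packet["data"]), "bytes")
```
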
Take care when omitting the anchors `^` and `$`: the input is searched, not
parsed, so even most incorrect regexes will produce valid-looking results when
not anchored (although anchors are not guaranteed to prevent this). It is
generally recommended to sanity check any files created using
this conversion.
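
The effect of searching without anchors can be reproduced with a small Python sketch. It uses Python's `re` module as a stand-in for GRegex, and both patterns are illustrative only: an unanchored data-only pattern also picks up the digit runs of the timestamp as if they were packets.

```python
import re

LINE = "> 0:00:00.265620 a130368b000000080060\n"
DATA_ONLY = r"(?P<data>[0-9a-fA-F]+)"

# Unanchored: every run of hex digits on the line counts as a match.
unanchored = [m["data"] for m in re.finditer(DATA_ONLY, LINE)]

# Anchored: the whole line has to fit the expected packet layout.
anchored = [
    m["data"]
    for m in re.finditer(r"^[<>]\s\S+\s" + DATA_ONLY + r"$", LINE, re.MULTILINE)
]

print(unanchored)
print(anchored)
```
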
Supported fields:
* data: Actual captured frame data
+
The only mandatory field. This should match the encoded binary data captured and
is used as the actual frame data to import.
+
* time: timestamp for the packet
+
The captured field will be parsed according to the given timestamp format into a
timestamp.
+
If no timestamp is present, an arbitrary counter counts up seconds and
nanoseconds by one for each packet.
* dir: the direction the packet was sent over the wire
+
The captured field is expected to be one character in length, any remaining
characters are ignored (e.g. given "Input" only the 'I' is looked at). This
character is compared to lists of characters corresponding to inbound and
outbound and the packet is assigned the corresponding direction.
If neither list yields a match, the direction is set to unknown.
+
If this field is not specified the entire file has no directional information.
+
* seqno: an ID for this packet
+
Each packet can be assigned an arbitrary ID that can be used as a field by Wireshark.
This field is assumed to be a positive integer in base 10. This field can, e.g.,
be used to reorder out-of-order captures after the import.
+
If this field is not given, no IDs will be present in the resulting file.
+
[[ChIOImportDialog]]
==== The “Import From Hex Dump” Dialog Box
This dialog box lets you select a text file, containing a hex dump of packet
@ -487,41 +563,87 @@ data, to be imported and set import parameters.
[[ChIOFileImportDialog]]
.The “Import from Hex Dump” dialog
.The “Import from Hex Dump” dialog in Hex Dump mode
image::wsug_graphics/ws-file-import.png[{medium-screenshot-attrs}]
Specific controls of this import dialog are split in two sections:
Specific controls of this import dialog are split in three sections:
Import from:: Determine which input file has to be imported and how it is to be
interpreted.
File Source:: Determine which input file has to be imported
Input Format:: Determine how the input file has to be interpreted.
Encapsulation:: Determine how the data is to be encapsulated.
The import parameters are as follows:
==== File source
Filename / Browse::
Enter the name of the text file to import. You can use _Browse_ to browse for a
file.
==== Input Format
This section is split into the two alternatives for input conversion, accessible in
the two tabs "Hex Dump" and "Regular Expression".
In addition to the conversion-mode-specific inputs, there are also common
parameters, currently only the timestamp format.
===== The Hex Dump tab
Offsets::
Select the radix of the offsets given in the text file to import. This is
usually hexadecimal, but decimal and octal are also supported. Select _None_
when only the bytes are present. These will be imported as a single packet.
Timestamp Format::
This is the format specifier used to parse the timestamps in the text file to
import. It uses a simple syntax to describe the format of the timestamps, using
%H for hours, %M for minutes, %S for seconds, etc. The straightforward HH:MM:SS
format is covered by %T. For a full definition of the syntax look for
`strptime(3)`. If there are no timestamps in the text file to import leave this
field empty and timestamps will be generated based on the time of import.
Direction indication::
Tick this box if the text file to import has direction indicators before each
frame. These are on a separate line before each frame and start with either
_I_ or _i_ for input and _O_ or _o_ for output.
The encapsulation parameters are as follows:
===== The Regular Expression tab
.The “Regular Expression” tab inside the “Import from Hex Dump” dialog.
image::wsug_graphics/ws-file-import-regex.png[{medium-screenshot-attrs}]
Packet format regular expression::
This is the regex used for searching packets and metadata inside the input file.
Named capturing subgroups are used to find the individual fields. Anchors `^` and
`$` are set to match directly before and after newlines `\n` or `\r\n`. See
https://developer.gnome.org/glib/stable/glib-regex-syntax.html[GRegex] for full
documentation.
Data encoding::
The encoding used for the binary data. Supported encodings are plain hexadecimal,
octal, binary and base64. Plain here means no additional
characters are present in the data field beyond whitespace, which is ignored.
Any unexpected characters abort the import process.
+
Ignored whitespace characters are `\r`, `\n`, `\t`, `\v` and ` `; in addition,
`:` is ignored for hex only and `=` for base64 only.
+
Any incomplete bytes at the field's end are assumed to be padding to fill the
last complete byte. These bits should be zero; however, this is not checked.
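
The decoding rules above can be sketched in Python as a bit accumulator shared by all four encodings. This is an illustration under assumptions, not Wireshark's implementation: the names `ALPHABETS`, `EXTRA_IGNORED` and `decode_field` are invented here, digits are assumed to be read most significant bit first, and trailing bits that do not fill a byte are dropped without being checked.

```python
import string

# Alphabet and number of bits per character for each encoding.
ALPHABETS = {
    "hex": ("0123456789abcdef", 4),
    "oct": ("01234567", 3),
    "bin": ("01", 1),
    "base64": (string.ascii_uppercase + string.ascii_lowercase
               + string.digits + "+/", 6),
}
IGNORED = set("\r\n\t\v ")
EXTRA_IGNORED = {"hex": {":"}, "base64": {"="}}


def decode_field(text, encoding):
    """Decode a captured data field; unexpected characters raise ValueError."""
    alphabet, width = ALPHABETS[encoding]
    skip = IGNORED | EXTRA_IGNORED.get(encoding, set())
    if encoding == "hex":
        text = text.lower()
    pending, pending_bits, out = 0, 0, bytearray()
    for char in text:
        if char in skip:
            continue
        value = alphabet.find(char)
        if value < 0:
            raise ValueError(f"unexpected character {char!r}")
        pending = (pending << width) | value
        pending_bits += width
        if pending_bits >= 8:
            # Emit the oldest eight bits and keep the remainder.
            pending_bits -= 8
            out.append((pending >> pending_bits) & 0xFF)
            pending &= (1 << pending_bits) - 1
    # Leftover bits are treated as padding and silently discarded.
    return bytes(out)


print(decode_field("a1:30 36", "hex"))
print(decode_field("QUI=", "base64"))
```
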
Direction indication::
The lists of characters indicating incoming vs. outgoing packets. These fields
are only available when the regex contains a `(?<dir>...)` group.
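
How a captured `dir` field might be mapped to a direction can be sketched as follows. Only the first character of the field is compared against the two lists. The function name `classify_direction` and the default character lists are hypothetical and chosen purely for illustration.

```python
def classify_direction(field, inbound="iI<", outbound="oO>"):
    """Classify a captured direction field by its first character only."""
    if not field:
        return "unknown"
    first = field[0]  # any remaining characters are ignored
    if first in inbound:
        return "inbound"
    if first in outbound:
        return "outbound"
    return "unknown"


for sample in ("Input", "<", ">", "x"):
    print(repr(sample), classify_direction(sample))
```
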
===== Common items
Timestamp Format::
This is the format specifier used to parse the timestamps in the text file to
import. It uses the same format as `strptime(3)` with the addition of `%f` for
zero-padded fractions of seconds. The precision of `%f` is determined from its
length. The most common fields are `%H`, `%M` and `%S` for hours, minutes and
seconds. The straightforward HH:MM:SS format is covered by `%T`. For a full
definition of the syntax look for `strptime(3)`.
+
In Regex mode this field is only available when a `(?<time>...)` group is present.
+
In Hex Dump mode if there are no timestamps in the text file to import, leave this
field empty and timestamps will be generated based on the time of import.
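
One way to read "the precision of `%f` is determined from its length" is that the captured digits are scaled to nanoseconds according to how many of them there are. The Python sketch below shows that interpretation for a `%H:%M:%S.%f` style timestamp. It is an assumption about the behaviour, not a description of Wireshark's code, and both function names are invented for the example. Python's own `strptime` is not used because its `%f` stops at microseconds.

```python
def fraction_to_nanoseconds(digits):
    """Scale a run of fraction digits to nanoseconds based on its length."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a digit string: {digits!r}")
    # Digits past the ninth are below nanosecond resolution and dropped here.
    return int(digits[:9].ljust(9, "0"))


def parse_hms_fraction(text):
    """Parse 'H:MM:SS.fraction' into a (seconds, nanoseconds) pair."""
    clock, _, fraction = text.partition(".")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds, fraction_to_nanoseconds(fraction)


print(parse_hms_fraction("0:00:00.265620"))
print(parse_hms_fraction("1:02:03.5"))
```
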
==== Encapsulation
Encapsulation type::
Here you can select which type of frames you are importing. This all depends on
@ -536,7 +658,7 @@ IP, UDP, TCP or SCTP headers or SCTP data chunks. When selecting a type of
dummy header the applicable entries are enabled, others are grayed out and
default values are used.
When the _Wireshark Upper PDU export_ encapsulation is selected the option
_ExportPDU_ becomes available. This allows you to enter the name of the
_ExportPDU_ becomes available. This allows you to select the name of the
dissector these frames are to be directed to.
Maximum frame length::
@ -548,6 +670,10 @@ Once all input and import parameters are setup click btn:[Import] to start the
import. If your current data wasn't saved before you will be asked to save it
first.
If the import button doesn't unlock, make sure all encapsulation parameters are
in the expected range and all unlocked fields are populated when using regex mode
(the placeholder text is not used as default).
When completed there will be a new capture file loaded with the frames imported
from the text file.
@ -864,7 +990,7 @@ image::wsug_graphics/ws-export-pdus-to-file.png[{screenshot-attrs}]
. In the field below the `Display Filter` field you can choose the level, from which you want to export the PDUs to the file. There are seven levels:
+
.. `DLT User`. You can export a protocol, which is framed in the user data link type table without the need to reconfigure the DLT user table. For more information, see the link:https://gitlab.com/wireshark/wireshark/-/wikis/HowToDissectAnything[How to Dissect Anything] page.
.. `DLT User`. You can export a protocol, which is framed in the user data link type table without the need to reconfigure the DLT user table. For more information, see the link:https://gitlab.com/wireshark/wireshark/-/wikis/HowToDissectAnything[How to Dissect Anything] page.
+
.. `DVB-CI`. You can use it for the Digital Video Broadcasting (DVB) protocol.
+