forked from osmocom/wireshark
text2pcap: add regex
Add support in text2pcap for the regex mode added to "Import from Hex Dump" in 3.6.0. The input and output indicators cannot (yet?) be configured, and are set to the default of allowing any of "iI<" for inbound and "oO>" for outbound. This reaches feature parity between text2pcap and Import from Hex Dump, fixes #16724. (There might be some more cleanups to do, including docs.)
parent
6cdb86fbc7
commit
ab347ea14e
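Reviewer note: the commit message fixes the direction indicators to "iI<" for inbound and "oO>" for outbound, matched against the first character of the 'dir' capture. The following is a minimal sketch of that comparison, not the code used by text_import; the names `pkt_dir`, `IN_INDICATION`, `OUT_INDICATION`, and `classify_direction` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; none are taken from text2pcap. */
typedef enum { DIR_UNKNOWN, DIR_INBOUND, DIR_OUTBOUND } pkt_dir;

/* Default indicator sets named in the commit message. */
static const char IN_INDICATION[]  = "iI<";
static const char OUT_INDICATION[] = "oO>";

/* Classify a packet from the first character of its captured 'dir' text. */
static pkt_dir classify_direction(const char *dir)
{
    assert(dir != NULL);

    /* strchr() also matches the terminating NUL, so reject empty input first. */
    if (dir[0] == '\0')
        return DIR_UNKNOWN;
    if (strchr(IN_INDICATION, dir[0]) != NULL)
        return DIR_INBOUND;
    if (strchr(OUT_INDICATION, dir[0]) != NULL)
        return DIR_OUTBOUND;
    return DIR_UNKNOWN;
}
```

Only the first character is inspected, so a capture such as "in" or "Out" would classify the same way as "<" or ">". Returning an "unknown" value for anything else is an assumption of this sketch.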
|
@ -14,6 +14,7 @@ text2pcap - Generate a capture file from an ASCII hexdump of packets
|
|||
[manarg]
|
||||
*text2pcap*
|
||||
[ *-a* ]
|
||||
[ *-b* 2|8|16|64 ]
|
||||
[ *-D* ]
|
||||
[ *-e* <l3pid> ]
|
||||
[ *-h* ]
|
||||
|
@ -24,6 +25,7 @@ text2pcap - Generate a capture file from an ASCII hexdump of packets
|
|||
[ *-m* <max-packet> ]
|
||||
[ *-o* hex|oct|dec|none ]
|
||||
[ *-q* ]
|
||||
[ *-r* <regex> ]
|
||||
[ *-s* <srcport>,<destport>,<tag> ]
|
||||
[ *-S* <srcport>,<destport>,<ppi> ]
|
||||
[ *-t* <timefmt> ]
|
||||
|
@ -97,12 +99,57 @@ future, these may be used to give more fine grained control on the
|
|||
dump and the way it should be processed e.g. timestamps, encapsulation
|
||||
type etc.
|
||||
|
||||
*Text2pcap* is also capable of scanning a text input file using a custom Perl
|
||||
compatible regular expression that matches a single packet. *text2pcap*
|
||||
searches the given file (which must end with '\n') for non-overlapping non-empty
|
||||
strings matching the regex. Named capturing subgroups, which must match
|
||||
exactly once per packet, are used to identify fields to import. The following
|
||||
fields are supported in regex mode, one mandatory and three optional:
|
||||
|
||||
"data" Actual captured frame data to import
|
||||
"time" Timestamp of packet
|
||||
"dir" Direction of packet
|
||||
"seqno" Arbitrary ID of packet
|
||||
|
||||
The 'data' field is the captured data, which must be in a selected encoding:
|
||||
hexadecimal (the default), octal, binary, or base64 and containing no
|
||||
characters in the data field outside the encoding set besides whitespace.
|
||||
The 'time' field is parsed according to the format in the *-t* parameter.
|
||||
The first character of the 'dir' field is compared against a set of characters
|
||||
corresponding to inbound and outbound that default to "iI<" for inbound and
|
||||
"oO>" for outbound to assign a direction. The 'seqno' field is assumed to
|
||||
be a positive integer base 10 used for an arbitrary ID. An optional field's
|
||||
information will only be written if the field is present in the regex and if
|
||||
the capture file format supports it. (E.g., the pcapng format supports all
|
||||
three fields, but the pcap format only supports timestamps.)
|
||||
|
||||
Here is a sample dump that the regex mode can process with the regex
|
||||
'^(?<dir>[<>])\s(?<time>\d+:\d\d:\d\d.\d+)\s(?<data>[0-9a-fA-F]+)$' along
|
||||
with timestamp format '%H:%M:%S.%f', directional indications of '<' and '>',
|
||||
and hex encoding:
|
||||
|
||||
> 0:00:00.265620 a130368b000000080060
|
||||
> 0:00:00.280836 a1216c8b00000000000089086b0b82020407
|
||||
< 0:00:00.295459 a2010800000000000000000800000000
|
||||
> 0:00:00.296982 a1303c8b00000008007088286b0bc1ffcbf0f9ff
|
||||
> 0:00:00.305644 a121718b0000000000008ba86a0b8008
|
||||
< 0:00:00.319061 a2010900000000000000001000600000
|
||||
> 0:00:00.330937 a130428b00000008007589186b0bb9ffd9f0fdfa3eb4295e99f3aaffd2f005
|
||||
> 0:00:00.356037 a121788b0000000000008a18
|
||||
|
||||
The regex is compiled with multiline support, and it is recommended to use
|
||||
the anchors '^' and '$' for best results.
|
||||
|
||||
*Text2pcap* also allows the user to read in dumps of
|
||||
application-level data, by inserting dummy L2, L3 and L4 headers
|
||||
before each packet. The user can elect to insert Ethernet headers,
|
||||
Ethernet and IP, or Ethernet, IP and UDP/TCP/SCTP headers before each
|
||||
packet. This allows Wireshark or any other full-packet decoder to
|
||||
handle these dumps.
|
||||
handle these dumps. These encapsulation options can be used in both
|
||||
hexdump mode and regex mode.
|
||||
|
||||
When <__infile__> or <__outfile__> are '-', standard input or standard
|
||||
output, respectively, are used.
|
||||
|
||||
== OPTIONS
|
||||
|
||||
|
@ -111,16 +158,28 @@ handle these dumps.
|
|||
--
|
||||
Enables ASCII text dump identification. It allows one to identify the start of
|
||||
the ASCII text dump and not include it in the packet even if it looks like HEX.
|
||||
This parameter has no effect in regex mode.
|
||||
|
||||
*NOTE:* Do not enable it if the input file does not contain the ASCII text dump.
|
||||
--
|
||||
|
||||
-b 2|8|16|64::
|
||||
+
|
||||
--
|
||||
Specify the base (radix) of the encoding of the packet data in regex mode.
|
||||
The supported options are 2 (binary), 8 (octal), 16 (hexadecimal), and 64
|
||||
(base64 encoding), with hex as the default. This parameter has no effect
|
||||
in hexdump mode.
|
||||
--
|
||||
|
||||
-D::
|
||||
+
|
||||
--
|
||||
The text before the packet may start either with an I or O indicating that
|
||||
the packet is inbound or outbound. This is used when generating dummy headers.
|
||||
The indication is only stored if the output format is pcapng.
|
||||
The indication is only stored if the output format supports it (e.g. pcapng.)
|
||||
This parameter has no effect in regex mode, where the presence of the `<dir>`
|
||||
capturing group determines whether direction indicators are expected.
|
||||
--
|
||||
|
||||
-e <l3pid>::
|
||||
|
@ -191,14 +250,15 @@ Write the file in pcapng format rather than pcap format.
|
|||
+
|
||||
--
|
||||
Specify a name for the interface included when writing a pcapng format
|
||||
file. By default no name is defined.
|
||||
file.
|
||||
--
|
||||
|
||||
-o hex|oct|dec|none::
|
||||
+
|
||||
--
|
||||
Specify the radix for the offsets (hex, octal, decimal, or none). Defaults to
|
||||
hex. This corresponds to the `-A` option for __od__.
|
||||
hex. This corresponds to the `-A` option for __od__. This parameter has no
|
||||
effect in regex mode.
|
||||
|
||||
*NOTE:* With __-o none__, only one packet will be created, ignoring any
|
||||
direction indicators or timestamps after the first byte along with any offsets.
|
||||
|
@ -223,6 +283,15 @@ Don't display the summary of the options selected at the beginning,
|
|||
or the count of packets processed at the end.
|
||||
--
|
||||
|
||||
-r <regex>::
|
||||
+
|
||||
--
|
||||
Process the file in regex mode using __regex__ as described above.
|
||||
|
||||
*NOTE:* The regex mode uses memory-mapped I/O and does not work on
|
||||
streams that do not support seeking, like terminals and pipes.
|
||||
--
|
||||
|
||||
-s <srcport>,<destport>,<tag>::
|
||||
+
|
||||
--
|
||||
|
@ -252,10 +321,11 @@ into the SCTP header.
|
|||
--
|
||||
Treats the text before the packet as a date/time code; __timefmt__ is a
|
||||
format string supported by strftime(3), supplemented with the field
|
||||
descriptor "%f" for fractional seconds up to nanoseconds.
|
||||
descriptor '%f' for fractional seconds up to nanoseconds.
|
||||
Example: The time "10:15:14.5476" has the format code "%H:%M:%S.%f"
|
||||
The special format string __ISO__ indicates that the string should be
|
||||
parsed according to the ISO-8601 specification.
|
||||
parsed according to the ISO-8601 specification. This parameter is used
|
||||
in regex mode if and only if the `<time>` capturing group is present.
|
||||
|
||||
*NOTE:* Date/time fields from the current date/time are
|
||||
used as the default for unspecified fields.
|
||||
|
|
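Reviewer note: the *-t* documentation above describes '%f' as fractional seconds "up to nanoseconds" and gives "10:15:14.5476" as an example. One way to read that, sketched below, is to treat the digits after the decimal point as a fraction and scale them to nanoseconds. The helper name `frac_to_nsec` is hypothetical, and truncating digits past the ninth is an assumption; the diff does not show how text_import handles excess digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert the digits following a decimal point into
 * nanoseconds. Digits beyond the ninth are consumed but ignored (assumed
 * truncation). Returns the number of characters consumed from s.
 */
static size_t frac_to_nsec(const char *s, uint32_t *nsec)
{
    size_t i = 0;
    uint32_t value = 0;
    int kept = 0;

    assert(s != NULL && nsec != NULL);

    while (isdigit((unsigned char)s[i])) {
        if (kept < 9) {
            value = value * 10 + (uint32_t)(s[i] - '0');
            kept++;
        }
        i++;
    }

    /* Scale short fractions: "5476" means 0.5476 s, i.e. 547600000 ns. */
    for (; kept < 9; kept++)
        value *= 10;

    *nsec = value;
    return i;
}
```

With this reading, the fraction "265620" from the sample dump corresponds to 265620000 ns.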
138
text2pcap.c
|
@ -90,7 +90,6 @@
|
|||
#include <wsutil/ws_getopt.h>
|
||||
|
||||
#include <errno.h>
|
||||
#include <assert.h>
|
||||
|
||||
#include "text2pcap.h"
|
||||
|
||||
|
@ -150,15 +149,12 @@ static guint32 hdr_data_chunk_ppid = 0;
|
|||
/* Export PDU */
|
||||
static gboolean hdr_export_pdu = FALSE;
|
||||
|
||||
static gboolean has_direction = FALSE;
|
||||
|
||||
/*--- Local data -----------------------------------------------------------------*/
|
||||
|
||||
/* This is where we store the packet currently being built */
|
||||
static guint32 max_offset = WTAP_MAX_PACKET_SIZE_STANDARD;
|
||||
|
||||
/* Time code of packet, derived from packet_preamble */
|
||||
static char *ts_fmt = NULL;
|
||||
static int ts_fmt_iso = 0;
|
||||
|
||||
/* Input file */
|
||||
|
@ -198,14 +194,25 @@ print_usage (FILE *output)
|
|||
" used as the default for unspecified fields.\n"
|
||||
" -D the text before the packet starts with an I or an O,\n"
|
||||
" indicating that the packet is inbound or outbound.\n"
|
||||
" This is used when generating dummy headers.\n"
|
||||
" The indication is only stored if the output format is pcapng.\n"
|
||||
" This is used when generating dummy headers if the\n"
|
||||
" output format supports it (e.g. pcapng).\n"
|
||||
" -a enable ASCII text dump identification.\n"
|
||||
" The start of the ASCII text dump can be identified\n"
|
||||
" and excluded from the packet data, even if it looks\n"
|
||||
" like a HEX dump.\n"
|
||||
" NOTE: Do not enable it if the input file does not\n"
|
||||
" contain the ASCII text dump.\n"
|
||||
" -r <regex> enable regex mode. Scan the input using <regex>, a Perl\n"
|
||||
" compatible regular expression matching a single packet.\n"
|
||||
" Named capturing subgroups are used to identify fields:\n"
|
||||
" <data> (mand.), and <time>, <dir>, and <seqno> (opt.)\n"
|
||||
" The time field format is taken from the -t option\n"
|
||||
" Example: -r '^(?<dir>[<>])\\s(?<time>\\d+:\\d\\d:\\d\\d.\\d+)\\s(?<data>[0-9a-fA-F]+)$'\n"
|
||||
" could match a file with lines like\n"
|
||||
" > 0:00:00.265620 a130368b000000080060\n"
|
||||
" < 0:00:00.295459 a2010800000000000000000800000000\n"
|
||||
" -b 2|8|16|64 encoding base (radix) of the packet data in regex mode\n"
|
||||
" (def: 16: hexadecimal) No effect in hexdump mode.\n"
|
||||
"\n"
|
||||
"Output:\n"
|
||||
" -l <typenum> link-layer type number; default is 1 (Ethernet). See\n"
|
||||
|
@ -307,27 +314,53 @@ parse_options(int argc, char *argv[], text_import_info_t * const info, wtap_dump
|
|||
int file_type_subtype;
|
||||
int err;
|
||||
char* err_info;
|
||||
GError* gerror = NULL;
|
||||
GRegex* regex = NULL;
|
||||
|
||||
info->mode = TEXT_IMPORT_HEXDUMP;
|
||||
info->hexdump.offset_type = OFFSET_HEX;
|
||||
info->regex.encoding = ENCODING_PLAIN_HEX;
|
||||
info->payload = "data";
|
||||
|
||||
/* Initialize the version information. */
|
||||
ws_init_version_info("Text2pcap (Wireshark)", NULL, NULL, NULL);
|
||||
|
||||
/* Scan CLI parameters */
|
||||
while ((c = ws_getopt_long(argc, argv, "aDhqe:i:l:m:nN:o:u:P:s:S:t:T:v4:6:", long_options, NULL)) != -1) {
|
||||
while ((c = ws_getopt_long(argc, argv, "hqab:De:i:l:m:nN:o:u:P:r:s:S:t:T:v4:6:", long_options, NULL)) != -1) {
|
||||
switch (c) {
|
||||
case 'h':
|
||||
show_help_header("Generate a capture file from an ASCII hexdump of packets.");
|
||||
print_usage(stdout);
|
||||
exit(0);
|
||||
break;
|
||||
case 'D': has_direction = TRUE; break;
|
||||
case 'q': quiet = TRUE; break;
|
||||
case 'a': info->hexdump.identify_ascii = TRUE; break;
|
||||
case 'D': info->hexdump.has_direction = TRUE; break;
|
||||
case 'l': pcap_link_type = (guint32)strtol(ws_optarg, NULL, 0); break;
|
||||
case 'm': max_offset = (guint32)strtol(ws_optarg, NULL, 0); break;
|
||||
case 'n': use_pcapng = TRUE; break;
|
||||
case 'N': interface_name = ws_optarg; break;
|
||||
case 'b':
|
||||
{
|
||||
guint8 radix;
|
||||
if (!ws_strtou8(ws_optarg, NULL, &radix)) {
|
||||
cmdarg_err("Bad argument for '-b': %s", ws_optarg);
|
||||
print_usage(stderr);
|
||||
return INVALID_OPTION;
|
||||
}
|
||||
switch (radix) {
|
||||
case 2: info->regex.encoding = ENCODING_PLAIN_BIN; break;
|
||||
case 8: info->regex.encoding = ENCODING_PLAIN_OCT; break;
|
||||
case 16: info->regex.encoding = ENCODING_PLAIN_HEX; break;
|
||||
case 64: info->regex.encoding = ENCODING_BASE64; break;
|
||||
default:
|
||||
cmdarg_err("Bad argument for '-b': %s", ws_optarg);
|
||||
print_usage(stderr);
|
||||
return INVALID_OPTION;
|
||||
}
|
||||
break;
|
||||
}
|
||||
|
||||
case 'o':
|
||||
if (ws_optarg[0] != 'h' && ws_optarg[0] != 'o' && ws_optarg[0] != 'd' && ws_optarg[0] != 'n') {
|
||||
cmdarg_err("Bad argument for '-o': %s", ws_optarg);
|
||||
|
@ -341,6 +374,7 @@ parse_options(int argc, char *argv[], text_import_info_t * const info, wtap_dump
|
|||
case 'n': info->hexdump.offset_type = OFFSET_NONE; break;
|
||||
}
|
||||
break;
|
||||
|
||||
case 'e':
|
||||
hdr_ethernet = TRUE;
|
||||
if (sscanf(ws_optarg, "%x", &hdr_ethernet_proto) < 1) {
|
||||
|
@ -368,6 +402,28 @@ parse_options(int argc, char *argv[], text_import_info_t * const info, wtap_dump
|
|||
info->payload = ws_optarg;
|
||||
break;
|
||||
|
||||
case 'r':
|
||||
info->mode = TEXT_IMPORT_REGEX;
|
||||
if (regex != NULL) {
|
||||
/* XXX: Used the option twice. Should we warn? */
|
||||
g_regex_unref(regex);
|
||||
}
|
||||
regex = g_regex_new(ws_optarg, G_REGEX_DUPNAMES | G_REGEX_OPTIMIZE | G_REGEX_MULTILINE, G_REGEX_MATCH_NOTEMPTY, &gerror);
|
||||
if (gerror) {
|
||||
cmdarg_err("%s", gerror->message);
|
||||
g_error_free(gerror);
|
||||
print_usage(stderr);
|
||||
return INVALID_OPTION;
|
||||
} else {
|
||||
if (g_regex_get_string_number(regex, "data") == -1) {
|
||||
cmdarg_err("Regex missing capturing group data (use (?<data>(...)) )");
|
||||
g_regex_unref(regex);
|
||||
print_usage(stderr);
|
||||
return INVALID_OPTION;
|
||||
}
|
||||
}
|
||||
break;
|
||||
|
||||
case 's':
|
||||
hdr_sctp = TRUE;
|
||||
hdr_data_chunk = FALSE;
|
||||
|
@ -408,6 +464,7 @@ parse_options(int argc, char *argv[], text_import_info_t * const info, wtap_dump
|
|||
|
||||
set_hdr_ip_proto(132);
|
||||
break;
|
||||
|
||||
case 'S':
|
||||
hdr_sctp = TRUE;
|
||||
hdr_data_chunk = TRUE;
|
||||
|
@ -450,7 +507,7 @@ parse_options(int argc, char *argv[], text_import_info_t * const info, wtap_dump
|
|||
break;
|
||||
|
||||
case 't':
|
||||
ts_fmt = ws_optarg;
|
||||
info->timestamp_format = ws_optarg;
|
||||
if (!strcmp(ws_optarg, "ISO"))
|
||||
ts_fmt_iso = 1;
|
||||
break;
|
||||
|
@ -509,10 +566,6 @@ parse_options(int argc, char *argv[], text_import_info_t * const info, wtap_dump
|
|||
set_hdr_ip_proto(6);
|
||||
break;
|
||||
|
||||
case 'a':
|
||||
info->hexdump.identify_ascii = TRUE;
|
||||
break;
|
||||
|
||||
case 'v':
|
||||
show_version();
|
||||
exit(0);
|
||||
|
@ -598,6 +651,20 @@ parse_options(int argc, char *argv[], text_import_info_t * const info, wtap_dump
|
|||
}
|
||||
|
||||
/* Some validation */
|
||||
|
||||
if (info->mode == TEXT_IMPORT_REGEX) {
|
||||
info->regex.format = regex;
|
||||
/* need option for data encoding */
|
||||
if (g_regex_get_string_number(regex, "dir") > -1) {
|
||||
/* XXX: Add parameter(s?) to specify these? */
|
||||
info->regex.in_indication = "iI<";
|
||||
info->regex.out_indication = "oO>";
|
||||
}
|
||||
if (g_regex_get_string_number(regex, "time") > -1 && info->timestamp_format == NULL) {
|
||||
cmdarg_err("Regex with <time> capturing group requires time format (-t)");
|
||||
return INVALID_OPTION;
|
||||
}
|
||||
}
|
||||
if (pcap_link_type != 1 && hdr_ethernet) {
|
||||
cmdarg_err("Dummy headers (-e, -i, -u, -s, -S -T) cannot be specified with link type override (-l)");
|
||||
return INVALID_OPTION;
|
||||
|
@ -640,12 +707,36 @@ parse_options(int argc, char *argv[], text_import_info_t * const info, wtap_dump
|
|||
|
||||
if (strcmp(argv[ws_optind], "-") != 0) {
|
||||
input_filename = argv[ws_optind];
|
||||
input_file = ws_fopen(input_filename, "rb");
|
||||
if (!input_file) {
|
||||
open_failure_message(input_filename, errno, FALSE);
|
||||
return OPEN_ERROR;
|
||||
if (info->mode == TEXT_IMPORT_REGEX) {
|
||||
info->regex.import_text_GMappedFile = g_mapped_file_new(input_filename, TRUE, &gerror);
|
||||
if (gerror) {
|
||||
cmdarg_err("%s", gerror->message);
|
||||
g_error_free(gerror);
|
||||
return OPEN_ERROR;
|
||||
}
|
||||
} else {
|
||||
input_file = ws_fopen(input_filename, "rb");
|
||||
if (!input_file) {
|
||||
open_failure_message(input_filename, errno, FALSE);
|
||||
return OPEN_ERROR;
|
||||
}
|
||||
}
|
||||
} else {
|
||||
if (info->mode == TEXT_IMPORT_REGEX) {
|
||||
/* text_import_regex requires a memory mapped file, so this likely
|
||||
* won't work, unless the user has redirected a file (not a FIFO)
|
||||
* to stdin, though that's pretty silly and unnecessary.
|
||||
* XXX: We could read until EOF, write it to a temp file, and then
|
||||
* mmap that (ugh)?
|
||||
*/
|
||||
info->regex.import_text_GMappedFile = g_mapped_file_new_from_fd(0, TRUE, &gerror);
|
||||
if (gerror) {
|
||||
cmdarg_err("%s", gerror->message);
|
||||
cmdarg_err("regex import requires memory-mapped I/O and cannot be used with terminals or pipes");
|
||||
g_error_free(gerror);
|
||||
return INVALID_OPTION;
|
||||
}
|
||||
}
|
||||
input_filename = "Standard input";
|
||||
input_file = stdin;
|
||||
}
|
||||
|
@ -686,12 +777,9 @@ parse_options(int argc, char *argv[], text_import_info_t * const info, wtap_dump
|
|||
return OPEN_ERROR;
|
||||
}
|
||||
|
||||
info->mode = TEXT_IMPORT_HEXDUMP;
|
||||
info->import_text_filename = input_filename;
|
||||
info->output_filename = output_filename;
|
||||
info->hexdump.import_text_FILE = input_file;
|
||||
info->hexdump.has_direction = has_direction;
|
||||
info->timestamp_format = ts_fmt;
|
||||
|
||||
info->encapsulation = wtap_encap_type;
|
||||
info->wdh = wdh;
|
||||
|
@ -825,8 +913,8 @@ main(int argc, char *argv[])
|
|||
goto clean_exit;
|
||||
}
|
||||
|
||||
assert(input_file != NULL);
|
||||
assert(wdh != NULL);
|
||||
ws_assert(input_file != NULL || info.regex.import_text_GMappedFile != NULL);
|
||||
ws_assert(wdh != NULL);
|
||||
|
||||
ret = text_import(&info);
|
||||
|
||||
|
@ -843,6 +931,12 @@ clean_exit:
|
|||
if (input_file) {
|
||||
fclose(input_file);
|
||||
}
|
||||
if (info.regex.import_text_GMappedFile) {
|
||||
g_mapped_file_unref(info.regex.import_text_GMappedFile);
|
||||
}
|
||||
if (info.regex.format) {
|
||||
g_regex_unref(info.regex.format);
|
||||
}
|
||||
if (wdh) {
|
||||
int err;
|
||||
char *err_info;
|
||||
|
|
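Reviewer note: the man page text in this commit says the 'data' field must contain only characters of the selected encoding plus whitespace, with hexadecimal as the default. The sketch below illustrates that rule for the hex case. It is not the decoder used by text_import; `hex_nibble` and `decode_hex_field` are hypothetical names, and rejecting an odd number of hex digits is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical decoder for a hex-encoded 'data' capture. Whitespace is
 * skipped; any other non-hex character is an error. Returns the number of
 * bytes written to out, or -1 on an invalid character, an odd digit count
 * (assumed to be an error), or insufficient space in out.
 */
static long decode_hex_field(const char *text, unsigned char *out, size_t out_len)
{
    size_t n = 0;
    int high = -1; /* pending high nibble, or -1 if none */

    assert(text != NULL && out != NULL);

    for (; *text != '\0'; text++) {
        int v;

        if (isspace((unsigned char)*text))
            continue;
        v = hex_nibble(*text);
        if (v < 0)
            return -1;
        if (high < 0) {
            high = v;
        } else {
            if (n >= out_len)
                return -1;
            out[n++] = (unsigned char)((high << 4) | v);
            high = -1;
        }
    }
    return (high < 0) ? (long)n : -1;
}
```

Applied to the first sample line's data, "a130368b000000080060", this yields ten bytes starting with 0xa1 0x30.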