README.xml-output

Protocol Dissection in XML Format
=================================
Copyright (c) 2003 by Gilbert Ramirez <gram@alumni.rice.edu>

Wireshark can export its protocol dissection in an XML format; tshark
provides similar functionality through the "-Tpdml" option.

The XML that wireshark produces follows the Packet Details Markup
Language (PDML) specified by the group at the Politecnico Di Torino
working on Analyzer. The specification was found at:

http://analyzer.polito.it/30alpha/docs/dissectors/PDMLSpec.htm

That URL no longer works, but a copy can be found at:

http://gd.tuwien.ac.at/.vhost/analyzer.polito.it/docs/dissectors/PDMLSpec.htm

or at the internet archive:

https://web.archive.org/web/20050305174853/http://analyzer.polito.it/30alpha/docs/dissectors/PDMLSpec.htm

This is similar to the NetPDL language specification:

http://www.nbee.org/doku.php?id=netpdl:index

A related XML format, the Packet Summary Markup Language (PSML), is
also defined by the Analyzer group to provide packet summary information.
The PSML format is not documented in a publicly-available HTML document,
but its format is simple. Wireshark can export this format too. Some day it
may be added to tshark so that "-Tpsml" would produce PSML.

One might ask whether the "-T" option should read "-Txml" instead of
"-Tpdml" (and in the future, "-Tpsml"), but if tshark were ever required
to produce another XML-based format of its protocol dissection, then
"-Txml" would be ambiguous.

PDML
====
The PDML that wireshark produces is known not to be loadable into Analyzer.
It causes Analyzer to crash. As such, the PDML that wireshark produces
is labeled with a version number of "0", which means that the PDML does
not fully follow the PDML spec. Furthermore, a creator attribute in the
"<pdml>" tag gives the version number of wireshark/tshark that produced the PDML.
That way, while the PDML produced by wireshark matures but still does not
meet the PDML spec, scripts can decide how best to parse the PDML based on
the "creator" attribute.
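
As a rough illustration of such a decision, the sketch below reads the
"creator" attribute with Python's standard xml.etree.ElementTree module and
splits it into a program name and a numeric version tuple. It assumes the
"name/version" layout seen in creator="wireshark/0.9.17" with a dotted
numeric version; other layouts may need different handling. The function
name parse_creator is made up for this example.

```python
import xml.etree.ElementTree as ET

def parse_creator(pdml_text):
    """Split the creator attribute into (program, version_tuple).

    Assumes a "name/version" layout such as "wireshark/0.9.17".
    """
    root = ET.fromstring(pdml_text)
    program, _, version = root.get("creator", "").partition("/")
    # Keep only the leading numeric components: "0.9.17" -> (0, 9, 17).
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return program, tuple(parts)

program, version = parse_creator(
    '<pdml version="0" creator="wireshark/0.9.17"></pdml>')
print(program, version)
```

A script could then compare the tuple against a known version, for example
"if version < (1, 0): ...", before choosing a parsing strategy.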

A PDML file is delimited by a "<pdml>" tag.
A PDML file contains multiple packets, denoted by the "<packet>" tag.
A packet will contain multiple protocols, denoted by the "<proto>" tag.
A protocol might contain one or more fields, denoted by the "<field>" tag.

A pseudo-protocol named "geninfo" is produced, as is required by the PDML
spec, and exported as the first protocol after the opening "<packet>" tag.
Its information comes from wireshark's "frame" protocol, which serves
a similar purpose of storing packet meta-data. Both "geninfo" and
"frame" protocols are provided in the PDML output.

The "<pdml>" tag
================
Example:
	<pdml version="0" creator="wireshark/0.9.17">

The creator is "wireshark" version 0.9.17. The name refers to the
"wireshark" engine: it will always say "wireshark", not "tshark".


The "<proto>" tag
=================
"<proto>" tags can have the following attributes:

	name - the display filter name for the protocol
	showname - the label used to describe this protocol in the protocol
		tree. This is usually the descriptive name of the protocol,
		but it can be modified by dissectors to include more data
		(tcp can do this)
	pos - the starting offset within the packet data where this
		protocol starts
	size - the number of octets in the packet data that this protocol
		covers.

The "<field>" tag
=================
"<field>" tags can have the following attributes:

	name - the display filter name for the field
	showname - the label used to describe this field in the protocol
		tree. This is usually the descriptive name of the field,
		followed by some representation of the value.
	pos - the starting offset within the packet data where this
		field starts
	size - the number of octets in the packet data that this field
		covers.
	value - the actual packet data, in hex, that this field covers
	show - the representation of the packet data ('value') as it would
		appear in a display filter.
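
To show how these attributes fit together, the sketch below turns the
'pos', 'size' and 'value' attributes of a single field into a byte range
and raw bytes. The "<field>" element is hand-written and its attribute
values are assumptions chosen for illustration. Whether the length of
'value' always equals 'size' is not stated here, so the sketch does not
rely on it. The function name field_bytes is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written <field> for illustration; the attribute values are assumptions.
FIELD_XML = (
    '<field name="tcp.srcport" showname="Source Port: 80" '
    'pos="34" size="2" value="0050" show="80"/>'
)

def field_bytes(field):
    """Return (start, end, raw_bytes) for a <field> element.

    'end' is exclusive: the field covers packet offsets [start, end).
    """
    start = int(field.get("pos"))
    size = int(field.get("size"))
    raw = bytes.fromhex(field.get("value", ""))
    return start, start + size, raw

field = ET.fromstring(FIELD_XML)
start, end, raw = field_bytes(field)
print(start, end, raw.hex(), int.from_bytes(raw, "big"))
```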

Some dissectors sometimes place text into the protocol tree, without using
a field with a field-name. Those appear in PDML as "<field>" tags with no
'name' attribute, but with a 'show' attribute giving that text.
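
A script that wants to see this free-form text can select the fields that
lack a name. The sketch below treats a missing 'name' attribute and an
empty one the same way, which is an assumption made for robustness rather
than something this document specifies. The sample "<proto>" element is
hand-written, and the function name text_only_labels is made up.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only.
SAMPLE_PROTO = """\
<proto name="http">
  <field name="http.request.method" show="GET"/>
  <field show="Some free-form text added by the dissector"/>
</proto>
"""

def text_only_labels(proto):
    """Return the 'show' text of every <field> without a usable name."""
    return [
        field.get("show", "")
        for field in proto.iter("field")
        if not field.get("name")  # attribute missing or empty
    ]

labels = text_only_labels(ET.fromstring(SAMPLE_PROTO))
print(labels)
```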

Many dissectors label the undissected payload of a protocol as belonging
to a "data" protocol, and the "data" protocol usually resides inside
the last protocol dissected. In the PDML, the "data" protocol becomes
a "data" field, placed exactly where the "data" protocol is in wireshark's
protocol tree. So, if wireshark would normally show:

+-- Frame
|
+-- Ethernet
|
+-- IP
|
+-- TCP
|
+-- HTTP
    |
    +-- Data

In PDML, the "Data" protocol would become another field under HTTP:

<packet>
	<proto name="frame">
	...
	</proto>

	<proto name="eth">
	...
	</proto>

	<proto name="ip">
	...
	</proto>

	<proto name="tcp">
	...
	</proto>

	<proto name="http">
	...
		<field name="data" value="........."/>
	</proto>
</packet>

Note that the "<packet>" tag may carry the nonstandard color attributes
"foreground" and "background".
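
The "data" field from the example above can be pulled out of a packet as
sketched below. The sample packet is hand-written and its hex payload is
an assumption chosen for illustration. The function name
undissected_payloads is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only.
SAMPLE_PACKET = """\
<packet>
  <proto name="tcp"/>
  <proto name="http">
    <field name="http.request.method" show="GET"/>
    <field name="data" value="68656c6c6f"/>
  </proto>
</packet>
"""

def undissected_payloads(packet):
    """Map each protocol name to the bytes of its "data" fields, if any."""
    payloads = {}
    for proto in packet.findall("proto"):
        for field in proto.iter("field"):
            if field.get("name") == "data":
                raw = bytes.fromhex(field.get("value", ""))
                payloads.setdefault(proto.get("name"), []).append(raw)
    return payloads

payloads = undissected_payloads(ET.fromstring(SAMPLE_PACKET))
print(payloads)
```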

tools/WiresharkXML.py
=====================
This is a python module which provides some infrastructure for
Python developers who wish to parse PDML. It is designed to read
a PDML file and call a user's callback function every time a packet
is constructed from the protocols and fields for a single packet.

The python user should import the module, define a callback function
which accepts one argument, and call the parse_fh function:

------------------------------------------------------------
import WiresharkXML

def my_callback(packet):
	# Called once per packet; 'packet' is a WiresharkXML.Packet object.
	# Do something with it here.
	pass

# If the PDML is stored in a file, you can:
# (xml_filename is a placeholder for the path to your PDML file)
with open(xml_filename) as fh:
	WiresharkXML.parse_fh(fh, my_callback)

# or, if the PDML is contained within a string, you can:
# (my_string is a placeholder for your PDML text)
WiresharkXML.parse_string(my_string, my_callback)

# Now that the script has the packet data, do something.
------------------------------------------------------------

The object that is passed to the callback function is a
WiresharkXML.Packet object, which corresponds to a single packet.
WiresharkXML provides three classes, each of which corresponds to a PDML tag:

	Packet	 - "<packet>" tag
	Protocol - "<proto>" tag
	Field    - "<field>" tag

Each of these classes has accessors which will return the defined attributes:

	get_name()
	get_showname()
	get_pos()
	get_size()
	get_value()
	get_show()

Protocols and fields can contain other fields. Thus, the Protocol and
Field classes have a "children" member, which is a simple list of the
Field objects, if any, that are contained. The "children" list can be
directly accessed by code using the object. The "children" list will be
empty if this Protocol or Field contains no Fields.
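
A recursive walk over the "children" lists might look like the sketch
below. So that it runs without the WiresharkXML module, it uses a small
stand-in class (StubItem, made up for this example) that mimics only
get_name() and the "children" list described above. The walk function
itself should work on any object offering those two features.

```python
class StubItem:
    """Stand-in for WiresharkXML.Protocol / Field, for illustration only.

    It mimics just the two documented features used below:
    get_name() and the 'children' list.
    """
    def __init__(self, name, children=None):
        self._name = name
        self.children = children or []

    def get_name(self):
        return self._name

def walk(item, depth=0):
    """Yield (depth, name) for item and all nested fields, depth-first."""
    yield depth, item.get_name()
    for child in item.children:
        yield from walk(child, depth + 1)

tcp = StubItem("tcp", [
    StubItem("tcp.srcport"),
    StubItem("tcp.flags", [StubItem("tcp.flags.syn"),
                           StubItem("tcp.flags.ack")]),
])

for depth, name in walk(tcp):
    print("  " * depth + name)
```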

Furthermore, the Packet class is a sub-class of the PacketList class.
The PacketList class provides methods to look for protocols and fields.
The term "item" is used when the item being looked for can be
a protocol or a field:

	item_exists(name) - checks if an item exists in the PacketList
	get_items(name) - returns a PacketList of all matching items
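
A callback might combine the two methods as sketched below. The stub
classes are made up so that the example runs without the WiresharkXML
module; they mimic only item_exists(), get_items() and get_show(). The
sketch also assumes that the PacketList returned by get_items() can be
iterated over, which the stub imitates with a plain list.

```python
def http_request_uris(packet):
    """Return the 'show' value of every http.request.uri item in a packet."""
    if not packet.item_exists("http"):
        return []
    return [item.get_show() for item in packet.get_items("http.request.uri")]

class StubField:
    """Stand-in for a WiresharkXML item; illustration only."""
    def __init__(self, name, show=""):
        self.name = name
        self.show = show

    def get_show(self):
        return self.show

class StubPacket:
    """Stand-in for WiresharkXML.Packet; mimics item_exists/get_items only."""
    def __init__(self, items):
        self.items = items

    def item_exists(self, name):
        return any(item.name == name for item in self.items)

    def get_items(self, name):
        return [item for item in self.items if item.name == name]

web = StubPacket([StubField("http"),
                  StubField("http.request.uri", "/index.html")])
other = StubPacket([StubField("dns")])

print(http_request_uris(web))
print(http_request_uris(other))
```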


General Notes
=============
Generally, parsing XML is slow. If you're writing a script to parse
the PDML output of tshark, pass a read filter with "-R" to tshark to
reduce the number of packets coming out of tshark as much as possible
(tshark 1.10 and later require "-2" together with "-R", or accept a
single-pass display filter with "-Y").
The less your script has to process, the faster it will be.
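
A script that launches tshark itself could build the command line as
sketched below. The sketch uses "-Y", the single-pass display filter
option of tshark 1.10 and later, instead of "-R"; check this against the
tshark version you have. The function names and the capture file name are
made up for this example, and run_tshark_pdml needs Python 3.7 or later
and an installed tshark.

```python
import subprocess

def tshark_pdml_command(capture_file, display_filter=None):
    """Build the argument list for a tshark run that emits PDML."""
    cmd = ["tshark", "-r", capture_file, "-T", "pdml"]
    if display_filter:
        # Filter inside tshark so less XML has to be parsed afterwards.
        cmd += ["-Y", display_filter]
    return cmd

def run_tshark_pdml(capture_file, display_filter=None):
    """Run tshark and return its PDML output as text."""
    result = subprocess.run(
        tshark_pdml_command(capture_file, display_filter),
        capture_output=True, text=True, check=True)
    return result.stdout

print(tshark_pdml_command("capture.pcap", "http"))
```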

tools/msnchat
=============
tools/msnchat is a sample Python program that uses WiresharkXML to parse
PDML. Given one or more capture files, it runs tshark on each of them,
providing a read filter to reduce tshark's output. It finds MSN Chat
conversations in the capture file and produces nice HTML showing the
conversations. It has only been tested with capture files containing
non-simultaneous chat sessions, but was written to more-or-less handle any
number of simultaneous chat sessions.

pdml2html.xsl
=============
pdml2html.xsl is an XSLT file to convert PDML files into HTML.
See https://wiki.wireshark.org/PDML for more details.