wireshark/doc
Guy Harris e5951765d8 Dissector names are not protocol names.
A given protocol's packet format may depend, for example, on which
lower-level protocol is transporting the protocol in question.  For
example, protocols that run atop both byte-stream protocols such as TCP
and TLS, and packet-oriented protocols such as UDP or DTLS, might begin
the packet with a length when running atop a byte-stream protocol, to
indicate where this packet ends and the next packet begins in the byte
stream, but not do so when running atop a packet-oriented protocol.

Dissectors can handle this in various ways:

For example, the dissector could attempt to determine the protocol over
which the packet was transported.

Unfortunately, many of those mechanisms do so by fetching data from the
packet_info structure, and many items in that structure act as global
variables, so that, for example, if there are two PDUs for protocol
A inside a TCP segment, and the first PDU for protocol A contains a PDU
for protocol B, and protocol B's dissector, or a dissector it calls,
modifies the information in the packet_info structure so that it no
longer indicates that the parent protocol is TCP, the second PDU for
protocol A might not be correctly dissected.

Another such mechanism is to query the previous element in the layers
structure of the packet_info structure, which is a list of protocol IDs.

Unfortunately, that is not a list of earlier protocols in the protocol
stack; it's a list of earlier protocols in the dissection, which means
that, in the above example, when the second PDU for protocol A is
dissected, the list is {...,TCP,A,B,...,A}, which means that the
previous element in the list is not TCP, so, again, the second PDU for
protocol A will not be correctly dissected.

An alternative is to have multiple dissectors for the same protocol,
with the part of the protocol that's independent of the protocol
transporting the PDU being dissected by common code.  Protocol B might
have an "over a byte-stream transport" dissector and an "over a packet
transport" dissector, with the first dissector being registered for use
over TCP and TLS and the other dissector being registered for use over
packet protocols.  This mechanism, unlike the other mechanisms, is not
dependent on information in the packet_info structure that might be
affected by dissectors other than the one for the protocol that
transports protocol B.

Furthermore, in a LINKTYPE_WIRESHARK_UPPER_PDU pcap or pcapng packet for
protocol B, there might not be any information to indicate the protocol
that transports protocol B, so there would have to be separate
dissectors for protocol B, with separate names, so that a tag giving the
protocol name would differ for B-over-byte-stream and B-over-packets.

So:

We rename EXP_PDU_TAG_PROTO_NAME and EXP_PDU_TAG_HEUR_PROTO_NAME to
EXP_PDU_TAG_DISSECTOR_NAME and EXP_PDU_TAG_HEUR_DISSECTOR_NAME, to
emphasize that they are *not* protocol names, they are dissector names
(which has always been the case - if there's a protocol with that name,
but no dissector with that name, Wireshark will not be able to handle
the packet, as it will try to look up a dissector given that name and
fail).

We fix the exported PDU dissector to refer to those tags as dissector
names, not protocol names.

We update documentation to refer to them as DISSECTOR_NAME tags, not
PROTO_NAME tags.  (If there is any documentation for this outside the
Wireshark source, it should be updated as well.)

We add comments for calls to dissector_handle_get_dissector_name() where
the dissector name is shown to the user, to indicate that it might be
that the protocol name should be used.

We update the TLS and DTLS dissectors to show the encapsulated protocol
as the string returned by dissector_handle_get_long_name(); as the
default is "Application Data", it appears that a descriptive name,
rather than a short API name, should be used.  (We continue to use the
dissector name in debugging messages, to indicate which dissector was
called.)
2022-09-10 22:37:11 -07:00
plugins.example plugins.example: Fix an installation path 2022-06-17 16:35:20 +01:00
.gitignore
CMakeLists.txt CMake: Split more Wireshark/Logray variables. 2022-09-01 09:05:58 -07:00
README.capture Don't call any routines from WinPcap/Npcap packet32.dll. 2020-07-14 07:30:59 +00:00
README.design doc: minor changes in README files. 2018-04-02 06:29:33 +00:00
README.developer Qt6: Adapt various docs 2022-08-23 10:37:14 +00:00
README.display_filter doc: Update README.display_filter 2022-07-26 12:08:57 +02:00
README.dissector epan: Refactor floating point display types 2022-08-02 13:16:46 +00:00
README.heuristic README.heuristic: minor updates 2021-12-19 08:03:04 +00:00
README.idl2wrs Docs: Clean up some Python references. 2022-08-08 16:34:45 +00:00
README.plugins A few documentation spelling fixes. 2021-08-02 17:40:55 +01:00
README.regression Remove $Id$ and other Subversion leftovers from the doc files. 2014-02-14 01:33:14 +00:00
README.request_response_tracking Move epan/wmem/wmem_scopes.h to epan/ 2021-07-26 14:56:11 +00:00
README.stats_tree Replace g_snprintf() with snprintf() 2021-12-19 20:06:13 +00:00
README.tapping README.tapping: Some minor updates 2021-12-18 14:47:57 +00:00
README.vagrant Fix some more doc folder spelling errors. 2020-09-25 22:20:21 +01:00
README.wmem wmem: Add a multimap 2021-11-21 07:16:55 -05:00
README.wslua docbook: Port make-wsluarm to Python3 2022-07-23 20:51:24 +00:00
README.xml-output documentation: update PDML/PSML doc 2021-09-22 21:19:55 -04:00
androiddump.adoc manpage: Fix grammar errors and improve phrasing 2022-04-13 03:39:56 +00:00
asn2deb.adoc manpage: Fix grammar errors and improve phrasing 2022-04-13 03:39:56 +00:00
capinfos.adoc Docs: Document our diagnostic output options. 2021-12-27 08:04:25 +00:00
captype.adoc Docs: Document our diagnostic output options. 2021-12-27 08:04:25 +00:00
ciscodump.adoc ciscodump: Added support for IOS XE and ASA 2022-07-22 15:55:28 +00:00
dftest.adoc Docs: Document our diagnostic output options. 2021-12-27 08:04:25 +00:00
diagnostic-options.adoc doc: fix a copy/paste error and a typo 2021-12-27 13:01:42 +00:00
dpauxmon.adoc Docs: Move includes to the top of our man pages. 2021-10-19 16:26:37 -07:00
dumpcap.adoc docs: adoc migration bolding typos; Windows pipe name syntax 2022-05-12 16:43:44 +00:00
editcap.adoc editcap/mergecap: swap 'v'|'V' options to match other CLI utilities 2022-06-16 02:13:50 +00:00
etwdump.adoc ETW: Extract IP packets from Windows event trace 2022-05-05 13:35:47 +00:00
extcap.adoc Docs: Clean up some Python references. 2022-08-08 16:34:45 +00:00
extcap_example.py Docs: Clean up some Python references. 2022-08-08 16:34:45 +00:00
falcodump.adoc extcap: Add falcodump. 2022-08-29 15:35:19 -07:00
idl2deb.adoc manpage: Fix grammar errors and improve phrasing 2022-04-13 03:39:56 +00:00
idl2wrs.adoc manpage: Fix grammar errors and improve phrasing 2022-04-13 03:39:56 +00:00
make-authors-short.py Doc: Port make-authors-short to Python3. 2022-06-24 18:32:50 +00:00
mergecap.adoc editcap/mergecap: swap 'v'|'V' options to match other CLI utilities 2022-06-16 02:13:50 +00:00
mmdbresolve.adoc Docs: Move includes to the top of our man pages. 2021-10-19 16:26:37 -07:00
packet-PROTOABBREV.c docs: Update the sample dissector 2022-08-15 04:53:58 +00:00
randpkt.adoc Docs: Document our diagnostic output options. 2021-12-27 08:04:25 +00:00
randpkt.txt Remove $Id$ and other Subversion leftovers from the doc files. 2014-02-14 01:33:14 +00:00
randpktdump.adoc manpage: Fix grammar errors and improve phrasing 2022-04-13 03:39:56 +00:00
rawshark.adoc Docs: Document our diagnostic output options. 2021-12-27 08:04:25 +00:00
reordercap.adoc Docs: Document our diagnostic output options. 2021-12-27 08:04:25 +00:00
sdjournal.adoc manpage: Fix grammar errors and improve phrasing 2022-04-13 03:39:56 +00:00
sshdump.adoc sshdump: add option to select dumpcap as remote capture command 2022-08-10 17:26:49 +00:00
text2pcap.adoc Dissector names are not protocol names. 2022-09-10 22:37:11 -07:00
tshark.adoc Fix some spelling errors 2022-08-19 17:46:34 +01:00
udpdump.adoc manpage: Fix grammar errors and improve phrasing 2022-04-13 03:39:56 +00:00
wifidump.adoc extcap: new interface, wifidump, to capture Wi-Fi frames using a remote SSH host 2022-03-09 08:01:39 +00:00
wireshark-filter.adoc dfilter: Change boolean string representation 2022-06-25 13:02:34 +01:00
wireshark.adoc docs: adoc migration bolding typos; Windows pipe name syntax 2022-05-12 16:43:44 +00:00

README.xml-output

Protocol Dissection in XML Format
=================================
Copyright (c) 2003 by Gilbert Ramirez <gram@alumni.rice.edu>

Wireshark has the ability to export its protocol dissection in an
XML format; tshark provides similar functionality through the "-Tpdml"
option.

The XML that Wireshark produces follows the Packet Details Markup
Language (PDML) specified by the group at the Politecnico Di Torino
working on Analyzer. The specification was found at:

http://analyzer.polito.it/30alpha/docs/dissectors/PDMLSpec.htm

That URL no longer works, but a copy can be found at the Internet
Archive:

https://web.archive.org/web/20050305174853/http://analyzer.polito.it/30alpha/docs/dissectors/PDMLSpec.htm

This is similar to the NetPDL language specification:

http://www.nbee.org/doku.php?id=netpdl:index

That domain registration has also expired, but an Internet Archive
copy is available at:

https://web.archive.org/web/20160305211810/http://nbee.org/doku.php?id=netpdl:index

A related XML format, the Packet Summary Markup Language (PSML), is
also defined by the Analyzer group to provide packet summary information.
The PSML format is not documented in a publicly-available HTML document,
but its format is simple. Wireshark can export this format too, and
tshark can produce it with the "-Tpsml" option.

PDML
====
The PDML that Wireshark produces is known not to be loadable into Analyzer.
It causes Analyzer to crash. As such, the PDML that Wireshark produces
is labeled with a version number of "0", which means that the PDML does
not fully follow the PDML spec. Furthermore, a creator attribute in the
"<pdml>" tag gives the version number of wireshark/tshark that produced the
PDML.

In that way, as the PDML produced by Wireshark matures, but still does not
meet the PDML spec, scripts can make intelligent decisions about how to
best parse the PDML, based on the "creator" attribute.
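
As a minimal sketch of such a decision (not part of Wireshark; the helper
name pdml_creator is hypothetical, and the "creator" attribute is assumed to
have the form "name/version"), a script could read both attributes with
Python's standard xml.etree.ElementTree module before choosing how to parse:

```python
import xml.etree.ElementTree as ET


def pdml_creator(pdml_text):
    """Return (pdml_version, creator_name, creator_version) for a PDML string."""
    root = ET.fromstring(pdml_text)
    if root.tag != "pdml":
        raise ValueError("not a PDML document")
    # Assumption: the creator attribute looks like "wireshark/0.9.17".
    name, _, version = root.get("creator", "").partition("/")
    return root.get("version"), name, version


sample = '<pdml version="0" creator="wireshark/0.9.17"></pdml>'
print(pdml_creator(sample))
```

A script could then compare the returned creator version against the
versions it has been tested with, and fall back to a more defensive parsing
path for anything else.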

A PDML file is delimited by a "<pdml>" tag.
A PDML file contains multiple packets, denoted by the "<packet>" tag.
A packet will contain multiple protocols, denoted by the "<proto>" tag.
A protocol might contain one or more fields, denoted by the "<field>" tag.
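
The nesting described above can be walked with any XML parser. The following
is a rough sketch using Python's standard xml.etree.ElementTree module; the
sample document is hand-written and heavily trimmed for illustration, and the
function name protocols_per_packet is hypothetical:

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample; real PDML carries many more attributes.
SAMPLE = """<pdml version="0" creator="wireshark/0.9.17">
<packet>
  <proto name="eth"><field name="eth.type"/></proto>
  <proto name="ip"><field name="ip.ttl"/></proto>
</packet>
<packet>
  <proto name="eth"><field name="eth.type"/></proto>
</packet>
</pdml>"""


def protocols_per_packet(pdml_text):
    """Return one list of top-level protocol names for each <packet>."""
    root = ET.fromstring(pdml_text)
    return [
        [proto.get("name") for proto in packet.findall("proto")]
        for packet in root.findall("packet")
    ]


print(protocols_per_packet(SAMPLE))
```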

A pseudo-protocol named "geninfo" is produced, as is required by the PDML
spec, and exported as the first protocol after the opening "<packet>" tag.
Its information comes from Wireshark's "frame" protocol, which serves
the similar purpose of storing packet meta-data. Both the "geninfo" and
"frame" protocols are provided in the PDML output.

The "<pdml>" tag
================
Example:
	<pdml version="0" creator="wireshark/0.9.17">

The creator is "wireshark" version 0.9.17. The name refers to the "wireshark"
engine; it will always say "wireshark", not "tshark".


The "<proto>" tag
=================
"<proto>" tags can have the following attributes:

	name - the display filter name for the protocol
	showname - the label used to describe this protocol in the protocol
		tree. This is usually the descriptive name of the protocol,
		but it can be modified by dissectors to include more data
		(tcp can do this)
	pos - the starting offset within the packet data where this
		protocol starts
	size - the number of octets in the packet data that this protocol
		covers.

The "<field>" tag
=================
"<field>" tags can have the following attributes:

	name - the display filter name for the field
	showname - the label used to describe this field in the protocol
		tree. This is usually the descriptive name of the field,
		followed by some representation of the value.
	pos - the starting offset within the packet data where this
		field starts
	size - the number of octets in the packet data that this field
		covers.
	value - the actual packet data, in hex, that this field covers
	show - the representation of the packet data ('value') as it would
		appear in a display filter.
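
To illustrate how these attributes fit together, here is a small sketch that
turns one "<field>" element into a Python dictionary. The element is
hand-written for illustration, the helper name describe_field is
hypothetical, and missing attributes are assumed to be possible (for example
on text-only items), so defaults are supplied:

```python
import xml.etree.ElementTree as ET


def describe_field(field):
    """Collect the attributes of a <field> element into a dictionary."""
    value = field.get("value", "")
    return {
        "name": field.get("name"),
        "showname": field.get("showname"),
        "offset": int(field.get("pos", 0)),
        "length": int(field.get("size", 0)),
        # 'value' is the raw packet data in hex; convert it to bytes.
        "raw": bytes.fromhex(value),
        "show": field.get("show"),
    }


field = ET.fromstring(
    '<field name="ip.ttl" showname="Time to live: 64" '
    'pos="22" size="1" value="40" show="64"/>'
)
info = describe_field(field)
print(info)
```

In this sample the 'value' attribute ("40") is the hex encoding of the single
octet that 'show' renders as the decimal number 64.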


Deviations from the PDML standard
=================================
Various dissectors parse packets in a way that does not fit all the assumptions
in the PDML specification. In some cases Wireshark adjusts the output to match
the spec more closely, but exceptions exist.

Some dissectors sometimes place text into the protocol tree, without using
a field with a field-name. Those appear in PDML as "<field>" tags with no
'name' attribute, but with a 'show' attribute giving that text.

Some dissectors place field items at the top level instead of inside a
protocol. In these cases, in the PDML output the field items are placed
inside a fake "<proto>" element named "fake-field-wrapper" in order to
maximize compliance.

Many dissectors label the undissected payload of a protocol as belonging
to a "data" protocol, and the "data" protocol often resides inside
the last protocol dissected. In the PDML, the "data" protocol becomes
a "data" field, placed exactly where the "data" protocol is in Wireshark's
protocol tree. So, if Wireshark would normally show:

+-- Frame
|
+-- Ethernet
|
+-- IP
|
+-- TCP
|
+-- HTTP
    |
    +-- Data

In PDML, the "Data" protocol would become another field under HTTP:

<packet>
	<proto name="frame">
	...
	</proto>

	<proto name="eth">
	...
	</proto>

	<proto name="ip">
	...
	</proto>

	<proto name="tcp">
	...
	</proto>

	<proto name="http">
	...
		<field name="data" value="........."/>
	</proto>
</packet>

In cases where the "data" protocol appears at the top level, it is
still converted to a field, and placed inside the "fake-field-wrapper"
protocol, just as any other top level field.
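
A script that wants to see top-level items as Wireshark's protocol tree shows
them may want to look through the wrapper. The sketch below is one possible
approach, not something Wireshark provides; the function name
top_level_items is hypothetical and the sample packet is hand-written:

```python
import xml.etree.ElementTree as ET

WRAPPER = "fake-field-wrapper"


def top_level_items(packet):
    """Return top-level items, replacing the wrapper proto by its fields."""
    items = []
    for proto in packet.findall("proto"):
        if proto.get("name") == WRAPPER:
            # The wrapper is not a real protocol; expose its fields instead.
            items.extend(proto.findall("field"))
        else:
            items.append(proto)
    return items


packet = ET.fromstring(
    "<packet>"
    '<proto name="frame"/>'
    '<proto name="fake-field-wrapper">'
    '<field name="data" value="0102"/>'
    "</proto>"
    "</packet>"
)
names = [(item.tag, item.get("name")) for item in top_level_items(packet)]
print(names)
```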

Similarly, expert info items in Wireshark belong to an internal protocol
named "_ws.expert", which is likewise converted into a "<field>" element
of that name.

Some dissectors also place subdissected protocols in a subtree instead of
at the top level. Unlike with the "data" protocol, the PDML output does
_not_ change these protocols to fields, but rather outputs them as "<proto>"
elements. This results in well-formed XML that does, however, violate the
PDML spec, as "<proto>" elements should only appear as direct children of
"<packet>" elements, with only "<field>" elements nested therein.

Note that the "<packet>" tag may have nonstandard color attributes,
"foreground" and "background".
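
Because subdissected protocols can remain nested "<proto>" elements, as
described above, a consumer that expects strictly spec-conforming PDML may
want to detect them. The following is a minimal sketch under that
assumption; the function name nested_protos is hypothetical and the sample
packet is hand-written:

```python
import xml.etree.ElementTree as ET


def nested_protos(packet):
    """Return names of <proto> elements nested inside another <proto>."""
    found = []
    for proto in packet.findall("proto"):
        # iter() yields the element itself first, then its descendants.
        for inner in proto.iter("proto"):
            if inner is not proto:
                found.append(inner.get("name"))
    return found


packet = ET.fromstring(
    "<packet>"
    '<proto name="ip"/>'
    '<proto name="icmp">'
    '<field name="icmp.type"/>'
    '<proto name="ip"/>'
    '<proto name="udp"/>'
    "</proto>"
    "</packet>"
)
print(nested_protos(packet))
```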


tools/WiresharkXML.py
=====================
This is a Python module which provides some infrastructure for
Python developers who wish to parse PDML. It is designed to read
a PDML file and call a user's callback function every time a packet
is constructed from the protocols and fields for a single packet.

The Python user should import the module, define a callback function
which accepts one argument, and call the parse_fh or parse_string function:

------------------------------------------------------------
import WiresharkXML

def my_callback(packet):
	# Called once per packet with a WiresharkXML.Packet object.
	pass  # do something

# If the PDML is stored in a file, you can:
with open(xml_filename) as fh:
	WiresharkXML.parse_fh(fh, my_callback)

# or, if the PDML is contained within a string, you can:
WiresharkXML.parse_string(my_string, my_callback)

# Now that the script has the packet data, do something.
------------------------------------------------------------

The object that is passed to the callback function is a
WiresharkXML.Packet object, which corresponds to a single packet.
WiresharkXML provides three classes, each of which corresponds to a PDML tag:

	Packet	 - "<packet>" tag
	Protocol - "<proto>" tag
	Field    - "<field>" tag

Each of these classes has accessors which will return the defined attributes:

	get_name()
	get_showname()
	get_pos()
	get_size()
	get_value()
	get_show()

Protocols and fields can contain other fields. Thus, the Protocol and
Field classes have a "children" member, which is a simple list of the
Field objects, if any, that are contained. The "children" list can be
directly accessed by code using the object. The "children" list will be
empty if this Protocol or Field contains no Fields.

Furthermore, the Packet class is a sub-class of the PacketList class.
The PacketList class provides methods to look for protocols and fields.
The term "item" is used when the item being looked for can be
a protocol or a field:

	item_exists(name) - checks if an item exists in the PacketList
	get_items(name) - returns a PacketList of all matching items


General Notes
=============
Generally, parsing XML is slow. If you're writing a script to parse
the PDML output of tshark, pass a display filter with "-Y" (or, for
two-pass analysis, a read filter with "-2 -R") to tshark to reduce as much
as possible the number of packets coming out of tshark.
The less your script has to process, the faster it will be.
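
As a rough sketch of that advice, a script can let tshark do the filtering
before any XML is produced. The helper names below are hypothetical; the
sketch assumes that tshark is on the PATH and is recent enough to accept
"-Y" for a single-pass display filter (older releases used "-R" for this):

```python
import subprocess


def tshark_pdml_command(capture_file, display_filter=None):
    """Build a tshark command line that writes PDML to standard output."""
    cmd = ["tshark", "-r", capture_file, "-T", "pdml"]
    if display_filter:
        # Filter inside tshark so fewer packets are serialized as XML.
        cmd += ["-Y", display_filter]
    return cmd


def run_tshark_pdml(capture_file, display_filter=None):
    """Run tshark and return its PDML output as text."""
    result = subprocess.run(
        tshark_pdml_command(capture_file, display_filter),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(tshark_pdml_command("capture.pcap", "http"))
```

The returned text can then be handed to a PDML parser of your choice.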

tools/msnchat
=============
tools/msnchat is a sample Python program that uses WiresharkXML to parse
PDML. Given one or more capture files, it runs tshark on each of them,
providing a read filter to reduce tshark's output. It finds MSN Chat
conversations in the capture file and produces nice HTML showing the
conversations. It has only been tested with capture files containing
non-simultaneous chat sessions, but was written to more-or-less handle any
number of simultaneous chat sessions.

pdml2html.xsl
=============
pdml2html.xsl is an XSLT file to convert PDML files into HTML.
See https://gitlab.com/wireshark/wireshark/-/wikis/PDML for more details.