Protocol Dissection in XML Format
=================================
$Id$
Copyright (c) 2003 by Gilbert Ramirez <gram@alumni.rice.edu>


Ethereal has the ability to export its protocol dissection in an
XML format. tethereal has similar functionality via the "-Tpdml"
option.

The XML that ethereal produces follows the Packet Details Markup
Language (PDML) specified by the group at the Politecnico Di Torino
working on Analyzer. The specification can be found at:

http://analyzer.polito.it/30alpha/docs/dissectors/PDMLSpec.htm

A related XML format, the Packet Summary Markup Language (PSML), is
also defined by the Analyzer group to provide packet summary information.
The PSML format is not documented in a publicly-available HTML document,
but its format is simple. Ethereal can export this format too. Some day it 
may be added to tethereal so that "-Tpsml" would produce PSML.

One wonders if the "-T" option should read "-Txml" instead of "-Tpdml"
(and in the future, "-Tpsml"), but if tethereal were ever required to
produce another XML-based format of its protocol dissection, then "-Txml"
would be ambiguous.

PDML
====
The PDML that ethereal produces is known not to be loadable into Analyzer;
it causes Analyzer to crash. As such, the PDML that ethereal produces
is labeled with a version number of "0", which means that the PDML does
not fully follow the PDML spec. Furthermore, a "creator" attribute in the
"<pdml>" tag gives the version number of [t]ethereal that produced the PDML.
In that way, as the PDML produced by ethereal matures, but still does not
meet the PDML spec, scripts can make intelligent decisions about how to
best parse the PDML, based on the "creator" attribute.

A PDML file is delimited by a "<pdml>" tag.
A PDML file contains multiple packets, denoted by the "<packet>" tag.
A packet will contain multiple protocols, denoted by the "<proto>" tag.
A protocol might contain one or more fields, denoted by the "<field>" tag.
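
The nesting described above can be walked with Python's standard
xml.etree.ElementTree module. The following is only a minimal sketch and is
not part of Ethereal: the sample document is hand-written for illustration,
and the name summarize_pdml is made up here.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (an assumption, not real tethereal output).
SAMPLE_PDML = """\
<pdml version="0" creator="ethereal/0.9.17">
  <packet>
    <proto name="frame" pos="0" size="60"/>
    <proto name="eth" pos="0" size="14">
      <field name="eth.type" pos="12" size="2" value="0800" show="0x0800"/>
    </proto>
  </packet>
</pdml>
"""

def summarize_pdml(text):
    """Return, for each packet, a list of (protocol name, direct field count)."""
    root = ET.fromstring(text)          # the <pdml> element
    summary = []
    for packet in root.findall("packet"):
        protos = [(proto.get("name"), len(proto.findall("field")))
                  for proto in packet.findall("proto")]
        summary.append(protos)
    return summary

print(summarize_pdml(SAMPLE_PDML))
```

Only direct children are counted here; fields nested inside other fields
would need a recursive walk.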

A pseudo-protocol named "geninfo" is produced, as is required by the PDML
spec, and exported as the first protocol after the opening "<packet>" tag.
Its information comes from ethereal's "frame" protocol, which serves
a similar purpose: storing packet meta-data. Both "geninfo" and
"frame" protocols are provided in the PDML output.

The "<pdml>" tag
================
Example:
	<pdml version="0" creator="ethereal/0.9.17">

The creator is "ethereal" version 0.9.17. The name refers to the "ethereal"
engine, so it will always say "ethereal", not "tethereal".

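A script that wants to act on the "version" and "creator" attributes could
read them as sketched below. This is an illustration only: the function name
read_pdml_header is hypothetical, and the sketch assumes the creator value
always has the "name/version" shape shown in the example above.

```python
import xml.etree.ElementTree as ET

def read_pdml_header(pdml_text):
    """Return (pdml_version, creator_name, creator_version) from the <pdml> tag."""
    root = ET.fromstring(pdml_text)
    if root.tag != "pdml":
        raise ValueError("not a PDML document: root tag is %r" % root.tag)
    # Assumption: creator looks like "ethereal/0.9.17"; split at the first "/".
    name, _, version = root.get("creator", "").partition("/")
    return root.get("version"), name, version

header = read_pdml_header('<pdml version="0" creator="ethereal/0.9.17"></pdml>')
print(header)
```

A caller can then branch on the creator version to decide how to parse the
rest of the file.
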

The "<proto>" tag
=================
"<proto>" tags can have the following attributes:

	name - the display filter name for the protocol
	showname - the label used to describe this protocol in the protocol
		tree. This is usually the descriptive name of the protocol,
		but it can be modified by dissectors to include more data
		(tcp can do this)
	pos - the starting offset within the packet data where this
			protocol starts
	size - the number of octets in the packet data that this protocol
			covers.

The "<field>" tag
=================
"<field>" tags can have the following attributes:

	name - the display filter name for the field
	showname - the label used to describe this field in the protocol
		tree. This is usually the descriptive name of the field,
		followed by some representation of the value.
	pos - the starting offset within the packet data where this
			field starts
	size - the number of octets in the packet data that this field
			covers.
	value - the actual packet data, in hex, that this field covers
	show - the representation of the packet data ('value') as it would
		appear in a display filter.

Some dissectors sometimes place text into the protocol tree, without using
a field with a field-name. Those appear in PDML as "<field>" tags with no
'name' attribute, but with a 'show' attribute giving that text.
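
As an illustration of these attributes, the sketch below decodes the hex
'value' into bytes and copes with text-only fields that lack a 'name'. The
sample markup is hand-written (not real tethereal output) and the function
name describe_fields is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample: one named field and one text-only field.
SAMPLE_PROTO = """\
<proto name="eth" pos="0" size="14">
  <field name="eth.type" pos="12" size="2" value="0800" show="0x0800"/>
  <field show="Text label placed by a dissector"/>
</proto>
"""

def describe_fields(proto_xml):
    """Return a list of (name, raw bytes, show text) for every field."""
    result = []
    for field in ET.fromstring(proto_xml).iter("field"):
        name = field.get("name")                      # None for text-only fields
        raw = bytes.fromhex(field.get("value", ""))   # empty if no 'value'
        result.append((name, raw, field.get("show")))
    return result

for entry in describe_fields(SAMPLE_PROTO):
    print(entry)
```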

Many dissectors label the undissected payload of a protocol as belonging
to a "data" protocol, and the "data" protocol usually resides inside
the last protocol dissected. In the PDML, the "data" protocol becomes
a "data" field, placed exactly where the "data" protocol is in ethereal's
protocol tree. So, if ethereal would normally show:

+-- Frame
|
+-- Ethernet
|
+-- IP
|
+-- TCP
|
+-- HTTP
    |
    +-- Data

In PDML, the "Data" protocol would become another field under HTTP:

<packet>
	<proto name="frame">
	...
	</proto>

	<proto name="eth">
	...
	</proto>

	<proto name="ip">
	...
	</proto>

	<proto name="tcp">
	...
	</proto>

	<proto name="http">
	...
		<field name="data" value="........."/>
	</proto>
</packet>
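
A script can therefore locate the undissected payload by searching each
protocol for a field named "data". The sketch below assumes a hand-written
packet in the shape shown above; find_payload is a hypothetical helper, not
something Ethereal provides.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the hex value spells "Hello".
SAMPLE_PACKET = """\
<packet>
  <proto name="tcp"/>
  <proto name="http">
    <field name="data" value="48656c6c6f"/>
  </proto>
</packet>
"""

def find_payload(packet_xml):
    """Return (enclosing protocol name, payload bytes), or None if absent."""
    packet = ET.fromstring(packet_xml)
    for proto in packet.findall("proto"):
        for field in proto.iter("field"):    # also searches nested fields
            if field.get("name") == "data":
                return proto.get("name"), bytes.fromhex(field.get("value", ""))
    return None

print(find_payload(SAMPLE_PACKET))
```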



tools/EtherealXML.py
====================
This is a Python module which provides some infrastructure for
Python developers who wish to parse PDML. It is designed to read
a PDML file and call a user's callback function every time a packet
is constructed from the protocols and fields for a single packet.

The Python user should import the module, define a callback function
which accepts one argument, and call the parse_fh function:

------------------------------------------------------------
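import EtherealXML

def my_callback(packet):
	# Called once per packet with an EtherealXML.Packet object.
	pass

xml_filename = "packets.xml"	# example name; path to a PDML file
fh = open(xml_filename)
EtherealXML.parse_fh(fh, my_callback)
fh.close()

# Now that the script has the packet data, do something.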
------------------------------------------------------------

The object that is passed to the callback function is an
EtherealXML.Packet object, which corresponds to a single packet.
EtherealXML provides three classes, each of which corresponds to a PDML tag:

	Packet	 - "<packet>" tag
	Protocol - "<proto>" tag
	Field    - "<field>" tag

Each of these classes has accessors which will return the defined attributes:

	get_name()
	get_showname()
	get_pos()
	get_size()
	get_value()
	get_show()

Protocols and fields can contain other fields. Thus, the Protocol and
Field classes have a "children" member, which is a simple list of the
Field objects, if any, that are contained. The "children" list can be
directly accessed by calling users. It will be empty if this Protocol
or Field contains no Fields.

Furthermore, the Packet class is a sub-class of the PacketList class.
The PacketList class provides methods to look for protocols and fields.
The term "item" is used when the item being looked for can be
a protocol or a field:

	item_exists(name) - checks if an item exists in the PacketList
	get_items(name) - returns a PacketList of all matching items


General Notes
=============
Generally, parsing XML is slow. If you're writing a script to parse
the PDML output of tethereal, pass a read filter with "-R" to tethereal to
try to reduce as much as possible the number of packets coming out of tethereal.
The less your script has to process, the faster it will be.
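
One way to follow this advice is sketched below: build a tethereal command
line that applies a read filter, and parse the resulting PDML one packet at a
time so that memory use stays small. This is an assumption-laden sketch, not
code from Ethereal: it assumes tethereal is on the PATH and that the capture
is read with "-r"; the function names are made up for this example. The demo
parses an in-memory document rather than running tethereal.

```python
import io
import xml.etree.ElementTree as ET

def build_tethereal_command(capture_file, read_filter):
    """Build an argument list; "-R" limits the packets tethereal emits."""
    return ["tethereal", "-r", capture_file, "-R", read_filter, "-Tpdml"]

def iter_packets(stream):
    """Yield <packet> elements one at a time from a PDML stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "packet":
            yield elem
            elem.clear()    # drop the packet's contents once the caller is done

# The command could be handed to subprocess.Popen(cmd, stdout=subprocess.PIPE)
# and the resulting stdout passed to iter_packets().
cmd = build_tethereal_command("capture.cap", "http")
print(cmd)

# Demo on a hand-written, in-memory document.
demo = io.BytesIO(b'<pdml version="0"><packet><proto name="frame"/></packet>'
                  b'<packet><proto name="frame"/></packet></pdml>')
for packet in iter_packets(demo):
    print([proto.get("name") for proto in packet.findall("proto")])
```

Each packet must be fully processed inside the loop body, because its
contents are cleared as soon as the generator resumes.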

'tools/msnchat' is a sample Python program that uses EtherealXML to parse PDML.
Given one or more capture files, it runs tethereal on each of them, providing
a read filter to reduce tethereal's output. It finds MSN Chat conversations
in the capture file and produces nice HTML showing the conversations. It has
only been tested with capture files containing non-simultaneous chat sessions,
but was written to more-or-less handle any number of simultaneous chat
sessions.