compile a document about heuristic dissectors, following:

http://www.wireshark.org/lists/wireshark-dev/200808/msg00234.html svn path=/trunk/; revision=26146
2008-09-06 11:28:58 +00:00 · 2008-09-06 11:28:58 +00:00 · e8dedf19b4
parent 7530143278
commit e8dedf19b4
2 changed files with 201 additions and 0 deletions
--- a/doc/README.developer
+++ b/doc/README.developer
@ -44,6 +44,7 @@ You'll find additional dissector related information in the following README
 files:
 - README.binarytrees - fast access to large data collections
 - README.heuristic - what are heuristic dissectors and how to write them
 - README.malloc - how to obtain "memory leak free" memory
 - README.plugins - how to "pluginize" a dissector
 - README.request_response_tracking - how to track req./resp. times and such
--- a/doc/README.heuristic
+++ b/doc/README.heuristic
@ -0,0 +1,200 @@
 $Revision: 25920 $
 $Date: 2008-08-04 22:41:43 +0200 (Mo, 04 Aug 2008) $
 $Author: ulfl $
 This file is a HOWTO for Wireshark developers. It describes how Wireshark 
 heuristic protocol dissectors works and how to write them.
 This file is compiled to give in depth information on Wireshark.
 It is by no means all inclusive and complete. Please feel free to send
 remarks and patches to the developer mailing list.
 Prerequisites
 -------------
 As this file is an addition to README.developer, it is essential to read 
 and understand that document first.
 Why heuristic dissectors?
 -------------------------
 When Wireshark "receives" a packet, it has to find the right dissector to 
 start decoding the packet data. Often this can be done by known conventions, 
 e.g. the Ethernet type 0x800 means "IP on top of Ethernet" - an easy and 
 reliable match for Wireshark.
 Unfortunately, these conventions are not always available, or (accidentially 
 or knowingly) some protocols don't care about those conventions and "reuse" 
 existing "magic numbers / tokens".
 For example TCP defines port 80 only for the use of HTTP traffic. But, this 
 convention doesn't prevent anyone from using TCP port 80 for some different 
 protocol, or on the other hand using HTTP on a port number different than 80.
 To solve this problem, Wireshark introduced the so called heuristic dissector 
 mechanism to try to deal with these problems.
 How Wireshark uses heuristic dissectors?
 ----------------------------------------
 While Wireshark starts, heuristic dissectors (HD) register themselves slightly 
 different than "normal" dissectors, e.g. a HD can ask for any TCP packet, as 
 it *may* contain interesting packet data for this dissector. In reality more 
 than one HD will exist for e.g. TCP packet data.
 So if Wireshark has to decode TCP packet data, it will first try to find a 
 dissector registered directly for the TCP port used in that packet. If it 
 finds such a registered dissector it will just hand over the packet data to it.
 In case there is no such "normal" dissector, WS will hand over the packet data 
 to the first matching HD. Now the HD will look into the data and decide if that 
 data looks like the dissector "is interested in". The return value signals WS 
 if the HD processed the data (so WS can stop working on that packet) or the 
 heuristic didn't matched (so WS tries the next HD until one matches - or the 
 data simply can't be processed).
 XXX - mention "use heuristic sub dissectors first"
 How do these heuristics work?
 -----------------------------
 Difficult to give a general answer here. The usual heuristic works as follows:
 A HD looks into the first few packet bytes and searches for common patterns that 
 are specific to the protocol in question. Most protocols starts with a 
 specific header, so a specific pattern may look like (synthetic example):
 1) first byte must be 0x42
 2) second byte is a type field and only can contain values between 0x20 - 0x33
 3) third byte is a flag field, where the lower 4 bits always contain the value 0
 4) fourth and fifth bytes contains a 16 length field, where the value can't be 
   longer than 10000 bytes
 So the heuristic dissector will check incoming packet data for all of the 
 4 above conditions, and only if all of the four conditions are true there is a 
 good chance that the packet really contains the expected protocol - and the 
 dissector continues to decode the packet data. If one condition fails, it's 
 very certainly not the protocol in question and the dissector returns to WS 
 immediately "this is not my protocol" - maybe some other heuristic dissector 
 is interested!
 Obviously, this is *not* 100% bullet proof, but the best WS can offer to its 
 users here - and improving the heuristic is always possible if it turns out 
 that it's not good enough to distinguish between two given protocols.
 Heuristic Code Example
 ----------------------
 You can find a lot of code examples in the wireshark sources, e.g.:
 grep -l heur_dissector_add epan/dissectors/*.c
 returns (currently) 69 files.
 For the above example criteria, the following code example might do the work 
 (combine this with the dissector skeleton in README.developer):
 XXX - please note: The following code examples were not tried in reality, 
 please report problems to the dev-list!
 static gboolean dissect_PROTOABBREV(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree)
 {
 ...
 /* 1) first byte must be 0x42 */
 if ( tvb_get_guint8(tvb, 0) != 0x42 )
    return (FALSE);
 /* 2) second byte is a type field and only can contain values between 0x20-0x33 */
 if ( tvb_get_guint8(tvb, 1) < 0x20 || tvb_get_guint8(tvb, 1) > 0x33 )
    return (FALSE);
 /* 3) third byte is a flag field, where the lower 4 bits always contain the value 0 */
 if ( tvb_get_guint8(tvb, 2) & 0x0f )
    return (FALSE);
 /* 4) fourth and fifth bytes contains a 16 length field, where the value can't be longer than 10000 bytes */
 /* Assumes network byte order */
 if ( tvb_get_ntohs(tvb, 3) > 10000 )
    return (FALSE);
 /* Assume it’s your packet and do dissection */
 ...
 return (TRUE);
 }
 void
 proto_reg_handoff_PROTOABBREV(void)
 {
    static int PROTOABBREV_inited = FALSE;
    if ( !PROTOABBREV_inited )
    {
 	    /* register as heuristic dissector for both TCP and UDP */
        heur_dissector_add("tcp", dissect_PROTOABBREV, proto_PROTOABBREV);
        heur_dissector_add("udp", dissect_PROTOABBREV, proto_PROTOABBREV);
    }
 }
 Please note, that registering a heuristic dissector is only possible for a 
 small variety of protocols. In most cases an heuristic is not needed, and 
 adding the support would only add unused code to the dissector.
 TCP and UDP are prominent examples that support HDs, as there 
 seems to be a tendency to reuse known port numbers for new protocols.
 XXX - what to grep for, if a protocol provides HD support or not?
 It's possible to write a dissector to be a dual heuristic/normal dissector.
 In that the case, dissect_PROTOABBREV should return an int with the number of 
 bytes dissected by your protocol rather than simply returning TRUE. If 
 heuristics fail, still just return 0.
 static int dissect_PROTOABBREV(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree)
 {
 ...
 /* 1) first byte must be 0x42 */
 if ( tvb_get_guint8(tvb, 0) != 0x42 )
    return 0;
 /* 2) second byte is a type field and only can contain values between 0x20-0x33 */
 if ( tvb_get_guint8(tvb, 1) < 0x20 || tvb_get_guint8(tvb, 1) > 0x33 )
    return 0;
 /* 3) third byte is a flag field, where the lower 4 bits always contain the value 0 */
 if ( tvb_get_guint8(tvb, 2) & 0x0f )
    return 0;
 /* 4) fourth and fifth bytes contains a 16 length field, where the value can't be longer than 10000 bytes */
 /* Assumes network byte order */
 if ( tvb_get_ntohs(tvb, 3) > 10000 )
    return 0;
 /* Assume it’s your packet and do dissection */
 ...
 return number_of_bytes_dissected;
 }
 void
 proto_reg_handoff_PROTOABBREV(void)
 {
    static int PROTOABBREV_inited = FALSE;
    dissector_handle_t PROTOABBREV_handle;
    if ( !PROTOABBREV_inited )
    {
 	    /* register as heuristic dissector for both TCP and UDP */
        heur_dissector_add("tcp", dissect_PROTOABBREV, proto_PROTOABBREV);
        heur_dissector_add("udp", dissect_PROTOABBREV, proto_PROTOABBREV);
 		/* register as normal dissector for IP as well */
        PROTOABBREV_handle = new_create_dissector_handle(dissect_PROTOABBREV, proto_PROTOABBREV);
        dissector_add("ip.proto", IP_PROTO_PROTOABBREV, PROTOABBREV_handle);
        PROTOABBREV_inited = TRUE;
    }
 }