wireshark/epan/dfilter/dfilter-int.h

138 lines
3.2 KiB
C
Raw Normal View History

/** @file
*
* Wireshark - Network traffic analyzer
* By Gerald Combs <gerald@wireshark.org>
* Copyright 2001 Gerald Combs
*
* SPDX-License-Identifier: GPL-2.0-or-later
*/
#ifndef DFILTER_INT_H
#define DFILTER_INT_H
#include "dfilter.h"
#include "syntax-tree.h"
#include <epan/proto.h>
#include <stdio.h>
typedef struct {
const header_field_info *hfinfo;
fvalue_t *value;
int proto_layer_num;
} df_reference_t;
/* Passed back to user */
struct epan_dfilter {
GPtrArray *insns;
guint num_registers;
GSList **registers;
gboolean *attempted_load;
GDestroyNotify *free_registers;
int *interesting_fields;
int num_interesting_fields;
GPtrArray *deprecated;
2022-03-27 23:21:53 +00:00
char *expanded_text;
dfilter: Refactor macro tree references This replaces the current macro reference system with a completely different implementation. Instead of a macro a reference is a syntax element. A reference is a constant that can be filled in the dfilter code after compilation from an existing protocol tree. It is best understood as a field value that can be read from a fixed tree that is not the frame being filtered. Usually this fixed tree is the currently selected frame when the filter is applied. This allows comparing fields in the filtered frame with fields in the selected frame. Because the field reference syntax uses the same sigil notation as a macro we have to use a heuristic to distinguish them: if the name has a dot it is a field reference, otherwise it is a macro name. The reference is synctatically validated at compile time. There are two main advantages to this implementation (and a couple of minor ones): The protocol tree for each selected frame is only walked if we have a display filter and if the display filter uses references. Also only the actual reference values are copied, intead of loading the entire tree into a hash table (in textual form even). The other advantage is that the reference is tested like a protocol field against all the values in the selected frame (if there is more than one). Currently the reference fields are not "primed" during dissection, so the entire tree is walked to find a particular reference (this is similar to the previous implementation). If the display filter contains a valid reference and the reference is not loaded at the time the filter is run the result is the same as a non existing field for a regular READ_TREE instruction. Fixes #17599.
2022-03-27 14:26:46 +00:00
GHashTable *references;
GHashTable *raw_references;
char *syntax_tree_str;
dfilter: Allow arithmetic expressions as function arguments This allows writing moderately complex expressions, for example a float epsilon test (#16483): Filter: {abs(_ws.ftypes.double - 1) / max(abs(_ws.ftypes.double), abs(1))} < 0.01 Syntax tree: 0 TEST_LT: 1 OP_DIVIDE: 2 FUNCTION(abs#1): 3 OP_SUBTRACT: 4 FIELD(_ws.ftypes.double) 4 FVALUE(1 <FT_DOUBLE>) 2 FUNCTION(max#2): 3 FUNCTION(abs#1): 4 FIELD(_ws.ftypes.double) 3 FUNCTION(abs#1): 4 FVALUE(1 <FT_DOUBLE>) 1 FVALUE(0.01 <FT_DOUBLE>) Instructions: 00000 READ_TREE _ws.ftypes.double -> reg#1 00001 IF_FALSE_GOTO 3 00002 SUBRACT reg#1 - 1 <FT_DOUBLE> -> reg#2 00003 STACK_PUSH reg#2 00004 CALL_FUNCTION abs(reg#2) -> reg#0 00005 STACK_POP 1 00006 IF_FALSE_GOTO 24 00007 READ_TREE _ws.ftypes.double -> reg#1 00008 IF_FALSE_GOTO 9 00009 STACK_PUSH reg#1 00010 CALL_FUNCTION abs(reg#1) -> reg#4 00011 STACK_POP 1 00012 IF_FALSE_GOTO 13 00013 STACK_PUSH reg#4 00014 STACK_PUSH 1 <FT_DOUBLE> 00015 CALL_FUNCTION abs(1 <FT_DOUBLE>) -> reg#5 00016 STACK_POP 1 00017 IF_FALSE_GOTO 18 00018 STACK_PUSH reg#5 00019 CALL_FUNCTION max(reg#5, reg#4) -> reg#3 00020 STACK_POP 2 00021 IF_FALSE_GOTO 24 00022 DIVIDE reg#0 / reg#3 -> reg#6 00023 ANY_LT reg#6 < 0.01 <FT_DOUBLE> 00024 RETURN We now use a stack to pass arguments to the function. The stack is implemented as a list of lists (list of registers). Arguments may still be non-existent to functions (this is a feature). Functions must check for nil arguments (NULL lists) and handle that case. It's somewhat complicated to allow literal values and test compatibility for different types, both because of lack of type information with unparsed/literal and also because it is an underdeveloped area in the code. In my limited testing it was good enough and useful, further enhancements are left for future work.
2022-04-16 01:42:20 +00:00
/* Used to pass arguments to functions. List of Lists (list of registers). */
GSList *function_stack;
};
typedef struct {
/* Syntax Tree stuff */
stnode_t *st_root;
gboolean parse_failure;
df_error_t error;
GPtrArray *insns;
GHashTable *loaded_fields;
dfilter: Add support for raw (bytes) addressing mode This adds new syntax to read a field from the tree as bytes, instead of the actual type. This is a useful extension for example to match matformed strings that contain unicode replacement characters. In this case it is not possible to match the raw value of the malformed string field. This extension fills this need and is generic enough that it should be useful in many other situations. The syntax used is to prefix the field name with "@". The following artificial example tests if the HTTP user agent contains a particular invalid UTF-8 sequence: @http.user_agent == "Mozill\xAA" Where simply using "http.user_agent" won't work because the invalid byte sequence will have been replaced with U+FFFD. Considering the following programs: $ dftest '_ws.ftypes.string == "ABC"' Filter: _ws.ftypes.string == "ABC" Syntax tree: 0 TEST_ANY_EQ: 1 FIELD(_ws.ftypes.string <FT_STRING>) 1 FVALUE("ABC" <FT_STRING>) Instructions: 00000 READ_TREE _ws.ftypes.string <FT_STRING> -> reg#0 00001 IF_FALSE_GOTO 3 00002 ANY_EQ reg#0 == "ABC" <FT_STRING> 00003 RETURN $ dftest '@_ws.ftypes.string == "ABC"' Filter: @_ws.ftypes.string == "ABC" Syntax tree: 0 TEST_ANY_EQ: 1 FIELD(_ws.ftypes.string <RAW>) 1 FVALUE(41:42:43 <FT_BYTES>) Instructions: 00000 READ_TREE @_ws.ftypes.string <FT_BYTES> -> reg#0 00001 IF_FALSE_GOTO 3 00002 ANY_EQ reg#0 == 41:42:43 <FT_BYTES> 00003 RETURN In the second case the field has a "raw" type, that equates directly to FT_BYTES, and the field value is read from the protocol raw data.
2022-10-25 03:20:18 +00:00
GHashTable *loaded_raw_fields;
GHashTable *interesting_fields;
int next_insn_id;
int next_register;
GPtrArray *deprecated;
GHashTable *references; /* hfinfo -> pointer to array of references */
GHashTable *raw_references; /* hfinfo -> pointer to array of references */
char *expanded_text;
} dfwork_t;
/*
* State kept by the scanner.
*/
typedef struct {
dfwork_t *dfw;
GString* quoted_string;
gboolean raw_string;
stloc_t string_loc;
stloc_t location;
} df_scanner_state_t;
/* Constructor/Destructor prototypes for Lemon Parser */
void *DfilterAlloc(void* (*)(gsize));
void DfilterFree(void*, void (*)(void *));
void Dfilter(void*, int, stnode_t*, dfwork_t*);
/* Return value for error in scanner. */
#define SCAN_FAILED -1 /* not 0, as that means end-of-input */
void
dfilter_vfail(dfwork_t *dfw, int code, stloc_t *err_loc,
const char *format, va_list args);
void
dfilter_fail(dfwork_t *dfw, int code, stloc_t *err_loc,
const char *format, ...) G_GNUC_PRINTF(4, 5);
WS_NORETURN
void
dfilter_fail_throw(dfwork_t *dfw, int code, stloc_t *err_loc,
const char *format, ...) G_GNUC_PRINTF(4, 5);
void
dfw_set_error_location(dfwork_t *dfw, stloc_t *err_loc);
void
add_deprecated_token(dfwork_t *dfw, const char *token);
void
free_deprecated(GPtrArray *deprecated);
void
DfilterTrace(FILE *TraceFILE, char *zTracePrompt);
dfilter: Add special syntax for literals and names The syntax for protocols and some literals like numbers and bytes/addresses can be ambiguous. Some protocols can be parsed as a literal, for example the protocol "fc" (Fibre Channel) can be parsed as 0xFC. If a numeric protocol is registered that will also take precedence over any literal, according to the current rules, thereby breaking numerical comparisons to that number. The same for an hypothetical protocol named "true", etc. To allow the user to disambiguate this meaning introduce new syntax. Any value prefixed with ':' or enclosed in <,> will be treated as a literal value only. The value :fc or <fc> will always mean 0xFC, under any context. Never a protocol whose filter name is "fc". Likewise any value prefixed with a dot will always be parsed as an identifier (protocol or protocol field) in the language. Never any literal value parsed from the token "fc". This allows the user to be explicit about the meaning, and between the two explicit methods plus the ambiguous one it doesn't completely break any one meaning. The difference can be seen in the following two programs: Filter: frame == fc Constants: Instructions: 00000 READ_TREE frame -> reg#0 00001 IF-FALSE-GOTO 5 00002 READ_TREE fc -> reg#1 00003 IF-FALSE-GOTO 5 00004 ANY_EQ reg#0 == reg#1 00005 RETURN -------- Filter: frame == :fc Constants: 00000 PUT_FVALUE fc <FT_PROTOCOL> -> reg#1 Instructions: 00000 READ_TREE frame -> reg#0 00001 IF-FALSE-GOTO 3 00002 ANY_EQ reg#0 == reg#1 00003 RETURN The filter "frame == fc" is the same as "filter == .fc", according to the current heuristic, except the first form will try to parse it as a literal if the name does not correspond to any registered protocol. By treating a leading dot as a name in the language we necessarily disallow writing floats with a leading dot. We will also disallow writing with an ending dot when using unparsed values. This is a backward incompatibility but has the happy side effect of making the expression {1...2} unambiguous. This could either mean "1 .. .2" or "1. .. 2". If we require a leading and ending digit then the meaning is clear: 1.0..0.2 -> 1.0 .. 0.2 Fixes #17731.
2022-02-22 21:55:05 +00:00
header_field_info *
dfilter_resolve_unparsed(dfwork_t *dfw, const char *name);
WS_RETNONNULL fvalue_t*
dfilter_fvalue_from_literal(dfwork_t *dfw, ftenum_t ftype, stnode_t *st,
gboolean allow_partial_value, header_field_info *hfinfo_value_string);
WS_RETNONNULL fvalue_t *
dfilter_fvalue_from_string(dfwork_t *dfw, ftenum_t ftype, stnode_t *st,
header_field_info *hfinfo_value_string);
WS_RETNONNULL fvalue_t *
dfilter_fvalue_from_charconst(dfwork_t *dfw, ftenum_t ftype, stnode_t *st);
const char *tokenstr(int token);
df_reference_t *
reference_new(const field_info *finfo, gboolean raw);
void
reference_free(df_reference_t *ref);
void dfw_error_init(df_error_t *err);
void dfw_error_clear(df_error_t *err);
void dfw_error_set_msg(df_error_t **errp, const char *fmt, ...)
G_GNUC_PRINTF(2, 3);
void dfw_error_take(df_error_t **errp, df_error_t *src);
#endif