wireshark

Commit Graph

Author	SHA1	Message	Date
João Valverde	2443df7318	Disable another -Wunreachable lemon warning	2022-11-17 11:21:41 +00:00
João Valverde	b83658d8a4	dfilter: Add support for raw addressing with references Extends raw addressing syntax to work with references. The syntax is @field1 == ${@field2} This requires replicating the logic to load field references, but using raw values instead. We use separate hash tables for that, namely "references" vs "raw_references".	2022-10-31 21:02:39 +00:00
João Valverde	0853ddd1cb	dfilter: Add support for raw (bytes) addressing mode This adds new syntax to read a field from the tree as bytes, instead of the actual type. This is a useful extension for example to match malformed strings that contain unicode replacement characters. In this case it is not possible to match the raw value of the malformed string field. This extension fills this need and is generic enough that it should be useful in many other situations. The syntax used is to prefix the field name with "@". The following artificial example tests if the HTTP user agent contains a particular invalid UTF-8 sequence: @http.user_agent == "Mozill\xAA" Where simply using "http.user_agent" won't work because the invalid byte sequence will have been replaced with U+FFFD. Considering the following programs: $ dftest '_ws.ftypes.string == "ABC"' Filter: _ws.ftypes.string == "ABC" Syntax tree: 0 TEST_ANY_EQ: 1 FIELD(_ws.ftypes.string <FT_STRING>) 1 FVALUE("ABC" <FT_STRING>) Instructions: 00000 READ_TREE _ws.ftypes.string <FT_STRING> -> reg#0 00001 IF_FALSE_GOTO 3 00002 ANY_EQ reg#0 == "ABC" <FT_STRING> 00003 RETURN $ dftest '@_ws.ftypes.string == "ABC"' Filter: @_ws.ftypes.string == "ABC" Syntax tree: 0 TEST_ANY_EQ: 1 FIELD(_ws.ftypes.string <RAW>) 1 FVALUE(41:42:43 <FT_BYTES>) Instructions: 00000 READ_TREE @_ws.ftypes.string <FT_BYTES> -> reg#0 00001 IF_FALSE_GOTO 3 00002 ANY_EQ reg#0 == 41:42:43 <FT_BYTES> 00003 RETURN In the second case the field has a "raw" type, that equates directly to FT_BYTES, and the field value is read from the protocol raw data.	2022-10-31 21:02:39 +00:00
João Valverde	fc5c81328e	dfilter: Rename test syntax tree node Test node also includes arithmetic operations so rename it to a generic "operator" node.	2022-07-02 11:39:17 +01:00
João Valverde	b10db887ce	dfilter: Remove unparsed syntax type and RHS literal bias This removes unparsed name resolution during the semantic check because it feels like a hack to work around limitations in the language syntax, that should be solved at the lexical level instead. We were interpreting unparsed differently on the LHS and RHS. Now an unparsed value is always a field if it matches a registered field name (this matches the implementation in 3.6 and before). This requires tightening a bit the allowed filter names for protocols to avoid some common and potentially weird conflicting cases. Incidentally this extends set grammar to accept all entities. That is experimental and may be reverted in the future.	2022-07-02 11:18:20 +01:00
João Valverde	efbe699756	dfilter: Remove STTYPE_RANGE_NODE STTYPE_RANGE_NODE is just a lexical token, it is not used within the syntax tree so remove it.	2022-06-25 16:06:48 +01:00
João Valverde	aaff0d21ae	dfilter: Add layer support for references This adds support for using the layers filter with field references. Before: $ dftest 'ip.src != ${ip.src#2}' dftest: invalid character in macro name After: $ dftest 'ip.src != ${ip.src#2}' Filter: ip.src != ${ip.src#2} Syntax tree: 0 TEST_ALL_NE: 1 FIELD(ip.src <FT_IPv4>) 1 REFERENCE(ip.src#[2:1] <FT_IPv4>) Instructions: 00000 READ_TREE ip.src <FT_IPv4> -> reg#0 00001 IF_FALSE_GOTO 5 00002 READ_REFERENCE_R ${ip.src <FT_IPv4>} #[2:1] -> reg#1 00003 IF_FALSE_GOTO 5 00004 ALL_NE reg#0 != reg#1 00005 RETURN This requires adding another level of complexity to references. When loading references we need to copy the 'proto_layer_num' and add the logic to filter on that. The "layer" sttype is removed and replaced by a new field sttype with support for a range. This is a nice cleanup for the semantic check and general simplification. The grammar is better too with this design. Range sttype is renamed to slice for clarity.	2022-06-25 14:57:40 +01:00
João Valverde	b602911b31	dfilter: Add support for universal quantifiers Adds the keywords "any" and "all" to implement the quantification to any existing relational operator. Filter: all tcp.port in {100, 2000..3000} Syntax tree: 0 ALL TEST_IN: 1 FIELD(tcp.port) 1 SET(#2): 2 FVALUE(100 <FT_UINT16>) 2 FVALUE(2000 <FT_UINT16>) .. FVALUE(3000 <FT_UINT16>) Instructions: 00000 READ_TREE tcp.port -> reg#0 00001 IF_FALSE_GOTO 5 00002 ALL_EQ reg#0 === 100 <FT_UINT16> 00003 IF_TRUE_GOTO 5 00004 ALL_IN_RANGE reg#0 in { 2000 <FT_UINT16> .. 3000 <FT_UINT16> } 00005 RETURN	2022-05-12 14:26:54 +01:00
João Valverde	4f3f507eee	dfilter: Add syntax to match specific layers in the protocol stack Add support to display filters for matching a specific layer within a frame. Layers are counted sequentially up the protocol stack. Each protocol (dissector) that appears in the stack is one layer. LINK-LAYER#1 <-> IP#1 <-> TCP#1 <-> IP#2 <-> TCP#2 <-> etc. The syntax allows for negative indexes and ranges with the usual semantics for slices (but note that counting starts at one): tcp.port#[2-4] == 1024 Matches layers 2 to 4 inclusive. Fixes #3791.	2022-04-26 16:50:59 +00:00
João Valverde	c0170dad42	dfilter: Rename "range" to "slice" The word range is used for different things with different meanings and that is confusing. Avoid using "range" in code to mean "slice". A range is one or more intervals with a lower and upper bound. A slice is a range applied to a bytes field. Replace range with slice wherever appropriate. This usage of "slice" instead of range is generally correct and consistent in the documentation.	2022-04-26 16:50:59 +00:00
João Valverde	fab32ea0cb	dfilter: Allow arithmetic expressions as function arguments This allows writing moderately complex expressions, for example a float epsilon test (#16483): Filter: {abs(_ws.ftypes.double - 1) / max(abs(_ws.ftypes.double), abs(1))} < 0.01 Syntax tree: 0 TEST_LT: 1 OP_DIVIDE: 2 FUNCTION(abs#1): 3 OP_SUBTRACT: 4 FIELD(_ws.ftypes.double) 4 FVALUE(1 <FT_DOUBLE>) 2 FUNCTION(max#2): 3 FUNCTION(abs#1): 4 FIELD(_ws.ftypes.double) 3 FUNCTION(abs#1): 4 FVALUE(1 <FT_DOUBLE>) 1 FVALUE(0.01 <FT_DOUBLE>) Instructions: 00000 READ_TREE _ws.ftypes.double -> reg#1 00001 IF_FALSE_GOTO 3 00002 SUBRACT reg#1 - 1 <FT_DOUBLE> -> reg#2 00003 STACK_PUSH reg#2 00004 CALL_FUNCTION abs(reg#2) -> reg#0 00005 STACK_POP 1 00006 IF_FALSE_GOTO 24 00007 READ_TREE _ws.ftypes.double -> reg#1 00008 IF_FALSE_GOTO 9 00009 STACK_PUSH reg#1 00010 CALL_FUNCTION abs(reg#1) -> reg#4 00011 STACK_POP 1 00012 IF_FALSE_GOTO 13 00013 STACK_PUSH reg#4 00014 STACK_PUSH 1 <FT_DOUBLE> 00015 CALL_FUNCTION abs(1 <FT_DOUBLE>) -> reg#5 00016 STACK_POP 1 00017 IF_FALSE_GOTO 18 00018 STACK_PUSH reg#5 00019 CALL_FUNCTION max(reg#5, reg#4) -> reg#3 00020 STACK_POP 2 00021 IF_FALSE_GOTO 24 00022 DIVIDE reg#0 / reg#3 -> reg#6 00023 ANY_LT reg#6 < 0.01 <FT_DOUBLE> 00024 RETURN We now use a stack to pass arguments to the function. The stack is implemented as a list of lists (list of registers). Arguments may still be non-existent to functions (this is a feature). Functions must check for nil arguments (NULL lists) and handle that case. It's somewhat complicated to allow literal values and test compatibility for different types, both because of lack of type information with unparsed/literal and also because it is an underdeveloped area in the code. In my limited testing it was good enough and useful, further enhancements are left for future work.	2022-04-18 17:10:31 +01:00
João Valverde	4d9470e7dd	dfilter: Add location tracking to scanner and use it to report errors Add location tracking as a column offset and length from offset to the scanner. Our input is a single line only so we don't need to track line offset. Record that information in the syntax tree. Return the error location in dfilter_compile(). Use it in dftest to mark the location of the error in the filter string. Later it would be nice to use the location in the GUI as well. $ dftest "ip.proto == aaaaaa and tcp.port == 123" Filter: ip.proto == aaaaaa and tcp.port == 123 dftest: "aaaaaa" cannot be found among the possible values for ip.proto. ip.proto == aaaaaa and tcp.port == 123 ^~~~~~	2022-04-10 10:09:51 +01:00
João Valverde	da19379eb5	dfilter: Create the syntax node in the scanner and pass that Revert to passing a syntax node from the lexical scanner to the grammar parser. Using a union is not having a discernible advantage and requires duplicating a lot of properties of syntax nodes.	2022-04-10 09:54:03 +01:00
João Valverde	fb9a176587	dfilter: Allow grouping arithmetical expressions with { } This removes the limitation of having only two terms in an arithmetic expression and allows setting the precedence using curly braces (like any basic calculator). Our grammar currently does not allow grouping arithmetic expressions using parentheses, because boolean expressions and arithmetic expressions are different and parentheses are used with the former.	2022-04-08 23:12:04 +01:00
João Valverde	8fb28f5161	dfilter: Minor grammar cleanup Remove duplication for arithmetic expressions.	2022-04-05 12:04:37 +01:00
João Valverde	20afbd46ec	dfilter: Remove existence test syntax tree nodes After some experimentation I don't think these two existence tests belong in the grammar, it's an implementation detail and removing it might avoid some artificial constraints.	2022-04-05 12:04:37 +01:00
João Valverde	fb08c4b4a8	dfilter: Replace bitwise sttype with arithmetic Most of the bitwise codepaths are just duplicating code for the arithmetic type. Parse bitwise expressions as arithmetic instead.	2022-04-05 12:04:37 +01:00
João Valverde	34ad6bb478	dfilter: Make logical AND higher precedence than logical OR In most, if not all, programming languages logical AND has higher precedence than logical OR. Apply the principle of least surprise and do the same for Wireshark display filters. Before: ip and tcp or udp => ip and (tcp or udp) Filter: ip and tcp or udp Instructions: 00000 CHECK_EXISTS ip 00001 IF_FALSE_GOTO 5 00002 CHECK_EXISTS tcp 00003 IF_TRUE_GOTO 5 00004 CHECK_EXISTS udp 00005 RETURN After: ip and tcp or udp => (ip and tcp) or udp Filter: ip and tcp or udp Instructions: 00000 CHECK_EXISTS ip 00001 IF_FALSE_GOTO 4 00002 CHECK_EXISTS tcp 00003 IF_TRUE_GOTO 5 00004 CHECK_EXISTS udp 00005 RETURN	2022-04-04 19:51:38 +00:00
João Valverde	8bc214b5bb	dfilter: Add remaining arithmetic integer ops	2022-03-31 16:49:42 +01:00
João Valverde	2a9cb588aa	dfilter: Add binary arithmetic (add/subtract) Add support for display filter binary addition and subtraction. The grammar is intentionally kept simple for now. The use case is to add a constant to a protocol field, or (maybe) add two fields in an expression. We use signed arithmetic with unsigned numbers, checking for overflow and casting where necessary to do the conversion. We could legitimately opt to use traditional modular arithmetic instead (like C) and if it turns out that that is more useful for some reason we may want to in the future. Fixes #15504.	2022-03-31 11:27:34 +01:00
João Valverde	260942e170	dfilter: Refactor macro tree references This replaces the current macro reference system with a completely different implementation. Instead of a macro a reference is a syntax element. A reference is a constant that can be filled in the dfilter code after compilation from an existing protocol tree. It is best understood as a field value that can be read from a fixed tree that is not the frame being filtered. Usually this fixed tree is the currently selected frame when the filter is applied. This allows comparing fields in the filtered frame with fields in the selected frame. Because the field reference syntax uses the same sigil notation as a macro we have to use a heuristic to distinguish them: if the name has a dot it is a field reference, otherwise it is a macro name. The reference is syntactically validated at compile time. There are two main advantages to this implementation (and a couple of minor ones): The protocol tree for each selected frame is only walked if we have a display filter and if the display filter uses references. Also only the actual reference values are copied, instead of loading the entire tree into a hash table (in textual form even). The other advantage is that the reference is tested like a protocol field against all the values in the selected frame (if there is more than one). Currently the reference fields are not "primed" during dissection, so the entire tree is walked to find a particular reference (this is similar to the previous implementation). If the display filter contains a valid reference and the reference is not loaded at the time the filter is run the result is the same as a non existing field for a regular READ_TREE instruction. Fixes #17599.	2022-03-29 12:36:31 +00:00
João Valverde	431cb43b81	dfilter: Remove parenthesis deprecation warning This usage devalues a mechanism for warning users that deserves more attention than this minor suggestion. The warning is inconvenient for intermediate and advanced users.	2022-03-29 12:19:26 +00:00
João Valverde	ac0a69636b	dfilter: Add support for unary arithmetic This change implements a unary minus operator. Filter: tcp.window_size_scalefactor == -tcp.dstport Instructions: 00000 READ_TREE tcp.window_size_scalefactor -> reg#0 00001 IF_FALSE_GOTO 6 00002 READ_TREE tcp.dstport -> reg#1 00003 IF_FALSE_GOTO 6 00004 MK_MINUS -reg#1 -> reg#2 00005 ANY_EQ reg#0 == reg#2 00006 RETURN It is supported for integer types, floats and relative time values. The unsigned integer types are promoted to a 32 bit signed integer. Unary plus is implemented as a no-op. The plus sign is simply ignored. Constant arithmetic expressions are computed during compilation. Overflow with constants is a compile time error. Overflow with variables is a run time error and silently ignored. Only a debug message will be printed to the console. Related to #15504.	2022-03-28 11:20:41 +00:00
João Valverde	16729be2c1	dfilter: Add bitwise masking of bits Add support for masking of bits. Before the bitwise operator could only test bits, it did not support clearing bits. This allows testing if any combination of bits are set/unset more naturally with a single test. Previously this was only possible by combining several bitwise predicates. Bitwise is implemented as a test node, even though it is not. Maybe the test node should be renamed to something else. Fixes #17246.	2022-03-22 12:58:04 +00:00
João Valverde	a68b408a9f	dfilter: Add RHS bias for literal values For unparsed values on the RHS of a comparison try to parse them first as a literal and only then as a protocol. This is more complicated in code but should be a use case a lot more common and useful in practice. It removes some annoying special cases and applies this rule consistently to any expression. Consistency is important otherwise the special cases and exceptions make the language confusing and difficult to learn. For values on the LHS the rule remains to first try a protocol value, then a literal. Related with issue #17731.	2022-03-05 11:10:54 +00:00
João Valverde	c4f9d8abda	dfilter: Rename "unparsed" to "literal" A literal value is a value that cannot be interpreted as a registered protocol. An unparsed value can be a literal or an identifier (protocol/field) according to context and the current disambiguation rules. Strictly literal here is to be understood to mean "numeric literal, including numeric arrays, but not strings or character constants".	2022-03-05 11:10:54 +00:00
João Valverde	6d520addd1	dfilter: Add special syntax for literals and names The syntax for protocols and some literals like numbers and bytes/addresses can be ambiguous. Some protocols can be parsed as a literal, for example the protocol "fc" (Fibre Channel) can be parsed as 0xFC. If a numeric protocol is registered that will also take precedence over any literal, according to the current rules, thereby breaking numerical comparisons to that number. The same for a hypothetical protocol named "true", etc. To allow the user to disambiguate this meaning introduce new syntax. Any value prefixed with ':' or enclosed in <,> will be treated as a literal value only. The value :fc or <fc> will always mean 0xFC, under any context. Never a protocol whose filter name is "fc". Likewise any value prefixed with a dot will always be parsed as an identifier (protocol or protocol field) in the language. Never any literal value parsed from the token "fc". This allows the user to be explicit about the meaning, and between the two explicit methods plus the ambiguous one it doesn't completely break any one meaning. The difference can be seen in the following two programs: Filter: frame == fc Constants: Instructions: 00000 READ_TREE frame -> reg#0 00001 IF-FALSE-GOTO 5 00002 READ_TREE fc -> reg#1 00003 IF-FALSE-GOTO 5 00004 ANY_EQ reg#0 == reg#1 00005 RETURN -------- Filter: frame == :fc Constants: 00000 PUT_FVALUE fc <FT_PROTOCOL> -> reg#1 Instructions: 00000 READ_TREE frame -> reg#0 00001 IF-FALSE-GOTO 3 00002 ANY_EQ reg#0 == reg#1 00003 RETURN The filter "frame == fc" is the same as "frame == .fc", according to the current heuristic, except the first form will try to parse it as a literal if the name does not correspond to any registered protocol. By treating a leading dot as a name in the language we necessarily disallow writing floats with a leading dot. We will also disallow writing with an ending dot when using unparsed values. This is a backward incompatibility but has the happy side effect of making the expression {1...2} unambiguous. This could either mean "1 .. .2" or "1. .. 2". If we require a leading and ending digit then the meaning is clear: 1.0..0.2 -> 1.0 .. 0.2 Fixes #17731.	2022-03-05 11:10:54 +00:00
João Valverde	8b23dd3a3c	dfilter: Add an "all equal" operator To complete the set of equality operators add an "all equal" operator that matches a frame if all fields match the condition. The symbol chosen for "all_eq" is "===".	2021-12-22 14:32:32 +00:00
João Valverde	f5f8d9ebb6	dfilter: Fix token associativity TEST_EQ and TEST_NE are unused. Replace by the correct values and add missing token to string representations.	2021-12-13 01:24:18 +00:00
João Valverde	60e305d1e1	dfilter: Convert grammar.lemon to 4-space indentation Add global EditorConfig settings for lemon files. Add exceptions for the two grammar files that use tab indentation.	2021-12-02 15:48:40 +00:00
João Valverde	3657788cbb	dfilter: Add default grammar type	2021-12-01 19:43:30 +00:00
João Valverde	647decd509	dfilter: Avoid double strdup to save token value Store the lval token value instead.	2021-12-01 19:42:51 +00:00
João Valverde	557cee31fc	dfilter: Save lexical token value to syntax tree Use that for error messages, including any using test operators. This allows always using the same name as the user. It avoids cases where the user writes "a && b" and the message is "a and b" is syntactically invalid. It should also allow us to be more consistent with the use of double quotes.	2021-12-01 13:34:01 +00:00
João Valverde	3e0806ca09	dfilter: Remove dfilter_fail_parse() Instead of requiring a special error function in the parser just set the syntax_error flag if an error occurs, in any stage of compilation. Outside of the parser loop it will not be used but that is fine.	2021-11-30 19:52:05 +00:00
João Valverde	943c282009	dfilter: Parse character constants in lexer Invalid character constants should be handled in the lexical scanner. Todo: See if some code could be shared to parse double quoted strings. It also fixes some unintuitive type coercions to string. Character constants should be treated as characters, or maybe integers, or maybe even throw an invalid comparison error, but converting to a literal string or byte array is surprising and not particularly useful: '\xFF' -> "'\xFF'" (equals) '\xFF' -> "FF" (contains) Before: Filter: http.request.method contains "\x63" Constants: 00000 PUT_FVALUE "c" <FT_STRING> -> reg#1 (...) Filter: http.request.method contains '\x63' Constants: 00000 PUT_FVALUE "63" <FT_STRING> -> reg#1 (...) Filter: http.request.method == "\x63" Constants: 00000 PUT_FVALUE "c" <FT_STRING> -> reg#1 (...) Filter: http.request.method == '\x63' Constants: 00000 PUT_FVALUE "'\\x63'" <FT_STRING> -> reg#1 (...) After: Filter: http.request.method contains '\x63' Constants: 00000 PUT_FVALUE "c" <FT_STRING> -> reg#1 (...) Filter: http.request.method == '\x63' Constants: 00000 PUT_FVALUE "c" <FT_STRING> -> reg#1 (...)	2021-11-24 08:40:20 +00:00
João Valverde	7028646f9e	dfilter: Fix invalid character constant error message This reverts commit `d635ff4933`. A charconst cannot be a value string, for that reason it is not redundant with unparsed. Maybe character constants should be parsed in the lexical scanner instead. Before: Filter: ip.proto == '\g' dftest: "'\g'" cannot be found among the possible values for ip.proto. After: Filter: ip.proto == '\g' dftest: "'\g'" isn't a valid character constant.	2021-11-23 17:35:40 +00:00
João Valverde	77fa0fb23d	dfilter: Fixup unexpected end of filter error message Fixes `e7ecc9b9e5`.	2021-11-14 15:33:56 +00:00
João Valverde	e7ecc9b9e5	dfilter: Clean up error format and exception code Misc code cleanups. Add some extra stnode functions for increased type safety. Fix a constness issue with df_lval_value().	2021-11-10 03:18:50 +00:00
João Valverde	146a840ad1	dfilter: Move a constructor to the grammar file	2021-11-06 11:45:21 +00:00
João Valverde	6823073f7e	dfilter: Fix corner case with matches Previously a chained expression like "a == b matches c" would be considered syntactically valid. Fix that to be a syntax error. Only math-like comparisons can be chained. This also disallows chaining contains expressions.	2021-11-06 11:45:21 +00:00
João Valverde	fb490eb172	dfilter: Move regex creation to semcheck	2021-11-06 11:45:21 +00:00
João Valverde	d635ff4933	dfilter: Remove redundant STTYPE_CHARCONST syntax node A charconst uses the same semantic rules as unparsed so just use the latter to avoid redundancies. We keep the use of TOKEN_CHARCONST as an optimization to avoid an unnecessary name resolution (lookup for a registered field with the same name as the charconst).	2021-10-31 20:33:31 +00:00
João Valverde	f78ebe1564	dfilter: Remove deprecated support for whitespace separator in sets	2021-10-31 09:13:18 +00:00
João Valverde	2183738ef2	dfilter: Add support for comma as set separator Deprecate the usage of significant whitespace to separate set elements (or anywhere else for that matter). This will make the implementation simpler and cleaner and the language more expressive and user-friendly.	2021-10-28 04:11:05 +00:00
João Valverde	31d04f9ee7	dfilter: Add syntactic sugar for "not in" test	2021-10-27 20:52:35 +00:00
João Valverde	74a89a9862	dfilter: Minor set grammar cleanup	2021-10-27 11:13:52 +01:00
João Valverde	a7c625808c	dfilter: Add a helper function to create test stnodes	2021-10-27 09:27:45 +01:00
João Valverde	f5fea52982	dfilter: Remove token value from syntax tree Currently unused. This might still be useful to differentiate different spelling of the same token in user messages, like "==" and "eq", but currently we are not storing test tokens anyway, so just remove it, it makes everything simpler. If it's ever necessary it can be added back.	2021-10-27 09:27:45 +01:00
João Valverde	0e4851b025	dfilter: Use a string lval type in scanner Minor change to decouple the AST data structures from the lexical scanner. We pass a structure to allow for some future enhancements.	2021-10-27 09:27:45 +01:00
João Valverde	b1222edcd2	dfilter: Parse ranges in the drange node constructor Using a hand written tokenizer is simpler than using flex start conditions. Do the validation in the drange node constructor. Add validation for malformed ranges with different endpoint signs.	2021-10-27 06:02:07 +00:00
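
The motivation for raw addressing in commit 0853ddd1cb can be illustrated outside Wireshark. The Python sketch below is not Wireshark code. It only assumes, as the commit message describes, that a string field is decoded with invalid byte sequences replaced by U+FFFD, and it shows why a comparison against the decoded string fails while a comparison against the raw bytes succeeds. The variable names are hypothetical.

```python
# Raw bytes as they might appear in the packet; 0xAA is an invalid UTF-8 byte here.
raw_user_agent = b"Mozill\xaa"

# Decoding replaces the invalid byte with U+FFFD, so the original byte is lost.
decoded = raw_user_agent.decode("utf-8", errors="replace")

# Analogous to http.user_agent == "Mozill\xAA": the decoded string cannot match.
string_match = decoded == "Mozill\xaa"

# Analogous to @http.user_agent == "Mozill\xAA": the raw bytes still match.
raw_match = raw_user_agent == b"Mozill\xaa"

print(repr(decoded), string_match, raw_match)
```

The raw comparison works on bytes only, which mirrors the commit's note that a "raw" field equates directly to FT_BYTES.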

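The precedence change in commit 34ad6bb478 (logical AND binds tighter than logical OR) can be sketched with a tiny recursive-descent parser. This is an illustrative Python sketch, not the Lemon grammar used by Wireshark; the functions `parse` and `show` are hypothetical names, and only bare names joined by `and`/`or` are handled.

```python
def parse(tokens):
    """Parse names joined by 'and'/'or' into a nested tuple tree."""
    pos = 0

    def parse_and():
        # AND level: binds tighter, so it is parsed below the OR level.
        nonlocal pos
        node = tokens[pos]
        pos += 1
        while pos < len(tokens) and tokens[pos] == "and":
            pos += 1
            rhs = tokens[pos]
            pos += 1
            node = ("and", node, rhs)
        return node

    def parse_or():
        # OR level: combines complete AND groups.
        nonlocal pos
        node = parse_and()
        while pos < len(tokens) and tokens[pos] == "or":
            pos += 1
            node = ("or", node, parse_and())
        return node

    return parse_or()


def show(node):
    """Render the tree with explicit parentheses."""
    if isinstance(node, str):
        return node
    op, lhs, rhs = node
    return f"({show(lhs)} {op} {show(rhs)})"


print(show(parse("ip and tcp or udp".split())))
```

With this layering the filter `ip and tcp or udp` groups as `(ip and tcp) or udp`, matching the "After" behaviour described in the commit message.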

100 Commits
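
The error marker introduced in commit 4d9470e7dd (a column offset plus a length) can be reproduced with a few lines of Python. This is a sketch of the idea only: `mark_location` is a hypothetical helper, not a Wireshark function, and the offset is found here by searching the string rather than by a scanner.

```python
def mark_location(filter_text, col_start, col_len):
    """Return the filter followed by a marker line underlining the error span."""
    marker = " " * col_start + "^" + "~" * (col_len - 1)
    return filter_text + "\n" + marker


filter_text = "ip.proto == aaaaaa and tcp.port == 123"
start = filter_text.index("aaaaaa")  # column offset of the offending token
print(mark_location(filter_text, start, len("aaaaaa")))
```

Because the input is a single line, a column offset and a length are enough to place the marker, which is the simplification the commit message points out.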