dfilter: Fix "!=" relation to be free of contradictions

Wireshark defines the relation of equality A == B as A any_eq B <=> An == Bn for at least one An, Bn. More accurately I think this is (formally) an equivalence relation, not true equality. Whichever definition for "==" we choose we must keep the definition of "!=" as !(A == B), otherwise it will lead to logical contradictions like (A == B) AND (A != B) being true. Fix the '!=' relation to match the definition of equality: A != B <=> !(A == B) <=> A all_ne B <=> An != Bn, for every n. This has been the recomended way to write "not equal" for a long time in the documentation, even to the point where != was deprecated, but it just wasn't implemented consistently in the language, which has understandably been a persistent source of confusion. Even a field that is normally well-behaved with "!=" like "ip.src" or "ip.dst" will produce unexpected results with encapsulations like IP-over-IP. The opcode ALL_NE could have been implemented in the compiler instead using NOT and ANY_EQ but I chose to implement it in bytecode. It just seemed more elegant and efficient but the difference was not very significant. Keep around "~=" for any_ne relation, in case someone depends on that, and because we don't have an operator for true equality: A strict_equal B <=> A all_eq B <=> !(A any_ne B). If there is only one value then any_ne and all_ne are the same comparison operation. Implementing this change did not require fixing any tests so it is unlikely the relation "~=" (any_ne) will be very useful. Note that the behaviour of the '<' (less than) comparison relation is a separate, more subtle issue. In the general case the definition of '<' that is used is only a partial order.
2021-10-18 21:07:06 +01:00 · 2021-10-18 21:07:06 +01:00 · 0abe10e040
parent d2b249a445
commit 0abe10e040
15 changed files with 143 additions and 74 deletions
--- a/docbook/release-notes.adoc
+++ b/docbook/release-notes.adoc
@ -110,6 +110,10 @@ The following features are new (or have been significantly updated) since versio
   For example "http.user_agent contains aa:bb" tries to match "aa:bb". Avoid this usage, always use double-quotes: http.user_agent contains "\xaa\xbb".
 ** Regular expressions (using "matches" or "~") must be specified using character strings. It is a syntax error to omit the double-quotes around
   the regular expression. Before the syntax rules of an unquoted regex string could be difficult to predict.
+** The expression "a != b" now always has the same meaning as !(a == b). In particular this means filter expressions with multi-value fields like
+   "ip.addr != 1.1.1.1" will work as expected (the result is the same as typing "ip.src != 1.1.1.1 and ip.dst != 1.1.1.1"). This avoids the
+   contradiction (a == b and a != b) being true.
+** Use the syntax "a ~= b" or "a any_ne b" to recover the previous (inconsistent with ==) logic for not equal.

 * Corrected calculation of mean jitter in RTP Stream Analysis dialog and IAX2 Stram Analysis dialog

--- a/docbook/wsug_src/WSUG_chapter_use.adoc
+++ b/docbook/wsug_src/WSUG_chapter_use.adoc
@ -1155,10 +1155,10 @@ You can become more familiar with display filter fields by selecting different p

 [[ChUseWiresharkStatusbarFilter]]

+//FIXME: Remove or choose a better example of a display filter message.
 .The Statusbar with a display filter message
 image::wsug_graphics/ws-statusbar-filter.png[{statusbar-attrs}]

 This is displayed if you are trying to use a display filter which may have unexpected results.
-For a detailed description see <<ChWorkBuildDisplayFilterMistake>>.

 // End of WSUG Chapter 3
--- a/docbook/wsug_src/WSUG_chapter_work.adoc
+++ b/docbook/wsug_src/WSUG_chapter_work.adoc
@ -804,34 +804,6 @@ To match IP addresses ending in 255 in a block of subnets (172.16 to 172.31):
 string(ip.dst) matches r"^172\.(1[6-9]|2[0-9]|3[0-1])\.[0-9]{1,3}\.255"
 ----

-[[ChWorkBuildDisplayFilterMistake]]
-
-==== A Common Mistake with !=
-
-Using the != operator on combined expressions like `eth.addr`, `ip.addr`,
-`tcp.port`, and `udp.port` will probably not work as expected. Wireshark
-will show the warning “"!=" may have unexpected results” when you use it.
-
-People often use a filter string like `ip.addr == 1.2.3.4`
-to display all packets containing the IP address 1.2.3.4.
-
-Then they use `ip.addr != 1.2.3.4` expecting to see all packets not containing the IP
-address 1.2.3.4 in it. Unfortunately, this does _not_ do the expected.
-
-Instead, that expression will even be true for packets where either the source or
-destination IP address equals 1.2.3.4. The reason for this is because the
-expression `ip.addr != 1.2.3.4` is read as “the packet contains a field
-named ip.addr with a value different from 1.2.3.4”. As an IP datagram contains
-both a source and a destination address, the expression will evaluate to true
-whenever at least one of the two addresses differs from 1.2.3.4.
-
-If you want to filter out all packets containing IP datagrams to or from IP
-address 1.2.3.4, then the correct filter is `!(ip.addr == 1.2.3.4)` as it is read
-“show me all the packets for which it is not true that a field named ip.addr
-exists with a value of 1.2.3.4”, or in other words, “filter out all packets
-for which there are no occurrences of a field named ip.addr with the value
-1.2.3.4”.
-
 [[ChWorkBuildDisplayFilterTransitional]]

 ==== Sometimes Fields Change Names
--- a/epan/dfilter/dfvm.c
+++ b/epan/dfilter/dfvm.c
@ -120,6 +120,7 @@ dfvm_dump(FILE *f, dfilter_t *df)
 			case CALL_FUNCTION:
 			case MK_RANGE:
 			case ANY_EQ:
+			case ALL_NE:
 			case ANY_NE:
 			case ANY_GT:
 			case ANY_GE:
@ -228,6 +229,11 @@ dfvm_dump(FILE *f, dfilter_t *df)
 					id, arg1->value.numeric, arg2->value.numeric);
 				break;

+			case ALL_NE:
+				fprintf(f, "%05d ALL_NE\t\treg#%u == reg#%u\n",
+					id, arg1->value.numeric, arg2->value.numeric);
+				break;
+
 			case ANY_NE:
 				fprintf(f, "%05d ANY_NE\t\treg#%u == reg#%u\n",
 					id, arg1->value.numeric, arg2->value.numeric);
@ -372,26 +378,74 @@ put_pcre(dfilter_t *df, fvalue_regex_t *pcre, int reg)
 	return TRUE;
 }

-typedef gboolean (*FvalueCmpFunc)(const fvalue_t*, const fvalue_t*);
+enum match_how {
+	MATCH_ANY,
+	MATCH_ALL
+};
+
+typedef gboolean (*DFVMMatchFunc)(const fvalue_t*, const fvalue_t*);

 static gboolean
-any_test(dfilter_t *df, FvalueCmpFunc cmp, int reg1, int reg2)
+cmp_test(dfilter_t *df, enum match_how how, DFVMMatchFunc match_func, int reg1, int reg2)
 {
 	GList	*list_a, *list_b;
+	gboolean want_all = (how == MATCH_ALL);
+	gboolean want_any = (how == MATCH_ANY);
+	gboolean have_match;

 	list_a = df->registers[reg1];

 	while (list_a) {
 		list_b = df->registers[reg2];
 		while (list_b) {
-			if (cmp((fvalue_t *)list_a->data, (fvalue_t *)list_b->data)) {
+			have_match = match_func(list_a->data, list_b->data);
+			if (want_all && !have_match) {
+				return FALSE;
+			}
+			else if (want_any && have_match) {
 				return TRUE;
 			}
 			list_b = g_list_next(list_b);
 		}
 		list_a = g_list_next(list_a);
 	}
-	return FALSE;
+	/* want_all || !want_any */
+	return want_all;
+}
+
+static inline gboolean
+any_test(dfilter_t *df, DFVMMatchFunc cmp, int reg1, int reg2)
+{
+	/* cmp(A) <=> cmp(a1) OR cmp(a2) OR cmp(a3) OR ... */
+	return cmp_test(df, MATCH_ANY, cmp, reg1, reg2);
+}
+
+static inline gboolean
+all_test(dfilter_t *df, DFVMMatchFunc cmp, int reg1, int reg2)
+{
+	/* cmp(A) <=> cmp(a1) AND cmp(a2) AND cmp(a3) AND ... */
+	return cmp_test(df, MATCH_ALL, cmp, reg1, reg2);
+}
+
+static inline gboolean
+any_eq(dfilter_t *df, int reg1, int reg2)
+{
+	/* A any_eq B <=> a1 == b1 OR a2 == b2 OR a3 == b3 OR ... */
+	return any_test(df, fvalue_eq, reg1, reg2);
+}
+
+static inline gboolean
+any_ne(dfilter_t *df, int reg1, int reg2)
+{
+	/* A any_ne B <=> a1 != b1 OR a2 != b2 OR a3 != b3 OR ... */
+	return any_test(df, fvalue_ne, reg1, reg2);
+}
+
+static inline gboolean
+all_ne(dfilter_t *df, int reg1, int reg2)
+{
+	/* A all_ne B <=> a1 != b1 AND a2 != b2 AND a3 != b3 AND ... */
+	return all_test(df, fvalue_ne, reg1, reg2);
 }

 static gboolean
@ -573,13 +627,15 @@ dfvm_apply(dfilter_t *df, proto_tree *tree)
 				break;

 			case ANY_EQ:
-				accum = any_test(df, fvalue_eq,
-						arg1->value.numeric, arg2->value.numeric);
+				accum = any_eq(df, arg1->value.numeric, arg2->value.numeric);
+				break;
+
+			case ALL_NE:
+				accum = all_ne(df, arg1->value.numeric, arg2->value.numeric);
 				break;

 			case ANY_NE:
-				accum = any_test(df, fvalue_ne,
-						arg1->value.numeric, arg2->value.numeric);
+				accum = any_ne(df, arg1->value.numeric, arg2->value.numeric);
 				break;

 			case ANY_GT:
@ -702,6 +758,7 @@ dfvm_init_const(dfilter_t *df)
 			case CALL_FUNCTION:
 			case MK_RANGE:
 			case ANY_EQ:
+			case ALL_NE:
 			case ANY_NE:
 			case ANY_GT:
 			case ANY_GE:
--- a/epan/dfilter/dfvm.h
+++ b/epan/dfilter/dfvm.h
@ -53,6 +53,7 @@ typedef enum {
 	PUT_FVALUE,
 	PUT_PCRE,
 	ANY_EQ,
+	ALL_NE,
 	ANY_NE,
 	ANY_GT,
 	ANY_GE,
--- a/epan/dfilter/gencode.c
+++ b/epan/dfilter/gencode.c
@ -507,11 +507,15 @@ gen_test(dfwork_t *dfw, stnode_t *st_node)
 			val1->value.numeric = dfw->next_insn_id;
 			break;

-		case TEST_OP_EQ:
+		case TEST_OP_ANY_EQ:
 			gen_relation(dfw, ANY_EQ, st_arg1, st_arg2);
 			break;

-		case TEST_OP_NE:
+		case TEST_OP_ALL_NE:
+			gen_relation(dfw, ALL_NE, st_arg1, st_arg2);
+			break;
+
+		case TEST_OP_ANY_NE:
 			gen_relation(dfw, ANY_NE, st_arg1, st_arg2);
 			break;

--- a/epan/dfilter/grammar.lemon
+++ b/epan/dfilter/grammar.lemon
@ -264,8 +264,9 @@ range_node(D) ::= INTEGER(X).
 	stnode_free(X);
 }

-rel_binop(O) ::= TEST_EQ.	{ O = TEST_OP_EQ; }
-rel_binop(O) ::= TEST_NE.	{ O = TEST_OP_NE; }
+rel_binop(O) ::= TEST_ANY_EQ.	{ O = TEST_OP_ANY_EQ; }
+rel_binop(O) ::= TEST_ALL_NE.	{ O = TEST_OP_ALL_NE; }
+rel_binop(O) ::= TEST_ANY_NE.	{ O = TEST_OP_ANY_NE; }
 rel_binop(O) ::= TEST_GT.	{ O = TEST_OP_GT; }
 rel_binop(O) ::= TEST_GE.	{ O = TEST_OP_GE; }
 rel_binop(O) ::= TEST_LT.	{ O = TEST_OP_LT; }
--- a/epan/dfilter/scanner.l
+++ b/epan/dfilter/scanner.l
@ -135,16 +135,12 @@ static int simple(int token, const char *token_value);
 	return simple(TOKEN_RBRACE, "}");
 }

-"=="		return SIMPLE(TOKEN_TEST_EQ);
-"eq"		return SIMPLE(TOKEN_TEST_EQ);
-"!="			{
-	add_deprecated_token(yyextra->dfw, "!=");
-	return SIMPLE(TOKEN_TEST_NE);
-}
-"ne"			{
-	add_deprecated_token(yyextra->dfw, "ne");
-	return SIMPLE(TOKEN_TEST_NE);
-}
+"=="		return SIMPLE(TOKEN_TEST_ANY_EQ);
+"eq"		return SIMPLE(TOKEN_TEST_ANY_EQ);
+"!="		return SIMPLE(TOKEN_TEST_ALL_NE);
+"ne"		return SIMPLE(TOKEN_TEST_ALL_NE);
+"~="		return SIMPLE(TOKEN_TEST_ANY_NE);
+"any_ne"	return SIMPLE(TOKEN_TEST_ANY_NE);
 ">"		return SIMPLE(TOKEN_TEST_GT);
 "gt"		return SIMPLE(TOKEN_TEST_GT);
 ">="		return SIMPLE(TOKEN_TEST_GE);
@ -432,8 +428,9 @@ simple(int token, const char *token_value)
 		case TOKEN_DOTDOT:
 		case TOKEN_HYPHEN:
 		case TOKEN_WHITESPACE:
-		case TOKEN_TEST_EQ:
-		case TOKEN_TEST_NE:
+		case TOKEN_TEST_ANY_EQ:
+		case TOKEN_TEST_ALL_NE:
+		case TOKEN_TEST_ANY_NE:
 		case TOKEN_TEST_GT:
 		case TOKEN_TEST_GE:
 		case TOKEN_TEST_LT:
--- a/epan/dfilter/semcheck.c
+++ b/epan/dfilter/semcheck.c
@ -1311,12 +1311,15 @@ check_test(dfwork_t *dfw, stnode_t *st_node)
 			semcheck(dfw, st_arg2);
 			break;

-		case TEST_OP_EQ:
+		case TEST_OP_ANY_EQ:
 			check_relation(dfw, "==", FALSE, ftype_can_eq, st_node, st_arg1, st_arg2);
 			break;
-		case TEST_OP_NE:
+		case TEST_OP_ALL_NE:
 			check_relation(dfw, "!=", FALSE, ftype_can_ne, st_node, st_arg1, st_arg2);
 			break;
+		case TEST_OP_ANY_NE:
+			check_relation(dfw, "~=", FALSE, ftype_can_ne, st_node, st_arg1, st_arg2);
+			break;
 		case TEST_OP_GT:
 			check_relation(dfw, ">", FALSE, ftype_can_gt, st_node, st_arg1, st_arg2);
 			break;
--- a/epan/dfilter/sttype-test.c
+++ b/epan/dfilter/sttype-test.c
@ -85,11 +85,14 @@ test_tostr(const void *value, gboolean pretty _U_)
 		case TEST_OP_OR:
 			s = "TEST_OR";
 			break;
-		case TEST_OP_EQ:
-			s = "TEST_EQ";
+		case TEST_OP_ANY_EQ:
+			s = "TEST_ANY_EQ";
 			break;
-		case TEST_OP_NE:
-			s = "TEST_NE";
+		case TEST_OP_ALL_NE:
+			s = "TEST_ALL_NE";
+			break;
+		case TEST_OP_ANY_NE:
+			s = "TEST_ANY_NE";
 			break;
 		case TEST_OP_GT:
 			s = "TEST_GT";
@ -135,8 +138,9 @@ num_operands(test_op_t op)
 			return 1;
 		case TEST_OP_AND:
 		case TEST_OP_OR:
-		case TEST_OP_EQ:
-		case TEST_OP_NE:
+		case TEST_OP_ANY_EQ:
+		case TEST_OP_ALL_NE:
+		case TEST_OP_ANY_NE:
 		case TEST_OP_GT:
 		case TEST_OP_GE:
 		case TEST_OP_LT:
--- a/epan/dfilter/sttype-test.h
+++ b/epan/dfilter/sttype-test.h
@ -16,8 +16,9 @@ typedef enum {
 	TEST_OP_NOT,
 	TEST_OP_AND,
 	TEST_OP_OR,
-	TEST_OP_EQ,
-	TEST_OP_NE,
+	TEST_OP_ANY_EQ,
+	TEST_OP_ALL_NE,
+	TEST_OP_ANY_NE,
 	TEST_OP_GT,
 	TEST_OP_GE,
 	TEST_OP_LT,
--- a/epan/proto.c
+++ b/epan/proto.c
@ -448,6 +448,7 @@ static const char *reserved_filter_names[] = {
 	/* Display filter keywords. */
 	"eq",
 	"ne",
+	"any_ne",
 	"gt",
 	"ge",
 	"lt",
--- a/test/suite_dfilter/group_deprecated.py
+++ b/test/suite_dfilter/group_deprecated.py
@ -15,13 +15,5 @@ class case_deprecated(unittest.TestCase):
        checkDFilterSucceed(dfilter, "suggest parentheses around")

    def test_deprecated_2(self, checkDFilterSucceed):
-        dfilter = "ip.proto ne 17"
-        checkDFilterSucceed(dfilter, "Deprecated tokens: \"ne\"")
-
-    def test_deprecated_3(self, checkDFilterSucceed):
-        dfilter = "ip.proto != 17"
-        checkDFilterSucceed(dfilter, "Deprecated tokens: \"!=\"")
-
-    def test_deprecated_4(self, checkDFilterSucceed):
        dfilter = "bootp"
        checkDFilterSucceed(dfilter, "Deprecated tokens: \"bootp\"")
--- a/test/suite_dfilter/group_syntax.py
+++ b/test/suite_dfilter/group_syntax.py
@ -46,3 +46,35 @@ class case_syntax(unittest.TestCase):
    def test_matches_4(self, checkDFilterCount):
        dfilter = r'http.host matches r"update\.microsoft\.c.."'
        checkDFilterCount(dfilter, 1)
+
+    def test_equal_1(self, checkDFilterCount):
+        dfilter = 'ip.addr == 10.0.0.5'
+        checkDFilterCount(dfilter, 1)
+
+    def test_equal_2(self, checkDFilterCount):
+        dfilter = 'ip.addr == 207.46.134.94'
+        checkDFilterCount(dfilter, 1)
+
+    def test_equal_3(self, checkDFilterCount):
+        dfilter = 'ip.addr == 10.0.0.5 or ip.addr == 207.46.134.94'
+        checkDFilterCount(dfilter, 1)
+
+    def test_equal_4(self, checkDFilterCount):
+        dfilter = 'ip.addr == 10.0.0.5 and ip.addr == 207.46.134.94'
+        checkDFilterCount(dfilter, 1)
+
+    def test_not_equal_1(self, checkDFilterCount):
+        dfilter = 'ip.addr != 10.0.0.5'
+        checkDFilterCount(dfilter, 0)
+
+    def test_not_equal_2(self, checkDFilterCount):
+        dfilter = 'ip.addr != 207.46.134.94'
+        checkDFilterCount(dfilter, 0)
+
+    def test_not_equal_3(self, checkDFilterCount):
+        dfilter = 'ip.addr != 10.0.0.5 and ip.addr != 207.46.134.94'
+        checkDFilterCount(dfilter, 0)
+
+    def test_not_equal_4(self, checkDFilterCount):
+        dfilter = 'ip.addr != 10.0.0.5 or ip.addr != 207.46.134.94'
+        checkDFilterCount(dfilter, 0)
--- a/ui/qt/widgets/syntax_line_edit.cpp
+++ b/ui/qt/widgets/syntax_line_edit.cpp
@ -196,10 +196,10 @@ bool SyntaxLineEdit::checkDisplayFilter(QString filter)
            header_field_info *hfi = proto_registrar_get_byalias(token_str);
            if (hfi)
                syntax_error_message_ = tr("\"%1\" is deprecated in favour of \"%2\". "
-                                           "See Help section 6.4.9 for details.").arg(token_str).arg(hfi->abbrev);
+                                           "See Help section 6.4.8 for details.").arg(token_str).arg(hfi->abbrev);
            else
-                syntax_error_message_ = tr("\"%1\" may have unexpected results. "
-                                           "See Help section 6.4.8 for details.").arg(token_str);
+                // The token_str is the message.
+                syntax_error_message_ = tr("%1").arg(token_str);
            g_free(token_str);
        } else {
            setSyntaxState(SyntaxLineEdit::Valid);