dfilter: Fix "!=" relation to be free of contradictions

Wireshark defines the relation of equality A == B as
A any_eq B <=> An == Bn for at least one An, Bn.
More accurately I think this is (formally) an equivalence
relation, not true equality.

Whichever definition for "==" we choose we must keep the
definition of "!=" as !(A == B), otherwise it will
lead to logical contradictions like (A == B) AND (A != B)
being true.

Fix the '!=' relation to match the definition of equality:
  A != B <=> !(A == B) <=> A all_ne B <=> An != Bn, for
every n.

This has been the recomended way to write "not equal" for a
long time in the documentation, even to the point where != was
deprecated, but it just wasn't implemented consistently in the
language, which has understandably been a persistent source
of confusion. Even a field that is normally well-behaved
with "!=" like "ip.src" or "ip.dst" will produce unexpected
results with encapsulations like IP-over-IP.

The opcode ALL_NE could have been implemented in the compiler
instead using NOT and ANY_EQ but I chose to implement it in
bytecode. It just seemed more elegant and efficient
but the difference was not very significant.

Keep around "~=" for any_ne relation, in case someone depends
on that, and because we don't have an operator for true equality:
  A strict_equal B <=> A all_eq B <=> !(A any_ne B).
If there is only one value then any_ne and all_ne are the same
comparison operation.

Implementing this change did not require fixing any tests so it
is unlikely the relation "~=" (any_ne) will be very useful.

Note that the behaviour of the '<' (less than) comparison relation
is a separate, more subtle issue. In the general case the definition
of '<' that is used is only a partial order.
This commit is contained in:
João Valverde 2021-10-18 21:07:06 +01:00 committed by Wireshark GitLab Utility
parent d2b249a445
commit 0abe10e040
15 changed files with 143 additions and 74 deletions

View File

@ -110,6 +110,10 @@ The following features are new (or have been significantly updated) since versio
For example "http.user_agent contains aa:bb" tries to match "aa:bb". Avoid this usage, always use double-quotes: http.user_agent contains "\xaa\xbb".
** Regular expressions (using "matches" or "~") must be specified using character strings. It is a syntax error to omit the double-quotes around
the regular expression. Before the syntax rules of an unquoted regex string could be difficult to predict.
** The expression "a != b" now always has the same meaning as !(a == b). In particular this means filter expressions with multi-value fields like
"ip.addr != 1.1.1.1" will work as expected (the result is the same as typing "ip.src != 1.1.1.1 and ip.dst != 1.1.1.1"). This avoids the
contradiction (a == b and a != b) being true.
** Use the syntax "a ~= b" or "a any_ne b" to recover the previous (inconsistent with ==) logic for not equal.
* Corrected calculation of mean jitter in RTP Stream Analysis dialog and IAX2 Stram Analysis dialog

View File

@ -1155,10 +1155,10 @@ You can become more familiar with display filter fields by selecting different p
[[ChUseWiresharkStatusbarFilter]]
//FIXME: Remove or choose a better example of a display filter message.
.The Statusbar with a display filter message
image::wsug_graphics/ws-statusbar-filter.png[{statusbar-attrs}]
This is displayed if you are trying to use a display filter which may have unexpected results.
For a detailed description see <<ChWorkBuildDisplayFilterMistake>>.
// End of WSUG Chapter 3

View File

@ -804,34 +804,6 @@ To match IP addresses ending in 255 in a block of subnets (172.16 to 172.31):
string(ip.dst) matches r"^172\.(1[6-9]|2[0-9]|3[0-1])\.[0-9]{1,3}\.255"
----
[[ChWorkBuildDisplayFilterMistake]]
==== A Common Mistake with !=
Using the != operator on combined expressions like `eth.addr`, `ip.addr`,
`tcp.port`, and `udp.port` will probably not work as expected. Wireshark
will show the warning “"!=" may have unexpected results” when you use it.
People often use a filter string like `ip.addr == 1.2.3.4`
to display all packets containing the IP address 1.2.3.4.
Then they use `ip.addr != 1.2.3.4` expecting to see all packets not containing the IP
address 1.2.3.4 in it. Unfortunately, this does _not_ do the expected.
Instead, that expression will even be true for packets where either the source or
destination IP address equals 1.2.3.4. The reason for this is because the
expression `ip.addr != 1.2.3.4` is read as “the packet contains a field
named ip.addr with a value different from 1.2.3.4”. As an IP datagram contains
both a source and a destination address, the expression will evaluate to true
whenever at least one of the two addresses differs from 1.2.3.4.
If you want to filter out all packets containing IP datagrams to or from IP
address 1.2.3.4, then the correct filter is `!(ip.addr == 1.2.3.4)` as it is read
“show me all the packets for which it is not true that a field named ip.addr
exists with a value of 1.2.3.4”, or in other words, “filter out all packets
for which there are no occurrences of a field named ip.addr with the value
1.2.3.4”.
[[ChWorkBuildDisplayFilterTransitional]]
==== Sometimes Fields Change Names

View File

@ -120,6 +120,7 @@ dfvm_dump(FILE *f, dfilter_t *df)
case CALL_FUNCTION:
case MK_RANGE:
case ANY_EQ:
case ALL_NE:
case ANY_NE:
case ANY_GT:
case ANY_GE:
@ -228,6 +229,11 @@ dfvm_dump(FILE *f, dfilter_t *df)
id, arg1->value.numeric, arg2->value.numeric);
break;
case ALL_NE:
fprintf(f, "%05d ALL_NE\t\treg#%u == reg#%u\n",
id, arg1->value.numeric, arg2->value.numeric);
break;
case ANY_NE:
fprintf(f, "%05d ANY_NE\t\treg#%u == reg#%u\n",
id, arg1->value.numeric, arg2->value.numeric);
@ -372,26 +378,74 @@ put_pcre(dfilter_t *df, fvalue_regex_t *pcre, int reg)
return TRUE;
}
typedef gboolean (*FvalueCmpFunc)(const fvalue_t*, const fvalue_t*);
enum match_how {
MATCH_ANY,
MATCH_ALL
};
typedef gboolean (*DFVMMatchFunc)(const fvalue_t*, const fvalue_t*);
static gboolean
any_test(dfilter_t *df, FvalueCmpFunc cmp, int reg1, int reg2)
cmp_test(dfilter_t *df, enum match_how how, DFVMMatchFunc match_func, int reg1, int reg2)
{
GList *list_a, *list_b;
gboolean want_all = (how == MATCH_ALL);
gboolean want_any = (how == MATCH_ANY);
gboolean have_match;
list_a = df->registers[reg1];
while (list_a) {
list_b = df->registers[reg2];
while (list_b) {
if (cmp((fvalue_t *)list_a->data, (fvalue_t *)list_b->data)) {
have_match = match_func(list_a->data, list_b->data);
if (want_all && !have_match) {
return FALSE;
}
else if (want_any && have_match) {
return TRUE;
}
list_b = g_list_next(list_b);
}
list_a = g_list_next(list_a);
}
return FALSE;
/* want_all || !want_any */
return want_all;
}
static inline gboolean
any_test(dfilter_t *df, DFVMMatchFunc cmp, int reg1, int reg2)
{
/* cmp(A) <=> cmp(a1) OR cmp(a2) OR cmp(a3) OR ... */
return cmp_test(df, MATCH_ANY, cmp, reg1, reg2);
}
static inline gboolean
all_test(dfilter_t *df, DFVMMatchFunc cmp, int reg1, int reg2)
{
/* cmp(A) <=> cmp(a1) AND cmp(a2) AND cmp(a3) AND ... */
return cmp_test(df, MATCH_ALL, cmp, reg1, reg2);
}
static inline gboolean
any_eq(dfilter_t *df, int reg1, int reg2)
{
/* A any_eq B <=> a1 == b1 OR a2 == b2 OR a3 == b3 OR ... */
return any_test(df, fvalue_eq, reg1, reg2);
}
static inline gboolean
any_ne(dfilter_t *df, int reg1, int reg2)
{
/* A any_ne B <=> a1 != b1 OR a2 != b2 OR a3 != b3 OR ... */
return any_test(df, fvalue_ne, reg1, reg2);
}
static inline gboolean
all_ne(dfilter_t *df, int reg1, int reg2)
{
/* A all_ne B <=> a1 != b1 AND a2 != b2 AND a3 != b3 AND ... */
return all_test(df, fvalue_ne, reg1, reg2);
}
static gboolean
@ -573,13 +627,15 @@ dfvm_apply(dfilter_t *df, proto_tree *tree)
break;
case ANY_EQ:
accum = any_test(df, fvalue_eq,
arg1->value.numeric, arg2->value.numeric);
accum = any_eq(df, arg1->value.numeric, arg2->value.numeric);
break;
case ALL_NE:
accum = all_ne(df, arg1->value.numeric, arg2->value.numeric);
break;
case ANY_NE:
accum = any_test(df, fvalue_ne,
arg1->value.numeric, arg2->value.numeric);
accum = any_ne(df, arg1->value.numeric, arg2->value.numeric);
break;
case ANY_GT:
@ -702,6 +758,7 @@ dfvm_init_const(dfilter_t *df)
case CALL_FUNCTION:
case MK_RANGE:
case ANY_EQ:
case ALL_NE:
case ANY_NE:
case ANY_GT:
case ANY_GE:

View File

@ -53,6 +53,7 @@ typedef enum {
PUT_FVALUE,
PUT_PCRE,
ANY_EQ,
ALL_NE,
ANY_NE,
ANY_GT,
ANY_GE,

View File

@ -507,11 +507,15 @@ gen_test(dfwork_t *dfw, stnode_t *st_node)
val1->value.numeric = dfw->next_insn_id;
break;
case TEST_OP_EQ:
case TEST_OP_ANY_EQ:
gen_relation(dfw, ANY_EQ, st_arg1, st_arg2);
break;
case TEST_OP_NE:
case TEST_OP_ALL_NE:
gen_relation(dfw, ALL_NE, st_arg1, st_arg2);
break;
case TEST_OP_ANY_NE:
gen_relation(dfw, ANY_NE, st_arg1, st_arg2);
break;

View File

@ -264,8 +264,9 @@ range_node(D) ::= INTEGER(X).
stnode_free(X);
}
rel_binop(O) ::= TEST_EQ. { O = TEST_OP_EQ; }
rel_binop(O) ::= TEST_NE. { O = TEST_OP_NE; }
rel_binop(O) ::= TEST_ANY_EQ. { O = TEST_OP_ANY_EQ; }
rel_binop(O) ::= TEST_ALL_NE. { O = TEST_OP_ALL_NE; }
rel_binop(O) ::= TEST_ANY_NE. { O = TEST_OP_ANY_NE; }
rel_binop(O) ::= TEST_GT. { O = TEST_OP_GT; }
rel_binop(O) ::= TEST_GE. { O = TEST_OP_GE; }
rel_binop(O) ::= TEST_LT. { O = TEST_OP_LT; }

View File

@ -135,16 +135,12 @@ static int simple(int token, const char *token_value);
return simple(TOKEN_RBRACE, "}");
}
"==" return SIMPLE(TOKEN_TEST_EQ);
"eq" return SIMPLE(TOKEN_TEST_EQ);
"!=" {
add_deprecated_token(yyextra->dfw, "!=");
return SIMPLE(TOKEN_TEST_NE);
}
"ne" {
add_deprecated_token(yyextra->dfw, "ne");
return SIMPLE(TOKEN_TEST_NE);
}
"==" return SIMPLE(TOKEN_TEST_ANY_EQ);
"eq" return SIMPLE(TOKEN_TEST_ANY_EQ);
"!=" return SIMPLE(TOKEN_TEST_ALL_NE);
"ne" return SIMPLE(TOKEN_TEST_ALL_NE);
"~=" return SIMPLE(TOKEN_TEST_ANY_NE);
"any_ne" return SIMPLE(TOKEN_TEST_ANY_NE);
">" return SIMPLE(TOKEN_TEST_GT);
"gt" return SIMPLE(TOKEN_TEST_GT);
">=" return SIMPLE(TOKEN_TEST_GE);
@ -432,8 +428,9 @@ simple(int token, const char *token_value)
case TOKEN_DOTDOT:
case TOKEN_HYPHEN:
case TOKEN_WHITESPACE:
case TOKEN_TEST_EQ:
case TOKEN_TEST_NE:
case TOKEN_TEST_ANY_EQ:
case TOKEN_TEST_ALL_NE:
case TOKEN_TEST_ANY_NE:
case TOKEN_TEST_GT:
case TOKEN_TEST_GE:
case TOKEN_TEST_LT:

View File

@ -1311,12 +1311,15 @@ check_test(dfwork_t *dfw, stnode_t *st_node)
semcheck(dfw, st_arg2);
break;
case TEST_OP_EQ:
case TEST_OP_ANY_EQ:
check_relation(dfw, "==", FALSE, ftype_can_eq, st_node, st_arg1, st_arg2);
break;
case TEST_OP_NE:
case TEST_OP_ALL_NE:
check_relation(dfw, "!=", FALSE, ftype_can_ne, st_node, st_arg1, st_arg2);
break;
case TEST_OP_ANY_NE:
check_relation(dfw, "~=", FALSE, ftype_can_ne, st_node, st_arg1, st_arg2);
break;
case TEST_OP_GT:
check_relation(dfw, ">", FALSE, ftype_can_gt, st_node, st_arg1, st_arg2);
break;

View File

@ -85,11 +85,14 @@ test_tostr(const void *value, gboolean pretty _U_)
case TEST_OP_OR:
s = "TEST_OR";
break;
case TEST_OP_EQ:
s = "TEST_EQ";
case TEST_OP_ANY_EQ:
s = "TEST_ANY_EQ";
break;
case TEST_OP_NE:
s = "TEST_NE";
case TEST_OP_ALL_NE:
s = "TEST_ALL_NE";
break;
case TEST_OP_ANY_NE:
s = "TEST_ANY_NE";
break;
case TEST_OP_GT:
s = "TEST_GT";
@ -135,8 +138,9 @@ num_operands(test_op_t op)
return 1;
case TEST_OP_AND:
case TEST_OP_OR:
case TEST_OP_EQ:
case TEST_OP_NE:
case TEST_OP_ANY_EQ:
case TEST_OP_ALL_NE:
case TEST_OP_ANY_NE:
case TEST_OP_GT:
case TEST_OP_GE:
case TEST_OP_LT:

View File

@ -16,8 +16,9 @@ typedef enum {
TEST_OP_NOT,
TEST_OP_AND,
TEST_OP_OR,
TEST_OP_EQ,
TEST_OP_NE,
TEST_OP_ANY_EQ,
TEST_OP_ALL_NE,
TEST_OP_ANY_NE,
TEST_OP_GT,
TEST_OP_GE,
TEST_OP_LT,

View File

@ -448,6 +448,7 @@ static const char *reserved_filter_names[] = {
/* Display filter keywords. */
"eq",
"ne",
"any_ne",
"gt",
"ge",
"lt",

View File

@ -15,13 +15,5 @@ class case_deprecated(unittest.TestCase):
checkDFilterSucceed(dfilter, "suggest parentheses around")
def test_deprecated_2(self, checkDFilterSucceed):
dfilter = "ip.proto ne 17"
checkDFilterSucceed(dfilter, "Deprecated tokens: \"ne\"")
def test_deprecated_3(self, checkDFilterSucceed):
dfilter = "ip.proto != 17"
checkDFilterSucceed(dfilter, "Deprecated tokens: \"!=\"")
def test_deprecated_4(self, checkDFilterSucceed):
dfilter = "bootp"
checkDFilterSucceed(dfilter, "Deprecated tokens: \"bootp\"")

View File

@ -46,3 +46,35 @@ class case_syntax(unittest.TestCase):
def test_matches_4(self, checkDFilterCount):
dfilter = r'http.host matches r"update\.microsoft\.c.."'
checkDFilterCount(dfilter, 1)
def test_equal_1(self, checkDFilterCount):
dfilter = 'ip.addr == 10.0.0.5'
checkDFilterCount(dfilter, 1)
def test_equal_2(self, checkDFilterCount):
dfilter = 'ip.addr == 207.46.134.94'
checkDFilterCount(dfilter, 1)
def test_equal_3(self, checkDFilterCount):
dfilter = 'ip.addr == 10.0.0.5 or ip.addr == 207.46.134.94'
checkDFilterCount(dfilter, 1)
def test_equal_4(self, checkDFilterCount):
dfilter = 'ip.addr == 10.0.0.5 and ip.addr == 207.46.134.94'
checkDFilterCount(dfilter, 1)
def test_not_equal_1(self, checkDFilterCount):
dfilter = 'ip.addr != 10.0.0.5'
checkDFilterCount(dfilter, 0)
def test_not_equal_2(self, checkDFilterCount):
dfilter = 'ip.addr != 207.46.134.94'
checkDFilterCount(dfilter, 0)
def test_not_equal_3(self, checkDFilterCount):
dfilter = 'ip.addr != 10.0.0.5 and ip.addr != 207.46.134.94'
checkDFilterCount(dfilter, 0)
def test_not_equal_4(self, checkDFilterCount):
dfilter = 'ip.addr != 10.0.0.5 or ip.addr != 207.46.134.94'
checkDFilterCount(dfilter, 0)

View File

@ -196,10 +196,10 @@ bool SyntaxLineEdit::checkDisplayFilter(QString filter)
header_field_info *hfi = proto_registrar_get_byalias(token_str);
if (hfi)
syntax_error_message_ = tr("\"%1\" is deprecated in favour of \"%2\". "
"See Help section 6.4.9 for details.").arg(token_str).arg(hfi->abbrev);
"See Help section 6.4.8 for details.").arg(token_str).arg(hfi->abbrev);
else
syntax_error_message_ = tr("\"%1\" may have unexpected results. "
"See Help section 6.4.8 for details.").arg(token_str);
// The token_str is the message.
syntax_error_message_ = tr("%1").arg(token_str);
g_free(token_str);
} else {
setSyntaxState(SyntaxLineEdit::Valid);