dfilter: Allow generic unquoted strings that are protocol names on RHS

Protocol names can only appear on the left-hand side of filter expressions. Commit ac0b1d42f3 (merge !4214) caused unquoted strings on the RHS that could be interpreted as either protocol names or byte arrays to be parsed as byte arrays. Relax this further: an unquoted string on the RHS that matches a protocol name is now treated like any other generic unquoted string. (The semantic checker still prefers interpreting the string as a byte array, where possible, over a generic string.)

This is useful for filter expressions such as "frame contains data", where data should be interpreted as the string "data", i.e. "\x64\x61\x74\x61".

In the long run this should ideally be handled earlier, in the lex scanner or the grammar parser.
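The precedence described above (byte array first, generic string second) can be illustrated with a small standalone sketch. This is not the dfilter code: `resolve_unquoted_rhs`, `rhs_kind_t`, and `hex_val` are hypothetical names, and the byte-array rule is simplified to "an even number of hex digits with no separators", which is an assumption made for illustration only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result kinds for this sketch; not Wireshark types. */
typedef enum { RHS_BYTES, RHS_STRING } rhs_kind_t;

/* Return the value of a hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/*
 * Resolve an unquoted RHS token. Prefer a byte array when the token is
 * an even-length run of hex digits (simplified rule, an assumption of
 * this sketch); otherwise fall back to the token's literal characters.
 * At most cap bytes are written to out; the count is stored in *len.
 */
static rhs_kind_t resolve_unquoted_rhs(const char *s, unsigned char *out,
                                       size_t cap, size_t *len)
{
    size_t n, i;
    int all_hex = 1;

    assert(s != NULL && out != NULL && len != NULL);

    n = strlen(s);
    for (i = 0; i < n; i++) {
        if (hex_val(s[i]) < 0) {
            all_hex = 0;
            break;
        }
    }

    if (all_hex && n > 0 && n % 2 == 0 && n / 2 <= cap) {
        /* Byte-array interpretation wins: "dc" becomes the byte 0xdc. */
        for (i = 0; i < n / 2; i++)
            out[i] = (unsigned char)(hex_val(s[2 * i]) * 16 +
                                     hex_val(s[2 * i + 1]));
        *len = n / 2;
        return RHS_BYTES;
    }

    /* Generic string: "data" becomes 0x64 0x61 0x74 0x61. */
    *len = n < cap ? n : cap;
    memcpy(out, s, *len);
    return RHS_STRING;
}
```

Under these assumptions, "data" resolves to the four characters of the string, while "dc" and "fefd" resolve to one and two raw bytes respectively, which mirrors the ambiguity the commit message describes.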
2021-09-19 11:23:26 -04:00 · 2021-09-19 11:23:26 -04:00 · a2120e823c
parent 1dad8190b1
commit a2120e823c
1 changed file with 16 additions and 21 deletions
--- a/epan/dfilter/semcheck.c
+++ b/epan/dfilter/semcheck.c
@@ -1420,23 +1420,30 @@ check_relation(dfwork_t *dfw, const char *relation_string,
 #endif
 	header_field_info   *hfinfo;
 	stnode_t            *new_st;
-	fvalue_t            *fvalue;
 	char                *s;

 	DebugLog(("   4 check_relation(\"%s\") [%u]\n", relation_string, i++));

 	/* Protocol can only be on LHS (for "contains" or "matches" operators).
-	 * Check to see if protocol is on RHS, and try to parse it as a byte
-	 * array instead.  Only raise an error if it can't be re-interpreted.
+	 * Check to see if protocol is on RHS, and re-interpret it as UNPARSED
+	 * instead. The subsequent functions will parse it according to the
+	 * existing rules for unparsed unquoted strings.
+	 *
 	 * This catches the case where the user has written "fc" on the RHS,
 	 * probably intending a byte value rather than the fibre channel
-	 * protocol, or similar for a number of other possibilities.
-	 * ("dc", "ff", "fefd")
+	 * protocol, or similar for a number of other possibilities
+	 * ("dc", "ff", "fefd"), and also catches the case where the user
+	 * has written a generic string on the RHS for a "contains" or
+	 * "matches" relation. (XXX: There's still a bit of a confusing mess;
+	 * byte arrays take precedent over generic strings when unquoted, so
+	 * "field contains data" matches "\x64 \x61 \x74 \x61" but
+	 * "field contains dc" matches "\xdc" and not "\x64 \x43", but that's
+	 * an underlying issue.)
 	 *
 	 * XXX: Is there a better way to do this in the lex scanner or grammar
-	 * parser step instead?  Or should the deterimination of whether
-	 * something is a field occur later than it does currently?  This is
-	 * kind of a hack.
+	 * parser step instead?  Should the determination of whether something
+	 * is a field occur later than it does currently?  This is kind of a
+	 * hack.
 	 */

 	if (stnode_type_id(st_arg2) == STTYPE_FIELD) {
@@ -1447,22 +1454,10 @@ check_relation(dfwork_t *dfw, const char *relation_string,
 			 * string.
 			 */
 			s = (char *)hfinfo->abbrev;
-			/* Could we parse it as a byte string instead of a
-			 * protocol name? (Set allow_partial_value to false
-			 * because there's no point in allowing, e.g. "bfd"
-			 * through.)
-			 */
-			fvalue = fvalue_from_unparsed(FT_BYTES, s, FALSE, NULL);
-			if (!fvalue) {
-				/* No. Raise an error. */
-				dfilter_fail(dfw, "Protocol (\"%s\") cannot appear on right-hand side of comparison.", hfinfo->abbrev);
-				THROW(TypeError);
-			}
-			/* Yes. Send it through as unparsed and all the other
+			/* Send it through as unparsed and all the other
 			 * functions will take care of it as if it didn't
 			 * match a protocol string.
 			 */
-			FVALUE_FREE(fvalue);
 			new_st = stnode_new(STTYPE_UNPARSED, s);
 			stnode_free(st_arg2);
 			st_arg2 = new_st;