DNSDB Farsight Compatible Regular Expressions (FCRE) provides regular expression (regexp) functionality for searching DNS hostnames and rdata values in DNSDB. The regexp searches are evaluated against the DNS master file form of the hostnames and rdata values, which by design contains only printable ASCII characters. All non-printable characters, including octets outside the ASCII range, are converted to “\DDD” escape sequences, where “DDD” is a three-digit decimal number, per RFC 1035 (https://tools.ietf.org/html/rfc1035). This is only applicable to RData (RHS) queries.
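The escaping rule can be sketched in Python as follows. This is an illustration only, not DNSDB's implementation: the function name `to_presentation` is hypothetical, the printable range is assumed to be ASCII 0x20 through 0x7E, and any additional backslash-quoting that master file form applies to printable characters is deliberately left out.

```python
def to_presentation(raw: bytes) -> str:
    """Render raw octets, replacing non-printable ones with \\DDD escapes."""
    out = []
    for octet in raw:
        if 0x20 <= octet <= 0x7E:
            # Assumed printable range: keep the character as-is.
            out.append(chr(octet))
        else:
            # Three-digit, zero-padded decimal value of the octet.
            out.append("\\%03d" % octet)
    return "".join(out)


print(to_presentation(b"a\tb"))         # a\009b
print(to_presentation(b"caf\xc3\xa9"))  # caf\195\169
```

A regexp is then matched against the escaped string, so a tab in the original data is seen by the regexp as the four characters `\009`.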
For this limited use case, DNSDB FCRE provides a simplified subset of the POSIX Extended Regular Expression syntax, with the most notable restrictions being:
Note that restriction (3) means that PCRE extensions such as ‘\w’ and ‘\d’ are not allowed in FCRE regexps.
A regular expression is a string of printable characters, with the following characters given special meaning:
\
– Escape the next character, which must be a special character. A regexp may not end with an unescaped ‘\’, or contain an unescaped ‘\’ followed by a character other than ‘\’ or the characters listed below, except inside of a character class.
^
– Matches the beginning of the subject string.
$
– Matches the end of the subject string.
[
– Begin a character class.
.
– A special character class matching any character.
(
– Begin a sub-pattern. Sub-patterns may occur within other sub-patterns.
)
– End a sub-pattern.
|
– Specify an alternative match. A pattern or subpattern matches if the pattern before or after the ‘|’ matches.
*
– Match the previous character, character class, or subpattern zero or more times.
?
– Match the previous character, character class, or subpattern at most once.
+
– Match the previous character, character class, or subpattern at least once.
{
– If followed by a character other than a decimal digit, is treated as a literal ‘{’ character. Such a ‘{’ may be escaped with ‘\’ even though it is not technically a special character in this context. If followed by a decimal digit, begins a bounded match specification. “{n}” matches exactly n repetitions of the previous character, character class, or subpattern. “{n,m}” with m >= n matches at least n but at most m repetitions.
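The special characters above can be tried out with a short sketch. Python's `re` module is used here only as a stand-in engine, on the assumption that it treats these particular constructs the same way; it is not FCRE and accepts many things FCRE rejects. The sample names and the helper `matching` are made up for illustration.

```python
import re

# Hypothetical sample hostnames in master file form (note the trailing dot).
names = ["www.example.com.", "dev-www.example.net.", "mail.example.com."]


def matching(pattern):
    """Return the sample names that contain a match for the pattern."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]


# ^ anchors at the beginning; \. is a literal dot.
starts_with_www = matching(r"^www\.")
# ( ) groups a sub-pattern, | separates alternatives, $ anchors at the end.
net_or_edu = matching(r"\.(net|edu)\.$")
# ? makes the preceding sub-pattern optional.
optional_prefix = matching(r"^(dev-)?www\.")
# {n} requires exactly n repetitions of the preceding character.
three_w = matching(r"^w{3}\.")
# {n,m} allows n to m repetitions; an unescaped . matches any character.
short_first_label = matching(r"^.{3,4}\.")

for label, result in [
    ("^www\\.", starts_with_www),
    ("\\.(net|edu)\\.$", net_or_edu),
    ("^(dev-)?www\\.", optional_prefix),
    ("^w{3}\\.", three_w),
    ("^.{3,4}\\.", short_first_label),
]:
    print(label, "->", result)
```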
A character class is a set of characters enclosed between an opening ‘[’ and a closing ‘]’. Within the character class, the following characters are handled specially:
^
– If the first character after the opening ‘[’, denotes a negated character class, i.e. a class which matches any character not listed in the remainder of the class.
]
– If the first character after the opening ‘[’ or ‘[^’, encodes a literal ‘]’ as a member of the class. A ‘]’ after the first character after the opening ‘[’ or ‘[^’ ends the character class.
-
– If the first character after the opening ‘[’ or ‘[^’ or the last character before the closing ‘]’, encodes a literal ‘-’ as a member of the character class.
The sequences [. and [= are not allowed between the opening [ or [^ and the closing ], to prevent confusion with unsupported POSIX collation sequences and collation classes.
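The placement rules for ‘^’, ‘]’ and ‘-’ are easy to get wrong, so a small check may help. The sketch below again uses Python's `re` module as a stand-in, assuming it follows the same placement rules for these three characters; the helper `accepts` is hypothetical.

```python
import re


def accepts(char_class, ch):
    """True if the single character ch is matched by the character class."""
    return re.fullmatch(char_class, ch) is not None


# ^ directly after the opening [ negates the class.
negated = (accepts("[^.]", "a"), accepts("[^.]", "."))
# ] directly after the opening [ is a literal member.
literal_bracket = accepts("[]x]", "]")
# - as the first or last member is a literal dash.
dash_first = accepts("[-._]", "-")
dash_last = accepts("[._-]", "-")
# - between two members forms a range: .-_ spans ASCII 0x2E through 0x5F,
# which includes the digits and upper case letters but not the dash itself.
range_hits_letter = accepts("[.-_]", "A")
range_hits_dash = accepts("[.-_]", "-")

print(negated, literal_bracket, dash_first, dash_last)
print(range_hits_letter, range_hits_dash)
```

The last two results show why a dash meant literally belongs at the start or end of the class.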
If the sequence [: appears in a character class, it must be the beginning of one of the following POSIX character classes:
[:alnum:]
– Alphanumeric characters 0-9, A-Z, and a-z
[:alpha:]
– Alphabetic characters A-Z, a-z
[:blank:]
– Blank characters (space and tab). [:blank:] is equivalent to a space character. A tab is encoded as \009 and can be matched with \\009.
[:cntrl:]
– Control characters. [:cntrl:] will not match any characters, because control characters are encoded as \DDD escape sequences. To match one of those, you will need to backslash-quote the backslash: match with \\[[:digit:]]{3} in a regular expression.
[:digit:]
– Decimal digits 0-9
[:graph:]
– Any printable character other than space. [:graph:] is equivalent to [^ ] (negated character class containing only a space).
[:lower:]
– Lower case alphabetic characters a-z. [:lower:] is equivalent to [:alpha:].
[:print:]
– Any printable character. [:print:] will match any character.
[:punct:]
– Punctuation characters (printable characters other than space and [:alnum:])
[:space:]
– Any whitespace character (tab, newline, vertical tab, form feed, carriage return, and space). A tab is encoded as \009 and can be matched with \\009. The other characters can also be matched by searching for their decimal equivalent.
[:upper:]
– Upper case alphabetic characters A-Z. See also [:lower:].
[:xdigit:]
– Hexadecimal digits 0-9, a-f, A-F
The above named character classes must appear inside an enclosing [ and ], e.g. [[:digit:][:punct:]] to match a digit or punctuation character. Without the enclosing brackets, [:digit:] will match the characters :, d, i, g, or t.
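The notes on [:blank:], [:cntrl:], [:graph:], [:print:] and [:space:] all follow from one fact: the subject string contains only printable ASCII. The sketch below makes that concrete by intersecting the usual ASCII definition of each class with the printable range. The names `PRINTABLE`, `POSIX_CLASSES` and `effective_members` are hypothetical, the printable range is assumed to be 0x20 through 0x7E, and [:lower:] and [:upper:] are left out because their DNSDB-specific behaviour is not modelled here.

```python
import string

# Assumption: subject strings contain only printable ASCII, 0x20 through 0x7E.
PRINTABLE = {chr(c) for c in range(0x20, 0x7F)}

# Usual ASCII membership of each POSIX class, before any DNSDB restriction.
POSIX_CLASSES = {
    "alnum": set(string.ascii_letters + string.digits),
    "alpha": set(string.ascii_letters),
    "blank": {" ", "\t"},
    "cntrl": {chr(c) for c in range(0x20)} | {"\x7f"},
    "digit": set(string.digits),
    "graph": {chr(c) for c in range(0x21, 0x7F)},
    "print": set(PRINTABLE),
    "punct": set(string.punctuation),
    "space": set(" \t\n\v\f\r"),
    "xdigit": set(string.hexdigits),
}


def effective_members(name):
    """Members of a POSIX class that can actually occur in a subject string."""
    return POSIX_CLASSES[name] & PRINTABLE


print(sorted(effective_members("blank")))  # only the space survives
print(sorted(effective_members("cntrl")))  # nothing survives
print(effective_members("graph") == PRINTABLE - {" "})
```

Tab, newline and the other control characters drop out of every class because they only ever appear in the subject as \DDD escapes.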
Neither the above character classes nor a character range may begin or end a character range. For example, the character class expressions [0-[:alpha:]] and [a-n-z] are invalid.
All other characters between the opening [ or [^ and the closing ] are added to the character class, including the backslash \ character.
There is no way to express a character class containing a single ^ character: an escaped \^ should be used instead of a character class.
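The backslash rule from the list of special characters, combined with the fact that a backslash is an ordinary member inside a character class, can be checked mechanically. The following is a rough sketch under stated assumptions, not the validator DNSDB uses: the function names are hypothetical, the set of escapable characters is taken literally from the list in this document, and nothing other than backslash usage is validated.

```python
# Characters this document lists as special outside a character class.
SPECIALS = set("\\^$[.()|*?+{")


def skip_class(pattern, i):
    """Given the index of an opening '[', return the index just past its ']'."""
    j = i + 1
    if pattern[j:j + 1] == "^":
        j += 1
    if pattern[j:j + 1] == "]":  # a leading ']' is a literal member
        j += 1
    while j < len(pattern):
        if pattern.startswith("[:", j):
            # Skip over a named class such as [:digit:].
            end = pattern.find(":]", j + 2)
            if end == -1:
                raise ValueError("unterminated POSIX class")
            j = end + 2
        elif pattern[j] == "]":
            return j + 1
        else:
            j += 1
    raise ValueError("unterminated character class")


def check_escapes(pattern):
    """Raise ValueError if a backslash is used in a way the text disallows."""
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            nxt = pattern[i + 1:i + 2]
            if nxt == "":
                raise ValueError("pattern ends with an unescaped backslash")
            if nxt not in SPECIALS:
                raise ValueError("backslash before non-special %r" % nxt)
            i += 2
        elif ch == "[":
            # Backslashes inside a character class are ordinary members.
            i = skip_class(pattern, i)
        else:
            i += 1
    return True


print(check_escapes(r"^www\.[^.]+\.com\.$"))  # True
try:
    check_escapes(r"\w+")
except ValueError as err:
    print("rejected:", err)
```

Under these assumptions a PCRE-style `\w` is rejected outside a character class, while `[\w]` passes because it is simply a class containing a backslash and a `w`.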
To match a literal . (such as between labels in a DNS name), you need to backslash-quote the ., for example google\.com. This is not necessary if the . is inside a character class, for example foo[._-]bar. If you don’t backslash-quote the ., for example google.com, then it will match ‘googlexcom’, ‘google_com’, etc.
Hostnames end with a trailing ., which must be accounted for in regular expressions.
Rdata values may end with a . or a ", which should be accounted for in regular expressions.
Some example regular expressions and some of the matching values:
www\..*\.com
– Hostnames with a label ending in “www.” and a later label starting with “.com”.
^www\..*\.com
– Hostnames starting with “www.” and containing “.com” later in the name.
^www\..*\.com\.$
– Hostnames starting with “www.” and ending in “.com.”.
^www\.[^.]+\.com\.$
– Hostnames starting with “www.” and ending with “.com.”, with no other dots in between.
^((dev|stage)-)?www\.[^.]+\.(net|edu)\.$
– Hostnames starting with “www”, optionally preceded by a “dev-” or “stage-” prefix, in a .net or .edu domain.
^"v=spf1 .* ~all"$
– TXT records encoding an SPF policy with a ~all default.
(^|[-._])star([-_]?)z[-._]
– Hostnames that start with “star”, or have “star” as a label or otherwise separate from other letters/digits, followed by an optional dash or underscore, then a z, then a period, dash or underscore. This might be used to look for a visibly embedded trademark.
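The example patterns can be exercised against a handful of made-up values to see how the anchors and the trailing dot change the result. As before, Python's `re` module is only a stand-in for the FCRE engine, and every hostname and TXT value below is hypothetical sample data rather than DNSDB output.

```python
import re

hostnames = [
    "www.example.com.",
    "www.a.b.com.",
    "www.example.com.evil.org.",
    "dev-www.example.net.",
    "mail.example.com.",
]

patterns = [
    r"www\..*\.com",
    r"^www\..*\.com",
    r"^www\..*\.com\.$",
    r"^www\.[^.]+\.com\.$",
    r"^((dev|stage)-)?www\.[^.]+\.(net|edu)\.$",
]


def hits(pattern, subjects):
    """Return the subjects in which the pattern matches somewhere."""
    return [s for s in subjects if re.search(pattern, s)]


results = {p: hits(p, hostnames) for p in patterns}
for pattern, matched in results.items():
    print(pattern, "->", matched)

# TXT rdata is quoted, so the quotes are part of the subject string.
spf_hits = hits(r'^"v=spf1 .* ~all"$',
                ['"v=spf1 include:_spf.example.com ~all"',
                 '"v=spf1 -all"'])

# The embedded-trademark pattern needs a delimiter (or the start) before "star".
star_hits = hits(r"(^|[-._])star([-_]?)z[-._]",
                 ["starz.example.com.", "my-star-z.example.com.",
                  "mustarz.example.com."])

# An unescaped . matches any character, so it also hits "googlexcom.".
loose = hits(r"google.com", ["google.com.", "googlexcom."])
strict = hits(r"google\.com", ["google.com.", "googlexcom."])
print(spf_hits, star_hits, loose, strict)
```

Without a trailing `\.$`, the second pattern still matches `www.example.com.evil.org.`, which is why the end anchor and the trailing dot matter.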