Matching ordinary regular expresions can be done in polynomial
time, proportional to `MN`, where `M` is the length
of the regular expression and `N` is the length of the string
to be matched. The usual method for this is: Parse the regular
expression and construct an equivalent finite automaton (FA), which
will have `O`(`M`) states; then simulate the action
of the FA on the input, which takes `O`(`MN`)
time. My `Regex.pm` module
demonstrates this.

However, adding backreferences to regular expressions complicates
them so that the usual FA construction does not work. Perl can handle
these extended regexes, but it sometimes takes a long time to find out
if there is a match or not. For example, Perl takes exponential time
(in the length of the string to be matched) to check whether or not a
string matches the regex `/^(?:(\d+)|::)*$/`.

One is tempted to wonder whether this difficulty is inherent, or whether there might be a clever way of speeding up the matching algorithm. The answer is that there is probably no clever way to speed up the algorithm, and that the exponential running time of the matching algorithm is probably unavoidable. We show below that regex matching is NP-hard when regexes are allowed to have backreferences.

The usual method of showing that a problem `P` is
NP-hard is by showing that some other NP-hard problem,
`Q`, could be solved efficiently if P could be also. I
showed that instances of the GRAPH 3-COLORABILITY problem can be
reduced to instances of regex matching. Abigail showed that instances
of the 3CNF-SAT problem can be reduced to instances of regex
matching. Greg Bacon showed that instances of the CLIQUE problem can
be reduced to instances of regex matching.

- I thought I had a proof by reduction from the SUBSET-SUM problem, but I realized it had an error, so it's unavailable (except to Papadimitriou's cs270 students) until I can fix it, or maybe forever.
- Reduction from the 3CNF-SAT problem. (Due to Abigail.)
- Reduction from graph 3-colorability problem.

This page used to say "Regex Matching is NP-Complete". That may not be correct. I've been meaning to fix it for years, but I never got around to it before. To show that regex matching is NP-complete, we need to show two things. First, we need to show that it is NP-hard; the proofs on this page do that. Second, we need to show that regex matching is in NP. The proofs don't do that.

Return to: Universe of Discourse main page | What's new page | Perl Paraphernalia