Perl Regular Expression Matching is NP-Hard

Matching ordinary regular expresions can be done in polynomial time, proportional to MN, where M is the length of the regular expression and N is the length of the string to be matched. The usual method for this is: Parse the regular expression and construct an equivalent finite automaton (FA), which will have O(M) states; then simulate the action of the FA on the input, which takes O(MN) time. My Regex.pm module demonstrates this.

However, adding backreferences to regular expressions complicates them so that the usual FA construction does not work. Perl can handle these extended regexes, but it sometimes takes a long time to find out if there is a match or not. For example, Perl takes exponential time (in the length of the string to be matched) to check whether or not a string matches the regex /^(?:(\d+)|::)*$/.

One is tempted to wonder whether this difficulty is inherent, or whether there might be a clever way of speeding up the matching algorithm. The answer is that there is probably no clever way to speed up the algorithm, and that the exponential running time of the matching algorithm is probably unavoidable. We show below that regex matching is NP-hard when regexes are allowed to have backreferences.

The usual method of showing that a problem P is NP-hard is by showing that some other NP-hard problem, Q, could be solved efficiently if P could be also. I showed that instances of the GRAPH 3-COLORABILITY problem can be reduced to instances of regex matching. Abigail showed that instances of the 3CNF-SAT problem can be reduced to instances of regex matching. Greg Bacon showed that instances of the CLIQUE problem can be reduced to instances of regex matching.


Some Notes About NP-Completeness

This page used to say "Regex Matching is NP-Complete". That may not be correct. I've been meaning to fix it for years, but I never got around to it before. To show that regex matching is NP-complete, we need to show two things. First, we need to show that it is NP-hard; the proofs on this page do that. Second, we need to show that regex matching is in NP. The proofs don't do that.


Return to: Universe of Discourse main page | What's new page | Perl Paraphernalia

mjd-perl-npc@plover.com