Perl Regular Expression Mastery

Length: 3 hours

Prerequisites: Students should have had at least two months of experience using regexes in Perl,
or should be regular users of regular expressions in some similar language such as Python,
or should be familiar with the use of regexes in Unix utilities such as vi and egrep.


Almost everyone has written a regex that failed to match something they wanted it to, or that matched something they thought it shouldn't, and often it can be hard to predict what a regex will do. This class will fix that.

The first section will explore the algorithm that perl uses internally to do regex matching. Understanding this algorithm will allow us to predict whether a regex will match, which of several matches Perl will find, and which regexes will be faster than others. During this discussion we'll pause to discuss practical applications that illustrate features of the algorithm. We'll examine the essential but frequently misunderstood concept of 'greed', and we'll learn why commonly-used regex symbols like ., $, and \1 might not mean what you thought they did.
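
As a taste of the surprises mentioned above, here is a minimal sketch; the sample strings and variable names are made up for illustration and are not taken from the class materials.

```perl
use strict;
use warnings;

# Greed: .* grabs as much as it can, then gives back only what it must.
my $text = 'say "one" and "two"';
my ($greedy) = $text =~ /"(.*)"/;     # captures: one" and "two
my ($lazy)   = $text =~ /"(.*?)"/;    # captures: one

# $ matches at the end of the string OR just before a final newline;
# \z matches only at the very end.
my $dollar_matches = "abc\n" =~ /abc$/  ? 1 : 0;   # 1
my $z_matches      = "abc\n" =~ /abc\z/ ? 1 : 0;   # 0

# . does not match a newline unless the /s modifier is given.
my $dot_plain = "a\nb" =~ /a.b/  ? 1 : 0;          # 0
my $dot_s     = "a\nb" =~ /a.b/s ? 1 : 0;          # 1

# \1 matches the text that group 1 captured, not the group's pattern again.
my $backref_ab = "ab" =~ /^(a|b)\1/ ? 1 : 0;       # 0: needs "aa" or "bb"
my $backref_aa = "aa" =~ /^(a|b)\1/ ? 1 : 0;       # 1
```

Each result follows from Perl's documented matching rules, which is the kind of prediction the first section aims to make routine.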

In the second section, we'll apply our knowledge of the internals, examining several common disasters, a few practical parsing applications, and some new features that would have been hard to understand before. We'll see an example of every regex metacharacter and modifier. We'll finish with a discussion of some of the new optimizations that were added in Perl 5.6, and why you should avoid the /i modifier.



A variant of this class is available that reduces the emphasis on Perl and replaces it with a comparison of Perl's regex features with those in other languages and utilities. In particular, Perl matching semantics are contrasted with POSIX standard semantics.
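
One well-known point of contrast is alternation. The sketch below, with made-up sample data, shows Perl taking the first alternative that lets the match succeed at the leftmost position; a POSIX-conforming matcher is instead required to report the longest of the leftmost matches.

```perl
use strict;
use warnings;

# Perl tries alternatives left to right and keeps the first that succeeds.
my ($first)     = "foobar" =~ /(foo|foobar)/;   # captures: foo
my ($reordered) = "foobar" =~ /(foobar|foo)/;   # captures: foobar

# Under POSIX leftmost-longest semantics, both patterns would be
# expected to match "foobar", regardless of the order of alternatives.
```

In Perl, then, the order of alternatives is part of the pattern's meaning.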

Complete Slides

Thanks to the persuasion of the kind folks at my OSCON BOF session, the complete slides for this tutorial are now available online.

View the slides online

Download .tgz file of the entire talk

Download PDF file of the entire talk or in 2-up or 4-up format
