How Regexes Work

Sidebar: Terminology

This is all serious computer science in this article, but I didn't want you to get upset by the terminology. It's easy to understand what's going on, but it's also easy to get lost in a lot of technical jargon, so I took out the jargon. Here it is if you want to impress your friends with it:

The machines are called `Finite Automata'. `Automata' is Greek; it means `things that go by themselves'. The automata are finite because there are only a finite number of circles and arrows; this is to contrast them with automata that have an infinite amount of memory. `Finite automaton' is a mouthful even for a computer scientist with a Ph.D., so they usually end up abbreviating it to `FA'.

The kind of FA's we've been looking at are called `nondeterministic', or `NFA's for short. This refers to the fact that pennies sometimes get cloned. If there's only one arrow with a particular label leading out of each circle, the pennies never get cloned, and the automaton is called `deterministic' or a `DFA'.

When the FA says `yes', computer scientists say it has `accepted' its input; when it says `no', they say it has `rejected' its input.

The arrows are called `transitions'. Blank arrows are called `epsilon-transitions' . The circles are called `states'. The start circle is called the `start state'. The double circles are called `accepting states' or sometimes `final states'. Computer scientists don't talk about the pennies, because they are too important to waste time playing with pennies. They just talk about the `current states', which are the circles that the pennies would have been sitting on if there were any pennies.

Computer scientists draw the pictures exactly the same way I drew them for you, except that they don't actually label the start state with `Start Here'. They do draw that little arrow pointing to it, however. And they sometimes also label the blank arrows with the Greek letter epsilon so that they can remember that they're epsilon-transitions.

Now you can talk just like a computer scientist.


s

Return to the article.


Return to: Universe of Discourse main page | What's new page | Perl Paraphernalia | How Regexes Work

mjd-tpj-regex@plover.com