Sample solutions and discussion
Perl Quiz of The Week #23 (20040901)

=head1 NAME

qotw-r23.pod - Summary of the Perl Quiz of the Week, #23 (regular)

=head1 SYNOPSIS

The following programmers sent solutions to "regular" Perl Quiz #23 to
the perl-qotw-discuss mailing list:

  Rod Adams
  Kester Allen
  Greg Bacon (5 solutions)
  Roger Burton West
  Leo Cacciari (2 solutions)
  Tom Coleman
  Jereme Corrado
  Andrew Dalke (Python)
  José Alves de Castro
  Mark Jason Dominus
  Christian Dühl (2 solutions)
  Darren Dunham (2 solutions)
  Kevin Earls
  Shlomi Fish
  Bruce J. Keeler (3 solutions, 1 in C)
  Ronald J. Kimball
  Zed Lopez (2 solutions)
  Muir Manders (2 solutions)
  Daniel Martin (3 solutions)
  James Mastros
  Xavier Noria (2 solutions)
  Yitzchak Scott-Thoennes (4 solutions)
  Ariel Shaqed (2 solutions)
  John J. Trammell
  Bill Tucker (2 solutions)
  Matthew Walton (2 solutions, 1 in Haskell)
  Zsban Ambrus (Ruby)

The bulk of the solutions submitted were in Perl.  There was one
submission in C, one in Python, one in Ruby, and one in Haskell.

All tests of Perl code were run on a 2.4 GHz Intel Celeron running Debian
(sarge) GNU/Linux .  The Perl version used was 5.8.4.  All code used in
creating this summary is available for download; see

        http://perl.plover.com/qotw/misc/r023/

=head1 THE QUIZ

The quiz text reads:

  Write a program, 'parens', which gets a command line argument, 'n', which
  is an integer.  The program should print all the properly-balanced strings
  of parentheses of length 2n.  For example, given the argument '3', the
  program should print these five lines:

    ((()))
    (()())
    (())()
    ()(())
    ()()()

  in some order.  (The order is not important.)  

  For the argument '1'; the output should be:
        
    ()

  and for argument '4', the output should be:

    (((())))
    ((()()))
    ((())())
    ((()))()
    (()(()))
    (()()())
    (()())()
    (())(())
    (())()()
    ()((()))
    ()(()())
    ()(())()
    ()()(())
    ()()()()

  in some order.

=head1 RESULTS

=head2 Correctness

Code output was judged on four criteria:

  The solution output lists only balanced strings ("B").
  The solution output lists the correct number of strings ("N").
  The solution output lists the strings in lexical order ("L").
  The solution output lists only unique strings ("U").

Criterion "L" is optional, but included as a talking point. Programs
that did not compile, or didn't approximate the invocation and the
output described in the problem text were skipped.

The results for the Perl code are:

  adams.pl                skip
  allen.pl                B N   U
  bacon1.pl               B   L U
  bacon2.pl               B N L U
  bacon3.pl               B N L U
  bacon4.pl               B N   U
  bacon5.pl               B N   U
  burton_west.pl          B N L U
  cacciari1.pl            B N L U
  cacciari2.pl            B N L U
  coleman.pl              B   L U
  corrado.pl              B N L U
  dalke.py                B N L U
  decastro.pl             B   L U
  demerphq.pl             skip
  dominus.pl              B N   U
  duehl1.pl               B N L U
  duehl2.pl               B N L U
  dunham1.pl              B N   U
  dunham2.pl              B N L U
  earls.pl                    L U
  fish.pl                 B N L U
  keeler1.pl              B N   U
  keeler2.pl              B N L U
  kimball.pl              B N L U
  lopez1.pl               B N L U
  lopez2.pl               B N L U
  manders1.pl             B N   U
  manders2.pl             B N   U
  martin1.pl              B N   U
  martin2.pl              B N   U
  martin3.pl              B N L U
  mastros.pl              B N L U
  noria1.pl               B N   U
  noria2.pl               B N L U
  scott_thoennes1.pl      B N   U
  scott_thoennes2.pl      B N   U
  scott_thoennes3.pl      B N   U
  scott_thoennes4.pl      B N   U
  shaqed1.pl              B N   U
  shaqed2.pl              B N   U
  trammell.pl             B N L U
  tucker1.pl              B N L U
  tucker2.pl              B N L U
  walton2.pl              B N   U

The results for the non-Perl code are:

  dalke.py                skip
  keeler3.c               B N L U
  walton1.lhs             skip
  zsban.rb                B N L U

=head1 TIMING

The following timing results represent the time in seconds for each
script to generate the complete output for N=0 through N=12 three
times.  Programs that generated incorrect output are omitted.

  shaqed1.pl             4.486572
  martin3.pl             4.496771
  shaqed2.pl             4.584533
  mastros.pl             6.25906
  dominus.pl             6.568825
  scott_thoennes2.pl     6.592227
  keeler2.pl             9.353995
  scott_thoennes3.pl     9.723624
  martin1.pl            10.399679
  manders2.pl           10.489536
  fish.pl               10.569164
  manders1.pl           10.926351
  walton2.pl            14.059723
  lopez3.pl             15.483148
  tucker1.pl            16.3078
  lopez1.pl             17.085956
  noria2.pl             18.488001
  dalke.py              19.863028
  cacciari1.pl          20.803137
  tucker2.pl            23.346204
  cacciari2.pl          23.992838
  burton_west.pl        27.436623
  scott_thoennes1.pl    28.85522
  trammell.pl           35.020822
  bacon5.pl             46.568366
  lopez2.pl             47.324535
  keeler1.pl            53.49667
  noria1.pl             58.954158
  dunham2.pl            60.137515
  kimball.pl            62.261176
  allen.pl              82.689522
  bacon4.pl             85.976964
  scott_thoennes4.pl    96.830828
  duehl2.pl            115.1553
  corrado.pl           133.964715
  duehl1.pl            147.731797
  dunham1.pl           259.267496
  martin2.pl          1263.874676
  bacon2.pl           1616.085711

=head1 ANALYSIS

The many solutions included

 * "brute force" algorithms that produced all combinations of
   ) and (, then filtered out invalid strings
 * more subtle "brute force" algorithms based on combinations
   of known valid strings of shorter length
 * recursive solutions that built valid strings from left to right,
   or built long valid strings by combining shorter ones
 * more interesting, subtle, or complicated solutions, that I'll
   need more time to digest

A straightforward recursive solution was posted by Shlomi Fish
(fish.pl):

   1  #!/usr/bin/perl -w
   2  
   3  use strict;
   4  
   5  my $N = shift;
   6  
   7  sub recurse
   8  {
   9      my ($string, $num_opened, $num_closed) = (@_);
  10      if (($num_opened == $N) && ($num_closed == $N))
  11      {
  12          print "$string\n";
  13          return;
  14      }
  15      elsif ($num_opened < $N)
  16      {
  17          recurse("$string(", $num_opened+1, $num_closed);
  18      }
  19      if ($num_opened > $num_closed)
  20      {
  21          recurse("$string)", $num_opened, $num_closed+1);
  22      }
  23  }
  24  
  25  recurse("", 0, 0);
  26  

Lines 1-5 set the stage, turning on strictures and warnings, and
establishing $N as the number of matched pairs to produce.

Lines 7-23 define function recurse(), clearly the meat of this solution.
Function recurse() takes three arguments:

 => an accumumlated string
 => the number of "("s
 => the number of ")"s

The if-block on line 10 decides that the string is "finished" if the number
of open parens equals and the number of close parens both equal $N.
Recursion stops and the solution is returned if this condition is met.

The elsif-block on line 15 and the if-block on line 19 consider two
possibilities:

 * if adding a "(" is still possibly valid, call recurse() on "$string("
 * if adding a ")" is still possibly valid, call recurse() on "$string)"

Line 25 begins the recursion, starting recurse() with arguments representing
an empty string and no open or closed parens.

Several of the submitted solutions used this algorithm, including
Matthew Walton's Haskell solution.

=head2 Another Recursive Solution

[Mark Dominus]

Several programmers observed that the set of balanced strings can be
recursively describes as the smallest set that contains the empty
string and also all strings of the form "(S)S", where S is another
balanced string, and wrote programs to generate balanced strings by
applying this transformation.  Leo Cacciari and Zsban Ambrus's
programs did this, for example.  Here is Pr. Cacciari's program:

        #!/usr/bin/perl -wl
        use strict;

        my $n = shift;

        my @stack = qw(S);

        while (@stack) {
          my $str = pop @stack;
          my $nt = ($str =~ tr!S!S!);
          my $par = (length($str) - $nt)/2;

          if ($par == $n) {
            $str =~ s!S!!g;
            print $str;
          }
          elsif ($nt <= $n) {
            if ($nt > 1) {
              (my $new = $str) =~ s!S!!;
              push @stack,$new;
            }
            $str =~ s!S!(S)S!;
            push @stack,$str;
          }
        }

The program does a depth-first search of the space of all strings
that can be generated from 'S' by turning an 'S' into either '(S)S' or
'':

                        S
                       / \
                          (S)S
                         /    \
                        /      \
                       ()S     ((S)S)S
                       / \       /    \
                      /   \     /      \
                    ()  ()(S)S (()S)S  (((S)S)S)S
                        /  \     |      /        \
                       /    \   ...   ((()S)S)S   ((((S)S)S)S)S 
                      /      \           |              |
                     ()()S ()((S)S)S    ...            ...
                    /   \        |
                ()()   ()()(S)S  ...
                          |
                         ...

Language theory tells us that the order in which the S's are replaced
does not matter, so the program can replace them left-to-right.

The program maintains a stack, @stack, of forms to be investigated;
initially this contains only the base form 'S'.  On each pass, the
program pops the top form from the stack and counts the number of S's
(in $nt) and the number of pairs of parentheses it contains, in $par.

If the number of parentheses is correct for the desired output ($par
== $n) then all remaining S's are replaced with the empty string
(s!S!!g) and the form is printed.  Otherwise, if the form is too big
to possibly produce useful output ($nt > $n) it is discarded.  

If not discarded, the first S is replaced with (S)S, and the resulting
form is pushed onto the stack for later processing.  Also, if the form
has more than one S, the first S in the original form is replaced with
the empty string, and the resulting form is *also* put onto the stack.

cacciari2.pl is a minor variation that avoids recalculating $nt and
$par for each node by storing this information on the stack as well.

Zsban Ambrus's program zsban.rb is a Ruby implementation of a similar
algorithm.  In Pr. Zsban's program, the strings with N pairs of
parentheses are generated by selecting all possible combinations of a
string I of i pairs and a string J of j pairs of parentheses, where

        i+j+1 = N
        0 <= i,j < N

and then generating the string "(I)J".  Pr. Zsban's program keeps
cached lists of all the strings of balanced parentheses of size < N,
for time efficiency.

=head2 Another algorithm

[Mark Dominus]

I had originally planned to do something recursive, but after
tinkering a little I found a really excellent, non-obvious algorithm
which has excellent speed and space performance and an extremely
simple implementation:

        #!/usr/bin/perl -l

        print $_ = "()" x shift;
        print while s{^  ( \(+ )  ( \)+ ) \(  }
                     {"()" x (length($2) - 1)
                      . "("  x (length($1) - length($2) + 2)
                      . ")"
                     }xe;

A drawback of this approach is that it does not generate the output in
lexicographic order; however, a minor variation (modify the right end
of the string instead of the left end) *does* generate the output in
lexicographic order.  Alternatively, piping the output of this program
through

        perl -ple '$_ = reverse; tr/()/)(/'

is a cheap way to put the output into lexicographic order, without sorting.

A detailed explanation of this algorithm is available at

        http://perl.plover.com/~alias/list.cgi?1:mss:2118        

=head2 A Wrong Solution

[John J. Trammell]

I must confess, my first solution to this quiz was wrong.  I was happy to
find I had some company.  :^)

Specifically, I noticed that from the N=2 solution:

  (())
  ()()

generating the N=3 solution:

  ((()))
  (())()
  ()(())
  (()())
  ()()()

was as simple as a map and a filter:

  @n3 = map { ("$_()","($_)","()$_") } @n2;
  @n3 = unique(@n3);

The problem with this approach is seen at N=4, in generating the valid
string:

  (())(())

and similar strings for N=5, etc.

[ A lot of people made this mistake! - MJD ]

=head1 Thanks

[Mark Dominus]

Many thanks to everyone who particpiated in the discussion or who sent
in solutions.  I was not expecting this quiz to be as popular as it
was, or to see so many different solutions.  Many thanks also to John
J. Trammell for his work in testing and timing the submitted programs
and writing the report, and, as always, to the legions of quiet
programmers who did the quiz but didn't post anything about it.

=cut