Sample solutions and discussion Perl Quiz of The Week #23 (20040901) =head1 NAME qotw-r23.pod - Summary of the Perl Quiz of the Week, #23 (regular) =head1 SYNOPSIS The following programmers sent solutions to "regular" Perl Quiz #23 to the perl-qotw-discuss mailing list: Rod Adams Kester Allen Greg Bacon (5 solutions) Roger Burton West Leo Cacciari (2 solutions) Tom Coleman Jereme Corrado Andrew Dalke (Python) José Alves de Castro Mark Jason Dominus Christian Dühl (2 solutions) Darren Dunham (2 solutions) Kevin Earls Shlomi Fish Bruce J. Keeler (3 solutions, 1 in C) Ronald J. Kimball Zed Lopez (2 solutions) Muir Manders (2 solutions) Daniel Martin (3 solutions) James Mastros Xavier Noria (2 solutions) Yitzchak Scott-Thoennes (4 solutions) Ariel Shaqed (2 solutions) John J. Trammell Bill Tucker (2 solutions) Matthew Walton (2 solutions, 1 in Haskell) Zsban Ambrus (Ruby) The bulk of the solutions submitted were in Perl. There was one submission in C, one in Python, one in Ruby, and one in Haskell. All tests of Perl code were run on a 2.4 GHz Intel Celeron running Debian (sarge) GNU/Linux . The Perl version used was 5.8.4. All code used in creating this summary is available for download; see http://perl.plover.com/qotw/misc/r023/ =head1 THE QUIZ The quiz text reads: Write a program, 'parens', which gets a command line argument, 'n', which is an integer. The program should print all the properly-balanced strings of parentheses of length 2n. For example, given the argument '3', the program should print these five lines: ((())) (()()) (())() ()(()) ()()() in some order. (The order is not important.) For the argument '1'; the output should be: () and for argument '4', the output should be: (((()))) ((()())) ((())()) ((()))() (()(())) (()()()) (()())() (())(()) (())()() ()((())) ()(()()) ()(())() ()()(()) ()()()() in some order. =head1 RESULTS =head2 Correctness Code output was judged on four criteria: The solution output lists only balanced strings ("B"). The solution output lists the correct number of strings ("N"). The solution output lists the strings in lexical order ("L"). The solution output lists only unique strings ("U"). Criterion "L" is optional, but included as a talking point. Programs that did not compile, or didn't approximate the invocation and the output described in the problem text were skipped. The results for the Perl code are: adams.pl skip allen.pl B N U bacon1.pl B L U bacon2.pl B N L U bacon3.pl B N L U bacon4.pl B N U bacon5.pl B N U burton_west.pl B N L U cacciari1.pl B N L U cacciari2.pl B N L U coleman.pl B L U corrado.pl B N L U dalke.py B N L U decastro.pl B L U demerphq.pl skip dominus.pl B N U duehl1.pl B N L U duehl2.pl B N L U dunham1.pl B N U dunham2.pl B N L U earls.pl L U fish.pl B N L U keeler1.pl B N U keeler2.pl B N L U kimball.pl B N L U lopez1.pl B N L U lopez2.pl B N L U manders1.pl B N U manders2.pl B N U martin1.pl B N U martin2.pl B N U martin3.pl B N L U mastros.pl B N L U noria1.pl B N U noria2.pl B N L U scott_thoennes1.pl B N U scott_thoennes2.pl B N U scott_thoennes3.pl B N U scott_thoennes4.pl B N U shaqed1.pl B N U shaqed2.pl B N U trammell.pl B N L U tucker1.pl B N L U tucker2.pl B N L U walton2.pl B N U The results for the non-Perl code are: dalke.py skip keeler3.c B N L U walton1.lhs skip zsban.rb B N L U =head1 TIMING The following timing results represent the time in seconds for each script to generate the complete output for N=0 through N=12 three times. Programs that generated incorrect output are omitted. shaqed1.pl 4.486572 martin3.pl 4.496771 shaqed2.pl 4.584533 mastros.pl 6.25906 dominus.pl 6.568825 scott_thoennes2.pl 6.592227 keeler2.pl 9.353995 scott_thoennes3.pl 9.723624 martin1.pl 10.399679 manders2.pl 10.489536 fish.pl 10.569164 manders1.pl 10.926351 walton2.pl 14.059723 lopez3.pl 15.483148 tucker1.pl 16.3078 lopez1.pl 17.085956 noria2.pl 18.488001 dalke.py 19.863028 cacciari1.pl 20.803137 tucker2.pl 23.346204 cacciari2.pl 23.992838 burton_west.pl 27.436623 scott_thoennes1.pl 28.85522 trammell.pl 35.020822 bacon5.pl 46.568366 lopez2.pl 47.324535 keeler1.pl 53.49667 noria1.pl 58.954158 dunham2.pl 60.137515 kimball.pl 62.261176 allen.pl 82.689522 bacon4.pl 85.976964 scott_thoennes4.pl 96.830828 duehl2.pl 115.1553 corrado.pl 133.964715 duehl1.pl 147.731797 dunham1.pl 259.267496 martin2.pl 1263.874676 bacon2.pl 1616.085711 =head1 ANALYSIS The many solutions included * "brute force" algorithms that produced all combinations of ) and (, then filtered out invalid strings * more subtle "brute force" algorithms based on combinations of known valid strings of shorter length * recursive solutions that built valid strings from left to right, or built long valid strings by combining shorter ones * more interesting, subtle, or complicated solutions, that I'll need more time to digest A straightforward recursive solution was posted by Shlomi Fish (fish.pl): 1 #!/usr/bin/perl -w 2 3 use strict; 4 5 my $N = shift; 6 7 sub recurse 8 { 9 my ($string, $num_opened, $num_closed) = (@_); 10 if (($num_opened == $N) && ($num_closed == $N)) 11 { 12 print "$string\n"; 13 return; 14 } 15 elsif ($num_opened < $N) 16 { 17 recurse("$string(", $num_opened+1, $num_closed); 18 } 19 if ($num_opened > $num_closed) 20 { 21 recurse("$string)", $num_opened, $num_closed+1); 22 } 23 } 24 25 recurse("", 0, 0); 26 Lines 1-5 set the stage, turning on strictures and warnings, and establishing $N as the number of matched pairs to produce. Lines 7-23 define function recurse(), clearly the meat of this solution. Function recurse() takes three arguments: => an accumumlated string => the number of "("s => the number of ")"s The if-block on line 10 decides that the string is "finished" if the number of open parens equals and the number of close parens both equal $N. Recursion stops and the solution is returned if this condition is met. The elsif-block on line 15 and the if-block on line 19 consider two possibilities: * if adding a "(" is still possibly valid, call recurse() on "$string(" * if adding a ")" is still possibly valid, call recurse() on "$string)" Line 25 begins the recursion, starting recurse() with arguments representing an empty string and no open or closed parens. Several of the submitted solutions used this algorithm, including Matthew Walton's Haskell solution. =head2 Another Recursive Solution [Mark Dominus] Several programmers observed that the set of balanced strings can be recursively describes as the smallest set that contains the empty string and also all strings of the form "(S)S", where S is another balanced string, and wrote programs to generate balanced strings by applying this transformation. Leo Cacciari and Zsban Ambrus's programs did this, for example. Here is Pr. Cacciari's program: #!/usr/bin/perl -wl use strict; my $n = shift; my @stack = qw(S); while (@stack) { my $str = pop @stack; my $nt = ($str =~ tr!S!S!); my $par = (length($str) - $nt)/2; if ($par == $n) { $str =~ s!S!!g; print $str; } elsif ($nt <= $n) { if ($nt > 1) { (my $new = $str) =~ s!S!!; push @stack,$new; } $str =~ s!S!(S)S!; push @stack,$str; } } The program does a depth-first search of the space of all strings that can be generated from 'S' by turning an 'S' into either '(S)S' or '': S / \ (S)S / \ / \ ()S ((S)S)S / \ / \ / \ / \ () ()(S)S (()S)S (((S)S)S)S / \ | / \ / \ ... ((()S)S)S ((((S)S)S)S)S / \ | | ()()S ()((S)S)S ... ... / \ | ()() ()()(S)S ... | ... Language theory tells us that the order in which the S's are replaced does not matter, so the program can replace them left-to-right. The program maintains a stack, @stack, of forms to be investigated; initially this contains only the base form 'S'. On each pass, the program pops the top form from the stack and counts the number of S's (in $nt) and the number of pairs of parentheses it contains, in $par. If the number of parentheses is correct for the desired output ($par == $n) then all remaining S's are replaced with the empty string (s!S!!g) and the form is printed. Otherwise, if the form is too big to possibly produce useful output ($nt > $n) it is discarded. If not discarded, the first S is replaced with (S)S, and the resulting form is pushed onto the stack for later processing. Also, if the form has more than one S, the first S in the original form is replaced with the empty string, and the resulting form is *also* put onto the stack. cacciari2.pl is a minor variation that avoids recalculating $nt and $par for each node by storing this information on the stack as well. Zsban Ambrus's program zsban.rb is a Ruby implementation of a similar algorithm. In Pr. Zsban's program, the strings with N pairs of parentheses are generated by selecting all possible combinations of a string I of i pairs and a string J of j pairs of parentheses, where i+j+1 = N 0 <= i,j < N and then generating the string "(I)J". Pr. Zsban's program keeps cached lists of all the strings of balanced parentheses of size < N, for time efficiency. =head2 Another algorithm [Mark Dominus] I had originally planned to do something recursive, but after tinkering a little I found a really excellent, non-obvious algorithm which has excellent speed and space performance and an extremely simple implementation: #!/usr/bin/perl -l print $_ = "()" x shift; print while s{^ ( \(+ ) ( \)+ ) \( } {"()" x (length($2) - 1) . "(" x (length($1) - length($2) + 2) . ")" }xe; A drawback of this approach is that it does not generate the output in lexicographic order; however, a minor variation (modify the right end of the string instead of the left end) *does* generate the output in lexicographic order. Alternatively, piping the output of this program through perl -ple '$_ = reverse; tr/()/)(/' is a cheap way to put the output into lexicographic order, without sorting. A detailed explanation of this algorithm is available at http://perl.plover.com/~alias/list.cgi?1:mss:2118 =head2 A Wrong Solution [John J. Trammell] I must confess, my first solution to this quiz was wrong. I was happy to find I had some company. :^) Specifically, I noticed that from the N=2 solution: (()) ()() generating the N=3 solution: ((())) (())() ()(()) (()()) ()()() was as simple as a map and a filter: @n3 = map { ("$_()","($_)","()$_") } @n2; @n3 = unique(@n3); The problem with this approach is seen at N=4, in generating the valid string: (())(()) and similar strings for N=5, etc. [ A lot of people made this mistake! - MJD ] =head1 Thanks [Mark Dominus] Many thanks to everyone who particpiated in the discussion or who sent in solutions. I was not expecting this quiz to be as popular as it was, or to see so many different solutions. Many thanks also to John J. Trammell for his work in testing and timing the submitted programs and writing the report, and, as always, to the legions of quiet programmers who did the quiz but didn't post anything about it. =cut