Perl Lines of the Day from October, 1998

7 October, 1998


   $t = $t->[$n>$a->[$t->[0]]?1:2] while ref $t;

Before we dive into this monster, let's see it with more appropriate indentation and whitespacing:

          $t = $t->[$n   >   $a->[$t->[0]]
                         ? 1
                         : 2
                    ]
            while ref $t;

$t is the key item here. It's actually a tree structure. It's a reference to a list of three elements. The first element is an array index, and the other two are the left and right subtrees; they might be numbers, or they might be more lists. $t is initialized with a fairly complicated tree. The tree is an encoded set of instructions about how to do a binary search on the array @$a.

Binary search is the kind of search you do when you're looking something up in the dictionary. You open the book to the middle, and look to see if the word you want occurs in the first or the second half of the book. Suppose it's in the first half; you then split the first half into two parts, and see if the word you want occurs in the first half of the first half or the second half of the first half. You keep doing narrowing down the search until you find the word you want. Of course, you can only do this because the dictionary is in sorted order. In our case, we assume that @$a is sorted numuerically in decreasing order.

Here we're searching the array @$a, trying to locate the number $n in it. @$a is assumed to have 20 elements, numbered 0 through 19. We'll start by looking at element #9; if $n is greater than $a->[9], we'll look in the first half of @$a, in elements 0-9; otherwise we'll look in the second half of @$a, in elements 10-19. (Remember that @$a is in decreasing order, so that elements 0-9 are larger than elements 10-19.)

Here's how $t works: It is an array of three things. The first item is a number that says what element of @$a to look it; in this case it's 9. The second and third items contain instructions about how to continue the search: how to continue if $n is greater than the indicated element of @$a, or how to continue if it's less than or equal to the indicated element of @$a, respectively.

An instruction might be a number, in which case the condition of the while in the Line of the Day is false, and the loop ends. A number means that the search is over and that $n should be spliced in at the indicated position of @$a. If an instruction isn't a number, then it is another three-element list. In this case the while loop (and the search) continues, examining whatever element of @$a is indicated by the first item and then following the instructions in the appropriate one of the other two items.

Here it is again:

          $t = $t->[$n   >   $a->[$t->[0]]
                         ? 1
                         : 2
                    ]
            while ref $t;

$t is a three element list; we get the first element, $t->[0] which is an index into @$a, and we compare the indexed element $a->[$t->[0]] with $n, which is what we're looking for. Depending on the outcome of the comparison, we select either item 1 or 2 as the part of $t that contains the next instruction. We get that instruction and assign it to $t. If it's another list (ref $t), we continue the loop; if it's just a number, ref $t is false, we're done, and we exit the loop.

Now that you know that, here's the complete function that uses the Line of the Day:

     sub insert {
          my $t=[9,[4,[2,[0,0,1],[3,[2,2,3],4]],
                      [7,[5,5,6],[8,[7,7,8],9]]
                   ],
                   [14,[12,[10,10,11],[13,[12,12,13],14        ]],
                       [17,[15,15,16],[18,[17,17,18],[19,19,20]]]
                   ]
                ];
          my($n,$a) = @_;
          $t = $t->[$n>$a->[$t->[0]]?1:2] while ref $t;  # LOD
          splice(@$a, $t, 0, $n);
          pop @$a;
        }

This function accepts two arguments: A number $n and a reference to an array with 20 numbers sorted into descending order. It then inserts $n into the array in the appropriate position, removes the new smallest element from the array, and returns it. It therefore maintains an array of the 20 largest values seen.

The initialization of $t is essential. As you can see, it says to look at element #9 (the middle) first, and next at either #4 or #14. It was constructed carefully to yield a correct binary search of an array with 20 elements.

You could of course just scan the array, looking for the first element in the array that is smaller than the number you want to insert. This is called linear search:

     sub insert {
          my $t = 0;
          my($n,$a) = @_;
          $t++ while $n < $a->[$t];
          splice(@$a, $t, 0, $n);
          pop @$a;
        }

What's the difference? Think about linear-searching the dictionary: You're looking for the word `ichor'. You start on page 1 and look at every word on page 1. Nope, no `ichor'. Try page 2. Nope! Move on to page 3.... Eventually, you get to `ichor'. Aha!

Even for this little array of only 20 elements, the binary search examines about 4.5 array elements per insertion, whereas the linear search examines 10.5. For larger arrays, the win is much bigger.

Of course, I could have picked a less obfuscated way to write the binary search. I wrote this partly as an amusement in a Usenet post, but also because I wanted to investigate this technique for doing binary search. I think the large static initializer for $t makes it impractical, unfortunately.


Return to: Universe of Discourse main page | What's new page | Perl Paraphernalia | Line of the Day

mjd-perl-lotd@plover.com