Sample solutions and discussion
Perl Expert Quiz of The Week #3 (20021030)



        Perl provides a function, 'gethostbyname()', to translate host names
        to their addresses.    For example,

                gethostbyname("perl.plover.com")

        returns the string

                \xd8\x9e\x34\x79

        because that is the packed 4-byte IP address of perl.plover.com.

        Unfortunately, gethostbyname() has a drawback.  It typically consults
        the DNS (domain name system) network service, and resolving a hostname
        make require many network queries.  Sending these queries and
        receiving the answers or waiting for the timeouts can take a lot of
        time, so a Perl program might hang for a long time whenever it does
        'gethostbyname'.

        You will write a replacement for gethostbyname() that does not suffer
        from this drawback; it will be asynchronous.  The replacement will
        have two parts:

        1.  An asynchronous domain lookup server.  This is a program that runs
            continually in the background, waiting for requests from clients
            to look up hostnames.  

            There must be some way for a client process to contact the server
            and send it a hostname.  The server must immediately return some
            sort of token that identifies the query uniquely.  The server then
            performs the gethostbyname() call on behalf of the client.

            There must be some way for the client to find out whether or not
            the address has been obtained, and, if so, what the address is,
            using the previously-returned token to specify which previous
            query it is inquiring about.  Finding out whether and answer is
            ready must be instantaneous.  If the answer is ready, finding out
            the address must be instantaneous.

        2.  A Perl module that provides an interface to the asynchronous
            lookup service.  The interface will look like this:

                use Gethostbyname::Asynchronous;

                my $query = Gethostbyname::Asynchronous->query("perl.plover.com");

                until ($query->is_ready) {
                  # ... do something else ...
                }
                # At this point, the service has located the address for us

                my $address = $query->answer;

            The ->answer method always returns the answer.  If the answer
            isn't ready at the time the call is made, then ->answer waits
            until an answer is available and then returns it.

            The module must allow multiple simultaneous queries:

                for my $name (@names) {
                  $address{$name} = Gethostbyname::Asynchronous->query($name);
                }

                while (...) {
                  # ...do something else for a while...

                  for my $host (keys %address) {
                    if ($address{$host}->is_ready) {
                      print "$host: ", $address{$host}->answer, "\n";
                      delete $address{$host};
                    }
                  }

                  # More other stuff going on here...
                }

            If you're not familiar with OOP techniques in Perl, you may
            implement a functional-style interface instead:

                use Gethostbyname::Asynchronous::Functional;

                my $qid = query("perl.plover.com");

                until (is_ready($qid)) {
                  # ... do something else ...
                }
                # At this point, the service has located the address for us

                my $address = answer($qid);



I was really disappointed by the response to this question.  I thought
it would be perceived as a pertinent, realistic problem, small enough
to be solved quickly, but sitll complete enough to be relevant.
Apparently other folks thought it looked like too much work.  Andreas
Trottmamn was the only person to post a solution to the qotw-discuss
list.

There are a number of techniques one can use to build a server; I was
hoping to be able to compare them.  The client can connect to the
server through a network socket, or through a pipe, or by using shared
memory as a mailbox.  I hoped someone would build a solution around
POE.

Andreas Trottmann's solution involved a network service, running on
port 65432.  His use of IO::Socket made the server code very small:

        #!/usr/bin/perl -w

        use strict;

        use IO::Socket;
        use Gethostbyname::Asynchronous;

        # define a "reaper" for our children (without this, we'll get a lot of zombies)
        $SIG{CHLD} = sub { wait };

        my $sock = new IO::Socket::INET
          (LocalPort => $Gethostbyname::Asynchronous::portno,
           Proto     => 'tcp',
           ReuseAddr => 1,
           Listen    => 5,
          );

        die "Can't create socket: $!\n" unless defined $sock;

        my $client;
        for(;;) {
          if($client = $sock->accept()) {
            my $pid = fork();
            die "Can't fork: $!\n" unless defined $pid;
            unless($pid) { # $pid == 0 means we're in the child process
              chomp(my $hostname = $client->getline);
              print $client scalar gethostbyname $hostname || '';
              exit 0; # this automatically closes the client socket
            }
          }
            # we're in the parent process, don't need *this* client anymore
            undef $client;
        }

Andreas's server forks off a child process to handle each client; this
is a simple and straightforward way to handle each client instantly.
His interface module connects to the server at the well-known port and
immediately returns the resulting filehandle; by doing select() on the
filehandle a client can see whether the answer is ready, and by
reading the handle, the client can get the answer:


        package Gethostbyname::Asynchronous;

        use strict;
        use vars qw(@ISA $portno);
        use IO::Socket::INET;
        use IO::Select;
        @ISA=('IO::Select');

        $portno = 65432; # i hope that's an unoccupied number

        sub query {
          my $sock = IO::Socket::INET->new(PeerAddr => 'localhost',
                                           PeerPort => $portno,
                                          );
          die "Can't connect to the gethostbyname server: $!\n" unless defined $sock;
          $sock->blocking(0);
          print $sock $_[1], "\n";

          my $select = IO::Select->new;
          $select->add($sock);
          return bless $select;
        }

        sub is_ready {
          return scalar $_[0]->can_read(0);
        }

        sub answer {
          my($sock) = $_[0]->can_read;
          my $result;
          $sock->read($result, 4);
          my(@result) = unpack 'CCCC', $result;
          return join('.', @result);
        }

        1; # give a successful return value in any case


My own solution was quite different.  I decided I didn't want to use
sockets.  (Why?  Partly because I thought everyone else would do it;
partly because I didn't want to write the networking code; I wanted to
explore alternative solutions.)  I first considered having the clients
communicate with the server by writing through a named pipe (FIFO) in
a public location.  But pipes only offer half-duplex communication
(you can send a message in either direction, but not in both
directions at once) and it wasn't clear to me how I would get the
server to send the token back to a client without possibly interfering
with the requests from other clients that were also in the pipe at the
same time; also, named pipes are not portable.  So I eventually
decided to use the file system as a mailbox.

The services advertises a directory, defined by the module, to serve
as a mailbox.  I used /tmp/async-dns, but it could be anything at all.
This directory contains two subdirectories, 'q' and 'a'.  Clients make
requests by depositing files in 'q', and can get the answers back by
looking in 'a'.

        package Gethostbyname::Asynchronous;
        $HOME = "/tmp/async-dns";

        my $S = "a";

        sub query {
          my ($pkg, $hostname) = @_;
          my $token = join ".", $$, $S++;
          local *Q;
          open Q, ">", "$HOME/q/$token" or return;
          print Q $hostname;
          bless { token => $token } => $pkg;
        }

The client is responsible for inventing its own token.  I decided this
was permissible, because it's entirely invisible to the user of the
service; the token generation occurs inside the module.  The token is
just the process ID of the process making the request, combined with a
unique per-request identifier.  To make a request, the client deposits
a file in /tmp/async-dns/q whose name is the desired token and which
contains the hostname to be looked up.  The server will eventually
notice this,  and when the answer is ready, it will deposit a file
with the same name into the /tmp/async-dns/a directory; the contents
of the answer file will be the desired address.  The client can watch
for this file:

        sub _ans_file {
          "$HOME/a/$_[0]{token}";
        }

        sub is_ready {
          my $self = shift;
          -e $self->_ans_file;
        }

When the file appears, the client can read it to find out the answer:

        sub answer {
          my $self = shift;
          sleep 1 until $self->is_ready;
          local *A;
          open A, "<", $self->_ans_file
            or return;
          my $result;
          { local $/; $result = <A> }
          close A;
          unlink $self->_ans_file;
          $result;
        }


The server doesn't need to fork; it just runs an infinite loop,
alternately scanning /q for new requests, and depositing answers to
requests into /a.  The main loop looks like:

        while (1) {
          my @queries = find_queries();
          for my $q (@queries) {
            my $addr = gethostbyname($q->{hostname});
            record_answer($q->{token}, $addr);
          }
        }

The complete server is:

        #!/usr/bin/perl
        use Gethostbyname::Asynchronous;

        my $HOME = $Gethostbyname::Asynchronous::HOME;
        my $query_dir = "$HOME/q";
        my $answer_dir = "$HOME/a";
        setup();

        while (1) {
          my @queries = find_queries();
          for my $q (@queries) {
            my $addr = gethostbyname($q->{hostname});
            record_answer($q->{token}, $addr);
          }
        }

        sub find_queries {
          local *D;
          opendir D, $query_dir
            or die "Couldn't open query dir $query_dir: $!; aborting";
          my @new_queries;
          while (my $token = readdir D) {
            local *Q;
            next unless -f "$query_dir/$token";
            next unless open Q, "<", "$query_dir/$token";
            local $/;
            my $hostname = <Q>;
            push @new_queries, {hostname => $hostname,
                                token => $token };
          }

          unlink map "$query_dir/$_->{token}", @new_queries;
          @new_queries;
        }

        sub record_answer {
          my ($token, $addr) = @_;
          open A, ">", "$answer_dir/.tmp" 
            or die "Couldn't open tmp answer file $answer_dir/tmp: $!; aborting";
          print A $addr;
          close A;
          rename "$answer_dir/.tmp", "$answer_dir/$token" 
            or die "Couldn't rename tmp answer file to $answer_dir/$token: $!; aborting";
        }

        sub setup {
          my %perm = ($HOME => 0711,
                      $query_dir => 0733,
                      $answer_dir => 0733,
                     );
          my $old_mask = umask 0000;

          for ($HOME, $query_dir, $answer_dir) {
            if (-d) {
              my $perms = (stat $_)[2] & 0777;
              next if $perms == $perm{$_};
              chmod $perm{$_}, $_ or
                die "Couldn't set permissions on directory $_: $!";
            } elsif (-e) {
              die "$_ exists, but is not a directory!  Aborting.\n";
            } else {
              mkdir $_, $perm{$_}
                or die "Couldn't create directory $_: $!; aborting.\n";
            }
          }

          umask $old_mask;
        }

1. One potential problem here is that clients are responsible for
   cleaning up the answers in the a/ directory when they are done with
   them; if they don't do this, the a/ directory will become full of
   old queries.  Not only would this waste disk space, but it might
   introduce a bug.  Suppose process 11111 makes query 11111.a, which
   the server answers; the answer is in /tmp/async-dns/a/11111.a.
   Process 11111 exits without deleting this file.  Some time later, a
   new process 11111 makes a query, also called 11111.a, and then
   checks a/11111.a before the server has a chance to answer the
   query.  It will see the old a/11111.a file with the wrong answer in
   it!  It would be better to make the server responsible for cleaning
   up the a/ directory, but it can't because it doesn't have any way
   to know when an answer file can be thrown away.  The network socket
   approach does not have this drawback.

2. Another potential problem is security.  Suppose Alice is a bad
   person, and she can guess the tokens that Barbara's process is
   using.  Then she may be able to delete the answer that the server
   has left for Barbara's process and replace it with a different
   address; barbara's process will then have the wrong address and go
   and talk to the wrong host!  Similarly, Alice could delete
   Barabra's process's request before the server sees it, and replace
   it with a request for a different host.  It's not hard to defeat
   this; the library should make the tokens hard to guess by including
   a large random number in each one:

        use Math::TrulyRandom
        my $token = join ".", $$, $S++, unpack("H*", truly_random_value());

3. I used a hash as the basis for the object in the client code:

          bless { token => $token } => $pkg;

   Since the hash has only one element, I didn't need to have a hash
   at all; I might have just blessed $token itself:

          bless \$token => $pkg;

   Then for example, _ans_file would have become:

        sub _ans_file {
          "$HOME/a/${$_[0]}";
        }

   In such cases I usually use a hash for two reasons: I feel that the
   code is clearer (you have the syntactic marker 'token' as a
   signpost wherever the token member data is being used), and also
   because objects often acquire more member data later on, and it's
   easy to add another member to a hash.  Andreas opted to just return
   a reblessed IO::Select object, which is very simple.

4. One person said he didn't do it because he didn't like my
   interface.  He wanted to do it with an event loop with a callback.
   I think that's rather silly.  The interface I specified is very
   generic.  It would be very easy to add an event loop interface
   around it, or to incorporate it into any sort of event loop
   framework.  The reverse is not true.

5. Someone sent me private email, asking if this was a silly quiz
   question, since there is a feature of Net::DNS that makes
   asynchronous queries.  

   I don't think it is silly, for two reasons.  First, the design of
   such a facility is of interest by itself.  (Or at least, I thought
   it would be of interest.)  And second, unlike Net::DNS, it's easy
   to turn this library into a generic asynchonrous computation
   service; just changing a few lines of either my solution or
   Andreas's will make it into a library for asynchronously requesting
   web pages, or asynchronously sending email, or asynchronously
   performing any computation at all:

        while (1) {
          my @queries = find_queries();
          for my $q (@queries) {
            my $result = eval($q->{code});
            record_answer($q->{token}, $result);
          }
        }

   I don't think Net::DNS is going to do this.


Thanks again for your interest.  I will send another quiz tomorrow.

