Mixing Two Tied Hash Classes

Roland Giersig asked in comp.lang.perl.moderated how to have a hash with the indexing properties of the Tie::IxHash class be persistent, and save its values to disk when it was destructed and restored next time.

This really interested me, because it's useful and tricky, and I found several ways to do it. Two of them are variations on what Roland tried originally, and one is different. A lot of interesting things come up along the way, and I was very happy with the articles, although the need editing because I just dashed them off.

Here they are.


Roland's Original Question

Date: Fri, 28 May 1999 16:09:30 +0200
From: Roland Giersig <Roland.Giersig@alcatel.at>
Organization: Organized, who, me?  ;-)
To: clpm@lists.eyrie.org
Subject: Tough one: how to chain ties?
Newsgroups: comp.lang.perl.moderated

Oh ye Perl Gurus,

I'm trying to write a module for tie and want to chain in another module. Both modules implement hashes, so this should be possible in principle.

To be more concrete: I want to write a module that gives persistency to a hash, i.e. it stores the hash data into a file upon untie and restores them upon tie. It would be nice to be able to use another tie package (IxHash, to be specific) for the internal representation of that hash. This is what I call `chaining'.

Example code:

	tie %a, Tie::IxHash;
	tie %b, Tie::PersistentHash, $filename, \%a;
 
	# now work with %b, which is persistent AND order-preserving
 
	untie %b; # stores the data to file

OK, I can dump out the IxHash structure via Data::Dumper and read it back in via eval. But I cannot re-tie the dumped data to the internal PersistentHash hash var. I can copy the data back into a newly tied hash, but I would lose (in the IxHash case) the order of the keys.

What I have this far (excluding error checking etc.):

	sub TIEHASH {
	  my $self = [];
	  $self->[1] = shift; # filename
	  $self->[0] = shift || {}; # chained tied hashref (optional)
	
	  do $self->[1] if -f $self->[1]; # read dumped data
	  $self->[0] = $PersistentHash;  # XXX here is the problem...
	}
	
	sub FETCH    { $_[0]->[0]{$_[1]} }
	sub STORE    { $_[0]->[0]{$_[1]} = $_[2] }
	sub FIRSTKEY { my $a = scalar keys %{$_[0]->[0]}; each %{$_[0]->[0]} }
	sub NEXTKEY  { each %{$_[0]->[0]} }
	sub EXISTS   { exists $_[0]->[0]->{$_[1]} }
	sub DELETE   { delete $_[0]->[0]->{$_[1]} }
	sub CLEAR    { %{$_[0]->[0]} = () }
	
	sub DESTROY {
	  my $self = shift;
	
	  use Data::Dumper;
	  open DB, ">$self->[1]";
	  if (tied $self->[0]) {
	    print DB Data::Dumper->Dump([tied %{$self->[0]}], [qw(PersistentHash)]);
	  } else {
	    print DB Data::Dumper->Dump([$self->[0]], [qw(PersistentHash)]);
	  }
	  close DB;
	  # dumped data file looks like:
	  # $PersistentHash = bless({
	  #   'key' => 'data',
	  #   ...
	  # }, 'Tie::IxHash');
	}

The line with the XXX is the problem: here I would like to re-tie the dumped data struct to the internal hash ($self->[0]).

The problem is I cannot find a way to get the object data of the IxHash back into the tied variable.

Naively, I tried

	my %a;
	tie %a, $PersistentHash;
	$self->[0] = \%a;

which would give an error:

	Can't locate object method "FIRSTKEY" via package "Tie::IxHash=ARRAY(0x10ce48)"

Then I tried

	tie %a, ref($PersistentHash);

which ties the variable to the right package, but the hash is empty.

--- Now how can I move those data back into the tied object? ---

It would be nice if

	tie %a, $xref;

would do the right thing, i.e. tie the variable to the module $xref is blessed into and initialize it with the contents of $xref.

Perl Gods, maybe in v5.006?

Any suggestions?

Roland

--
Roland.Giersig@alcatel.at         Phone: +43-1-27722-3755
ALCATEL Austria, Dept. RTPM       FAX:   +43-1-27722-3955
Scheydgasse 41, A-1210 WIEN, Austria (no Kangaroos here!)

Solutions #1 and #2

To: clpm@lists.eyrie.org
Subject: Re: Tough one: how to chain ties?
Date: Fri, 28 May 1999 13:17:33 -0400
From: Mark-Jason Dominus <mjd@plover.com>
Newsgroups: comp.lang.perl.moderated

>  tie %a, Tie::IxHash;
>  tie %b, Tie::PersistentHash, $filename, \%a;

Oh, this is an interesting approach.


> sub TIEHASH {
>   my $self = [];
>   $self->[1] = shift; # filename
>   $self->[0] = shift || {}; # chained tied hashref (optional)
> 
>   do $self->[1] if -f $self->[1]; # read dumped data
>   $self->[0] = $PersistentHash;  # XXX here is the problem...
> }

The first thing I notice here is that you have the arguments to TIEHASH wrong. The first argument will be the class name, not the filename. So you needed to have


> sub TIEHASH {
>   my $self = [];
    my $class = shift;
>   $self->[1] = shift; # filename
>   $self->[0] = shift || {}; # chained tied hashref (optional)
> 
>   do $self->[1] if -f $self->[1]; # read dumped data
>   $self->[0] = $PersistentHash;  # XXX here is the problem...
    bless $self => $class;
> }

I guess you got this right in your implementation and forgot to include it only in the news article? Because actually I think it needs to look more like this:


> sub TIEHASH {
>   my $self = [];
    my $class = shift;
>   $self->[1] = shift; # filename
>   $self->[0] = shift || {}; # chained tied hashref (optional)
> 
    if (-f $self->[1]) {
      do $self->[1];
>     $self->[0] = $PersistentHash;  # XXX here is the problem...
    }
    bless $self => $class;
> }

But OK, your real problem is that you can read in the Tie::IxHash object that you saved, and you can reinstate it as a Tie::IxHash object, but you can't reassociate it with the %a hash.

There are a couple of ways around it. One way is that the object doesn't actually have to be tied to anything to function as an interface to a tied hash class. Forget about %a completely, and just use the IxHash object directly. Then to set it up would look like this:

	my $a = (tie {} => Tie::IxHash);
	tie %b => Tie::PersistentHash, $filename, $a;

Now instead of this:

> sub FETCH    { $_[0]->[0]{$_[1]} }

use this:

  sub FETCH    { $_[0]->[0]->FETCH($_[1]) }

and instead of this:

> sub STORE    { $_[0]->[0]{$_[1]} = $_[2] }

use this:

  sub STORE    { $_[0]->[0]->STORE($_[1], $_[2]) }

This is probably the easiest thing to do. In this case your TIEHASH method will look like this:


> sub TIEHASH {
>   my $self = [];
    my $class = shift;
>   $self->[1] = shift; # filename
> 
    if (-f $self->[1]) {
      do $self->[1];
>     $self->[0] = $PersistentHash;  # XXX no problem any more!
    } else {
      $self->[0] = Tie::IxHash->TIEHASH()  # LOD!
    }
    bless $self => $class;
> }

That is a lot to absorb, and I think it will solve your problem, so you might want to put this message aside now and come back to the rest of it later if you are still interested.

Another, totally different strategy is to make a class just like Tie::IxHash, except that instead of creating a new object from scratch, it will let you specify and old object as an initializer. Then you can read the old object out of the file with Data::Dumper and hand it off to the initializer for the new class.

You would do that like this:

	package My::IxHash;
	use Tie::IxHash;

	sub TIEHASH {
	  my $package = shift;
	  my $old_object = shift;
	  # It's already constructed, so just return it!
	  $old_object;		# LOD! 

	}

	1;

Now the TIEHASH method for Tie::PersistentHash will look like this:


> sub TIEHASH {
>   my $self = [];
    my $class = shift;
>   $self->[1] = shift; # filename
> 
    if (-f $self->[1]) {
      do $self->[1];
      my %inner_hash;
      # Attach inner hash to old Tie::IxHash object
      tie %inner_hash => My::IxHash, $PersistentHash;
      $self->[0] = \%inner_hash;
    } elsif (@_) {
      $self->[0] = shift;    # Use tied hash (\%a) supplied in `tie' call
    } else {
      my %inner_hash;
      tie %inner_hash => Tie::IxHash; # Create a new tied hash 
      $self->[0] = \%inner_hash;      # and use it
    }
    bless $self => $class;
> }
> It would be nice if 
> 
>  tie %a, $xref;
> 
> would do the right thing, i.e. tie the variable to the module
> $xref is blessed into AND initialize it with the contents of $xref.

That's a nice idea, and I can't see anything wrong with it, except that it might be difficult to explain what it was for.

> Perl Gods, maybe in v5.006?

Patches welcome.

> Any suggestions?

Hope this helps.


Digression on a very different solution

Before I had read the details of what you wanted, I wrote a long article about a totally different way to do it, which may be useful, or at least intersting, so it follows here:

I'm trying to write a module for tie and want to chain in another module. Both modules implement hashes, so this should be possible in principle. To be more concrete: I want to write a module that gives persistency to a hash, i.e. it stores the hash data into a file upon untie and restores them upon tie. It would be nice to be able to use another tie package (IxHash, to be specific) for the internal representation of that hash. This is what I call `chaining'.

That seems backwards to me. You have two sets of features you want to combine:

  1. Persistence (Let's suppose that you want NBDM_File semantics.)
  2. Indexedness (Let's suppose that you want Tie::IxHash.)

The `persistent' part is implemented with a lot of complicated C code and there is a lot of complicated stuff going on under the hood.

The `indexedness' part is implemented in Perl using a bunch of Perl data structures.

You can either add indexedness to NDBM_File, or add persistence to Tie::IxHash.

One good way to add features to something in Perl is to use the tie facility. This adds an extra layer of semantics to a perl data structure. It makes sense to do this to Tie::IxHash because it is implemented in perl with Perl data structures. It is conceivable that by magicking the data structures in Tie::IxHash, you could make them persistent.

But the data structures used to implement NDBM_File are not Perl data structures; they are in C, and it is impossible to tie them. So it is probably hopeless to expect to make NDBM_File indexed.

So that is why when you said

It would be nice to be able to use another tie package (IxHash, to be specific) for the internal representation of that hash.

that it sounded backwards to me.

To lift the argument up to a more abstract level: Persistence is not a very big interface change. The interface to a persistent hash is exactly the same as the interface to any other hash.

But indexedness is a very large interface change. There all all sorts of new operations that you have to support, such as store/fetch by index, push, pop, and so on.

You are going to layer together three pieces of software:

	Your program
	----------------------top interface
	Hash module 1
	----------------------bottom interface
	Hash module 2

To minimize the amount of work you have to do, you want the interface change to be at the top interface. Otherwise, both hash modules will have to know about it. So you want to put Tie::IxHash on top:

	Your program
	---------------------- interface for indexedness
	Tie::IxHash
	---------------------- interface for persistence
	NDBM_File

This seems plausible, because the interface for persistence is going to be exactly like the regular hash/array interface that Tie::IxHash was going to use anyway.

To do this, you need to arrange that the internal data structures that Tie::IxHash uses become persistent. This is not too hard. The inside of a Tie::IxHash object is a hash, a scalar, and two arrays. If you can find a persistent hash class and a persistent array classes, you can tie the hash and the arrays and then the IxHash object will be persistent.

I did do this, and it was very easy, and it almost worked. The reson it did not work is because I used DB_File $DB_RECNO as the persistent array class, and it has a bug and dumps core. (Note: It later turned out that this bug was fixed in DB_File 1.65 and when I upgraded, this solution worked perfectly.) Here is the code:

	package Persistent::IxHash;
	use Tie::IxHash;
	@ISA = qw(Tie::IxHash);
	use DB_File;
	
	sub TIEHASH {
	  my $c = shift;
	  my $f = shift;
	  my($s) = [];
	  my $p = DB_File;
	  my (%keyhash, @keys, @values);
	  tie %keyhash => $p, $f,     @_, $DB_HASH  or return;
	  tie @keys    => $p, "$f.k", @_, $DB_RECNO or return;
	  tie @values  => $p, "$f.v", @_, $DB_RECNO or return;
	  
	  $s->[0] = \%keyhash;   # hashkey index
	  $s->[1] = \@keys;      # array of keys
	  $s->[2] = \@values;    # array of data
	  $s->[3] = @keys;       # iter count
	
	  bless $s, $c;
	
	  return $s;
	}

	# All other methods are inherited from Tie::IxHash
	
	1;

This TIEHASH method is just like the one in Tie::IxHash. It constructs an object $s which contains a hash and two arrays. I took the code from from Tie::IxHash and changed it to tie the hash and the arrays before returning the object. This object will inherit its FETCH and STORE and other methods from Tie::IxHash, and Tie::IxHash will look into the object and see the hash and the two arrays as it should. It does not need to know that they are persistent.

To use this, you say

	use Persistent::IxHash;
	use Fcntl;
	my $o = tie %h => Persistent::IxHash, $filename, $flags, $mode;

and then $o and %h behave just like a Tie::IxHash object. Or they would, except that DB_File dumps core. But if you can find a different persistent array class that is not broken it will work just fine. It would not be too hard to write one. You could build it on top of a persistent hash by using record numbers as keys.


Other work in similar areas

  1. I just released Tie::HashHistory, which is a tied hash layer that you can insert between your own program and any other tied hash class. Everything looks completely ordinary, but you can also ask HashHistory for the history of a key. It will return a list of all the values that the key has ever had, in order.

    http://www.plover.com/~mjd/perl/HashHistory/

  2. At the Boston O'Reilly tutorials, Mike Cariaso and I figured out how to fix the Memoize module to expire cached values according to an arbitrary policy.

    I did not want to put this feature into the core of Memoize because it would slow it things even for people who did not want this feature, and there are a lot of possible policies for expiration, so I was afraid that it would grow monstrously large.

    But instead we figured out that it was possible to write a tied hash module to implement the expiration policy, and then tell Memoize to tie its caches into this package. If you want the caches to be persistent, you build your expiration policy module to be a layer in between Memoize and the class that implements persistence. You could even stack these policy modules on top of one another:

    	Your program
    	------------
    	Memoize
    	------------
    	Discard cached items that are too old
    	------------
    	Manage LRU cache with maximum size
    	------------
    	NDBM_File
    	------------
    	Cache resides on disk
    

    Memoize is at http://www.plover.com/~mjd/perl/Memoize/ but the expiration stuff is not in there yet---it will appear in the next version.

As you can see this is a subject I find very interesting.


Additions and emendations

From: Mark-Jason Dominus 
Date: Fri, 28 May 1999 22:43:42 -0400 (EDT)
Newsgroups: comp.lang.perl.moderated
Subject: Re: Tough one: how to chain ties?
Organization: Plover Systems

There were various minor errors, so in hopes of clarifying, Here are a couple of corrections:

>	my $a = (tie {} => Tie::IxHash);
>	tie %b => Tie::PersistentHash, $filename, $a;

This won't work; you'd have to do something like

	my $a = do { my %a; tie %a => Tie::IxHash };
	tie %b => Tie::PersistentHash, $filename, $a;

The goal here being to use the tie on something you can get rid of right away. But actually the %a is irrelevant here, and what you're really after is the associated object. So if you take this first approach, you might as well just call the constructor directly and say

	my $a = Tie::IxHash->TIEHASH();
	tie %b => Tie::PersistentHash, $filename, $a;

But then actually, although I didn't say so, if you use the Tie::PersistentHash::TIEHASH method I showed, you don't need to bother with $a at all; the Tie::PersistentHash::TIEHASH function is quite capable of coming up with its own Tie::IxHash object if it needs one:

>> sub TIEHASH {
> ...
>    if (-f $self->[1]) {
>      do $self->[1];
>>     $self->[0] = $PersistentHash;  # XXX no problem any more!
>    } else {
>      $self->[0] = Tie::IxHash->TIEHASH()  # LOD!
>    }
> ...
>> }

>That is a lot to absorb, and I think it will solve your problem, so >you might want to put this message aside now and come back to the rest >of it later if you are still interested.

I wrote that, and then the second method turned out to be easier, but I did not think to go back and change it. That is why we have editors. (I mean the people here, and not the programs.)

>	package My::IxHash;
>	use Tie::IxHash;
>
>	sub TIEHASH {
>	  my $package = shift;
>	  my $old_object = shift;
>	  # It's already constructed, so just return it!
>	  $old_object;		# LOD!
>	}

There is a subtlety here, which is why the line is marked with LOD. (`Lod' is the Hungarian word for `subtle feature'.) The function appears to be a constructor for the My::IxHash package, but it does not construct My::IxHash objects. Instead, it constructs Tie::IxHash objects. That is probably a violation of class boundaries or something, but I don't care; it saves having to do inheritance in a null subclass, which is a waste of time.

If it bothers you, a solution is to add your own constructor to the existing Tie::IxHash package. You can do this by putting the following into your main program (or anywhere else):

	sub Tie::IxHash::newconstructor {
	   # ... as before ..
	}

This is probably an even grosser violation of class boundaries, but if you won't tell I won't either.


Return to: Universe of Discourse main page | What's new page | Perl Paraphernalia

mjd-perl-questions@plover.com