Consider this data structure:

      [
	[Chi, Ill],
	[NY, NY],
	[Alb, NY],
	[Spr, Ill],		DS1
	[Tr, NJ],
	[Ev, Ill],
      ]

This could be transformed several ways:

      {
	Chi => Ill,
	NY => NY, 
	Alb => NY,
	Spr => Ill,		DS2
	Tr => NJ,
	Ev => Ill,
      }

      { Ill => [Chi, Spr, Ev],
	NY  => [NY, Alb],	DS3
	NJ  => Tr,
      }
      { Ill => 3,
	NY  => 2,
	NJ  => 1,
      }

      [ Chi, Ill, NY, NY, Alb, NY, Spr, Ill, Tr, NJ, Ev, Ill]    DS4


Basic idea:  Transform original structure of nesting depth N into an
N-dimensional table.  If Nth nest is a hash, index table ranks by hash
keys; if an array, index by numbers.  So for example, DS1 becomes

	    1	2
	1   Chi Ill
	2   NY  NY
	3   Alb NY
	4   Spr Ill
	5   Tr  NJ
	6   Ev  Ill


Or maybe hashes should be handled a little differently?  The original
basic idea was more about DS2 and transformed it into

		Ill	NY	NJ
	Chi     X
	NY              X
	Alb             X
	Spr     X
	Tr                      X
	Ev      X

Maybe the rule is: For hashes, use a boolean table indexed by keys and
values; for arrays, use a string table index by integers.

Notation idea:   Assign names to the dimensions of the table, say X
and Y.  Then denote transformations by:

	[([X, Y])]	(DS1)
	{(X => Y)}	(DS2)
	{X => [Y]}	(DS3)
	[(X, Y)]	(DS4)

The (...) are supposed to incdicate a chaining of elements within the
larger structure.  But maybe this isn't right.

At the bottom: How do we say whether

	X=>Y, X=>Z

turns into

	[ X => Y, X => Z ]		(chaining)
or	[ X => [Y, Z] ]			(accumulation)


Consider

	
	  A B C
	D . . .
	E . .
	F .   .


<...> means ITERATE over the thing inside and make a list of the
results.  It's basically `map'.  

Note that:

	<X,Y>     |=   (D,A,D,B,D,C,E,A,E,B,F,A,F,C)
	<X,[<Y>]> |=   (D,[A,B,C],E,[A,B],F,[A,C])

Brackets and braces just mean brackets and braces.  Variables at the
same level of nesting imply a loop over the cartesian join.  Variables
subnested imply a nested loop.  So:

	<X,Y> means

	for x in X
	  for y in Y
	    push @result, (x,y) if present(x,y);

But

	<X,<Y>> means
	for x in X
	  for y in Y
	    push @yresult, (y) if present(x,y);
          push @result, @yresult 

Hmmm.  Maybe there's a better syntax for this.

Well, with this plan:

	DS1: [ <[X,Y]> ]
	DS2: { <X=>Y>  }
	DS3: { <X => [<Y>]> }
	DS4: [ <X, Y> ]

It seems pretty flexible.  You could just as easily write

	{ <X => max(<Y>) }
	  
and you'd get

	{ D => C, E => B, F => C }

If there's a `count' function, you can get

	{ D => 3, E => 2, F => 2 }

or maybe we'll just overload `scalar' to mean `count'.


Question:  How to invert this process?  That's important so that you
can ask it to convert one data structure to another.
Also, then you could write something like

	[ <city, state> ]  |= { <state => [<city>] > }

and omit the X's and Y's.

Real example:  From proddir.  Given

	ID / NAME / SHADE / PALETTE / DESC

For example:

	
	A / AAA / red     / pink   / Aaa
	B / BBB / yellow  / tawny  / Bbb
	A / AAA / green   / nude   / Aaa
	B / BBB / blue    / violet / Bbb
	C / CCC / black   / nude   / Ccc

Turn this into

	{ A => [ AAA, [ [red, pink], [green, nude] ], Aaa],
	  B => [ BBB, [ [yellow, tawny], [blue, violet] ], Bbb],
	  C => [ CCC, [ [black, nude] ], CCC]
	}


	{ < ID => [
		     name,
		     [ <[shade, palette]> ]
		     desc
		  ]>
	}


Something interesting happened here.  Suppose we have

	[ [A, B]
	  [A, B]
	]

And we ask for <A, B>.  Do we get (A, B, A, B), or just (A, B)?  Does
it remove duplicate items for us or not?  We might want either.

In the example above, why didn't we get

	{ A => [ AAA, [ [red, pink], [green, nude] ], Aaa],
	  A => [ AAA, [ [red, pink], [green, nude] ], Aaa],
	  B => [ BBB, [ [yellow, tawny], [blue, violet] ], Bbb],
	  B => [ BBB, [ [yellow, tawny], [blue, violet] ], Bbb],
	  C => [ CCC, [ [black, nude] ], CCC]
	}


If the outer iteration was supposed to be over all id-name-desc triples?
Maybe we need

	<...>		all triples
	<!...!>		unique triples only

Then you could say

	<X>  |= <!X!>

to indicate that you want to uniq a list.

But maybe the old notation already allowed this:

	<X> |= keys %{< X => 1 >}

It's still unclear how to write the example above, which has unique
key-triples.  But it's in a hash, so it gets uniqed on ID anyway;
maybe that's all we need.

================================================================

19991023

Rather than defining some bizarre metalanguage to describe the
transformation, it might be easier all around if the user just enters
a sample input, a sample desired output, and lets the twingler figure
out what to do.  Certainly the parser and internal representation will
be simpler.

For example:

	
	[ [ A, B ],
	  [ C, B ],
	  [ D, E ] ]
	--------------
	{ B => [A, C],
	  E => [D],
	}

should be enough for it to figure out that the code is:

	for my $a1 (@$input) {
	  my ($e1, $e2) = @$a1;
	  push @{$output{$e2}}, $e1;
	}

Advantage:  After generating the code, it can run it on the sample
input to make sure that the output is correct; otherwise it has a bug.

Input grammar:

	%token ELEMENT
	expr: array | hash ;
	array: '[' csl ']' ;
	csl: ELEMENT | ELEMENT ',' csl | /* empty */ ;
	hash: '{' cspl '}' ;
	cspl: pair | pair ',' cspl | /* empty */ ;
	pair: ELEMENT '=>' ELEMENT;

Simple enough.  Note that (...) lines are not allowed.  They are only
useful at the top level.  A later version can allow them.  It can
replace the outer (...) with [...] or {...] as appropirate when it
sees the first top-level separator.  (If there is a => at the top
level, it is a hash, otherwise an array.)

Idea for code generation:  Generate pseudocode first.  Then translate
to Perl.   Then you can insert a peephole optimizer later.  For
example

	foreachkey k (somehash) {
	  push somearray, $somehash{k}
	}

could be optimized to

	somearray = values somehash;

add into hash:  as key, add into value, replace value
add into array: at end only


How do we analyze something like:	  
		
	[ [ A, B ],
	  [ C, B ],
	  [ D, E ] ]
	--------------
	{ B => [A, C],
	  E => [D],
	}

Idea: Analyze structure of input.  Analyze structure of output and
figure out an experssion to deposit each kind of output item.  Iterate
over input items.  Collect all input items into variables.  Deposit
items into output in appropriate places.

For an input array, tag the items with index numbers.
See where the indices go in the output.  Try to discern a pattern.
The above example:

Try #1:
A: 1
B: 2
C: 1
B: 2  -- consistent with B above
D: 1
E: 2

Output:  2 => [1, 1]
	 2 => [1]

OK---2s are keys, 1s are array elements.

A different try fails:
A: 1
B: 1
C: 2
B: 2 -- inconsistent, give up on this.

Now consider:
		
	[ [ A, B ],
	  [ C, B ],
	  [ D, E ] ]
	--------------
	{ A => B,
	  C => B,
	  D => E,
	}

A,C,D get 1; B,E get 2.  this works again.
1s are keys, 2s are values.

I need a way of describing an element of a nested data structure as a
simple descriptor so that I can figure out the mappings between
descriptors.  For arrays and nested arrays, it's pretty easy: Use the
sequence of numeric indices.  What about hashes?  Just K/V?
Or does V need to be qualified with the key perhaps?

Example above:

IN:	A:11  B:12 22 C:21 D:31 E:32
OUT:    A:K   B:V     C:K  D:K  E:V

Now try to find a mapping from the top set of labels to the bottom.
x1  => K, x2 => V works.

Problem with this:

	[ [ A, B ],
	  [ B, C ],
	]
	------------
	{ A => B,
	  B => C,
	}

is unresolvable.  Still, maybe  this works well enough in most common
cases. 

Let's consider:

	[[ A , AAA , red     , pink   , Aaa], 
	 [ B , BBB , yellow  , tawny  , Bbb],
	 [ A , AAA , green   , nude   , Aaa],
	 [ B , BBB , blue    , violet , Bbb],
	 [ C , CCC , black   , nude   , Ccc],
	]
        -------------------------------------------------------------
	{ A => [ AAA, [ [red, pink], [green, nude] ], Aaa],
	  B => [ BBB, [ [yellow, tawny], [blue, violet] ], Bbb],
	  C => [ CCC, [ [black, nude] ], CCC]
	}

A:	00,20 => K
AAA:	01,21 => V0
red:	02    => V100
pink:	03    => V101
Aaa:    04    => V2
B:      10,30 => K
C:      40    => K
etc.

Conclusion:  x0 => K; x1 => V0; x2 => V100; x3 => V101; x4 => V2

How to reverse?

Simpler reverse example:

	{ A => [ B, C ],
	  E => [ D ],
	}
	---------------------
	[ [ A, B ],
          [ A, C ],
	  [ E, D ],
	]

A: K	=> 00, 10
B: V0   => 01
C: V1   => 11
D: V0   => 21
E: K    => 20

Conclusion: K => x0; V => x1
This isn't enough information.  Really, V => k1, where k is whatever
the key was!

What if V items have the associated key too?

A: K	=> 00, 10
B: V{A}0=> 01
C: V{A}1=> 11
D: V{E}0=> 21
E: K    => 20

Now there's enough information to realize that B amd C stay with the
A, if we're smart enough to figure out how to use it.

----------------------------------------------------------------

20010728

Sent to Nyk Cowham <nyk@forumone.com>

20010824

Sent to <timur.shtatland@lbri.lionbioscience.com>

----------------------------------------------------------------

20011028

Here's a great example.  The output from HTML::LinkExtor is a list
like

        ([img, src, URL1, losrc, URL2],
         [a, href, URL3],
         ...
        )

we want to transform this into

        (URL1 => undef, 
         URL2 => undef, 
         URL3 => undef, 
         ...
        )