Subject:      Why it's stupid to `use a variable as a variable name'
From:         mjd@op.net (Mark-Jason Dominus)
Date:         1998/06/10
Message-ID:   <6lnb70$lct$1@monet.op.net>
Newsgroups:   comp.lang.perl.misc,comp.programming 

People show up in comp.lang.perl.misc all the time asking how to use the contents of a variable as the name of another variable. For example, they have $foo = 'snonk', and then they want to operate on the value of $snonk.

That's very easy to do in Perl, so they usually get some people telling them how to do it. And they usually get some people asking them why they didn't use a hash instead. Sometimes I'm one of the people who says to use a hash instead, and sometimes I'm one of the people who answers the question that was asked, even though I think they should be using a hash instead.

Anyway, a couple of weeks ago one of my clients called up with a program that was producing wrong reports. They needed it to be fixed by the following day. The program was supposed to read a database of tab-separated records like these:

        this    red     something
        that    green   something else
        other   red     more
        this    blue    still more

and build a report of how many records had each value in each position.

It turned out that the clods who had written this program had done something like this:

        while (<RECORDS>) {
          chomp;
          @values = split /\t/, $_;
          foreach $v (@values) {
            $$v++;    # symbolic reference: increments the global variable named by the field
          }
        }

        print <<EOM;
        Question 1:
        $this users said `this'.  $that users said `that'.

        Question 2:
        $red users said their favorite color was red.
        
        ... (and so on) ...
        EOM

Of course, the actual code was much longer and much more obfuscated.

Anyway, to make a long story short, the problem turned out to be that a certain response, let's say foo (actually, it was digoxin; go figure), was a valid response to two totally unrelated questions, say #7a and #11. So anyone answering `foo' to question 7a was counted as having answered foo to question 11 as well, and vice versa. At the end of the analysis, the $foo variable contained the total number of times foo had been given as an answer to either question 7a or question 11. Then the reports used this sum in two places, and that's why the reports were inaccurate.
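
The collision can be reproduced in a few lines. What follows is only a sketch under stated assumptions, not the client's code: the three records, the two-question layout, and the response names foo and bar are made up for illustration.

```perl
use strict;
use warnings;

# Package globals, standing in for the original program's counters.
our ($foo, $bar);

# Hypothetical data: each record holds the answers to two unrelated
# questions, and 'foo' is a legal answer to both of them.
my @records = ("foo\tbar", "bar\tfoo", "foo\tfoo");

{
    no strict 'refs';    # symbolic references are what the original relied on
    for my $record (@records) {
        for my $v (split /\t/, $record) {
            $$v++;       # every 'foo', from either question, lands in $main::foo
        }
    }
}

# Two respondents said foo to the first question and two said foo to the
# second, but there is only one $foo, so it holds the combined tally.
print "foo: $foo\n";
print "bar: $bar\n";
```

There are only three records here, yet a report built on $foo would claim four foo answers for each of the two questions.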

This shoddy logic was so pervasive in the program that I couldn't find an easy way to fix it. If the original programmers had used a series of hashes instead of stuffing everything into a bunch of global variables, it would never have happened, or at worst it would have been easy to fix. I ended up doing a major overhaul on the program to solve the problem. The main loop turned into something more like:

        while (<RECORDS>) {
          chomp;
          @values = split /\t/, $_;
          for ($i=1; $i <= $NUMQUESTIONS; $i++) {
            my $v = shift @values;
            $count[$i]{$v}++;    # tally answer $v under question $i only
          }
        }

Of course, the actual code was much longer and much more obfuscated, although it was neither as long nor as obfuscated as when I got to work on it.
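
For anyone who wants to run the repaired loop, here is a self-contained sketch of the same idea. It makes some assumptions that are not in the original: the records live in an in-memory array instead of the RECORDS filehandle, there are only two questions, and the report format is invented.

```perl
use strict;
use warnings;

my $NUMQUESTIONS = 2;    # assumed for this sketch
my @records = ("foo\tbar", "bar\tfoo", "foo\tfoo");    # hypothetical data

my @count;    # $count[$i]{$answer} is the tally of $answer for question $i

for my $record (@records) {
    my @values = split /\t/, $record;
    for my $i (1 .. $NUMQUESTIONS) {
        my $v = shift @values;
        $count[$i]{$v}++;    # each question gets its own hash of answers
    }
}

# Report each question separately; equal answers no longer share a counter.
for my $i (1 .. $NUMQUESTIONS) {
    for my $answer (sort keys %{ $count[$i] }) {
        print "Question $i: $count[$i]{$answer} answered $answer\n";
    }
}
```

Because the tallies are keyed first by question number and then by answer, foo under question 1 and foo under question 2 are counted independently.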

I shudder to think what would have happened to this program if one of the responses had been named i or v or 3 or some such. One can even imagine that that happened to these clods once upon a time, and that their response was to change the colliding variable name instead of heeding the warning.
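
To make that worry concrete, here is a hypothetical variant of the loop that indexes the fields with a global counter $i. The loop shape and the record are assumptions made for illustration; the client's code is not shown here.

```perl
use strict;
use warnings;

# Package globals: the loop counter shares a namespace with the tallies.
our ($i, $red, $more);

# Hypothetical record in which one of the responses happens to be "i".
my @values = ('i', 'red', 'more');

{
    no strict 'refs';
    for ($i = 0; $i < @values; $i++) {
        my $name = $values[$i];
        $$name++;    # when $name is 'i', this increments the loop counter itself
    }
}

# The extra increment makes the loop step over the 'red' field entirely.
print "red was ",  (defined $red  ? "counted" : "skipped"), "\n";
print "more was ", (defined $more ? "counted" : "skipped"), "\n";
```

Nothing crashes and nothing warns; one field is silently skipped and the report comes out wrong.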

Anyway, deriving the name of the variable from an input value turned out to be a very stupid decision in this case, and one which cost my client a couple of thousand dollars.

When people come into comp.lang.perl.misc asking how to do something stupid, I'm never quite sure what to do. I can just answer the question as asked, figuring that it's not my problem to tell people that they're being stupid. That's in my self-interest, because it takes less time to answer the question that way, and because someone might someday pay me to clean up after their stupidity, as happened in this instance. But if I do that, people might jump on me for being a smart aleck, which has happened at times. (``Come on, help the poor guy out; if you know what he really needs why don't you just give it to him?'')

On the other hand, I could try to answer on a different level, present a better solution, and maybe slap a little education on `em. That's nice when it works, but if it doesn't it's really sad to see your hard work and good advice ignored. Also, people tend to jump on you for not answering the question. (``Who are you to be telling this guy what he should be doing? Just answer the question.'')

I guess there's room for both kinds of answer. Or maybe there isn't room for either kind.

Whatever. I seem to have gone off on a tangent. The real root of the problem with this code is that it's fragile. You're mingling unlike things when you do this. And if two of those unlike things happen to have the same name, they'll collide and you'll get the wrong answer. So you end up having a whole long list of names which you have to be careful not to reuse, and if you screw up, you get a very bizarre error. This is precisely the problem that namespaces were invented to solve, and that's just what a hash is: A portable namespace.
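
A small sketch of the hash acting as a namespace. The response names below are hypothetical and chosen on purpose to match names the program itself uses.

```perl
use strict;
use warnings;

my %count;    # the namespace: response names live here and nowhere else

# Hypothetical responses that would collide with program variables
# if they were used as variable names.
my @values = ('i', 'v', '3', 'i');

my $i = 0;    # an ordinary program variable with the same name as a response
for my $v (@values) {
    $count{$v}++;    # touches only %count, never $i or $v
    $i++;
}

print "$_: $count{$_}\n" for sort keys %count;
print "fields seen: $i\n";
```

The keys 'i', 'v', and '3' are just strings inside %count, so there is no list of forbidden names to maintain, and the code runs under use strict.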

The main point of this article was to present a real example of a case where using a variable as a variable name was a really stupid thing to do. Since most of the people who post about that in comp.lang.perl.misc seem to be trying to do the same stupid thing in the same stupid way, I thought I'd mention it, and maybe raise the general awareness of this problem.



mjd-perl-misc@plover.com