Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot

Pop quiz: find the bug

by Ovid (Cardinal)
on Jul 03, 2002 at 23:04 UTC ( [id://179346] : perlmeditation . print w/replies, xml ) Need Help??

I'm posting this because it's a demonstration of an interesting feature in Perl that can cause strange bugs. I worked and worked at finding out what was wrong with the following snippet. It's a subtle bug and I'm not too embarrassed at missing it. It's a mistake I hopefully won't repeat. Can you find it? If you find it too quickly, don't spoil it for the others, though :)

while ( my $data = $t_sth->fetchrow_arrayref ) { my ( $amt, $id ) = @$data; $amt /= PRECISION; SWITCH: { $id == $CASH && ($tcash += $amt) && last SWITCH; $id == $ACCOUNT && ($taccount += $amt) && last SWITCH; $id == $CHECK && ($tcheck += $amt) && last SWITCH; $id == $GIFT && ($tgift += $amt) && last SWITCH; $id == $VOUCHER && ($tvoucher += $amt) && last SWITCH; $id == $CC_MAN_AUTH && ($tcc_man_auth += $amt) && last SWITCH; ($tcredit += $amt); } }

Hint: the bug is in the switch statement. All data entering the switch statement is correct. Further, $id and the values it compares with are both integers (i.e., no floating point comparisons).

chromatic was shown this and he rewrote it with this. Bonus points if you see how it works.

my %totals; @totals{$CASH, $ACCOUNT, $CHECK, $GIFT, $VOUCHER, $CC_MAN_AUTH} = \($tcash, $taccount, $tcheck, $tgift, $tvoucher, $tcc_man_auth); while ( my $data = $t_sth->fetchrow_arrayref ) { my ( $amt, $id ) = @$data; $amt /= PRECISION; if (my $totalvar = $totals{ $id }) { $$totalvar += $amt; } else { $tcredit += $amt; } }

His is a better solution than mine in that it works, but discovering that taking a reference is a distributive action (the assignment to the hash slice) was surprising! He assures me that this is a documented feature.

Update: Congrats to dws for being the first person to accurately identify and describe the bug.


Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Replies are listed 'Best First'.
Re: Pop quiz: find the bug
by VSarkiss (Monsignor) on Jul 03, 2002 at 23:19 UTC

    He assures me that this is a documented feature.
    Indeed it is, right at the top of perlref:
    Note that taking a reference to an enumerated list is not the same as using square brackets--instead it's the same as creating a list of references!
    @list = (\$a, \@b, \%c); @list = \($a, @b, %c); # same thing!
    As a special case, \(@foo) returns a list of references to the contents of @foo, not a reference to @foo itself. Likewise for %foo.

    I realized I didn't answer the question....

    I think:
    B::Deparse to the rescue: your code compiles to:

    SWITCH: { last SWITCH if $id == $CASH and $tcash += $amt; last SWITCH if $id == $ACCOUNT and $taccount += $amt; last SWITCH if $id == $CHECK and $tcheck += $amt; last SWITCH if $id == $GIFT and $tgift += $amt; last SWITCH if $id == $VOUCHER and $tvoucher += $amt; last SWITCH if $id == $CC_MAN_AUTH and $tcc_man_auth += $amt; $tcredit += $amt; }
    So you're continuing to the next case if $amt and the value being increased are both zero. Though that shouldn't cause any harm....

    Update 2
    Ack, I think I got it! Thanks Aristotle!
    $amt can be negative. It can still bring one of the others to zero, then get decremented from the following branch.

      Hehe, oh yes it does cause harm - exactly that is the subtle bug Ovid was talking about.

      Makeshifts last the longest.

Re: Pop quiz: find the bug
by dws (Chancellor) on Jul 03, 2002 at 23:13 UTC
    My suspicion is...
    If you get a credit on the account, such that either $tcount, $tcash, or one of the others reach zero, you won't trigger the last, and will fall through to an inadvertant (and incorrect) change to $tcredit.
    and the rewritten code doesn't suffer the same fate because
    it doesn't have the possibility of falling through into a $tcredit update.

    That's white type above. Select it to see it.

Re: Pop quiz: find the bug
by Anonymous Monk on Jul 03, 2002 at 23:20 UTC
    The bug you are talking about would have been avoided using do blocks and not risking short-circuiting at the assignment expressions
    SWITCH: { $id == $CASH && do{$tcash += $amt; last SWITCH}; ... }
      How about:
      SWITCH: { $id == $CASH and $tcash += $amt, last SWITCH; ... }


Re: Pop quiz: find the bug
by Aristotle (Chancellor) on Jul 03, 2002 at 23:23 UTC

    Update: black on black is a good one. So here comes:

    An assignment returns the variable being assigned to as an lvalue, which the following && evaluates to its value. Therefor, ($something += $amt) will return 0 if that is $something's new value, causing the following && to fall through and miss the last. As a result, at least one too many variables get updated within the same switch statement - potentially even all of them.

    As far as the distributive referentiation is concerned, check the Camel book, chapter 8, under "Creating references" in subsection "More tricks with references". (I hope I have the names right - what I have is the German translation.) It's also documented in perlref, under "Making References", at the end of point 2.

    Makeshifts last the longest.

      That's Chapter 8 (p. 246) of the English second edition Camel book, for those of you with obsolete references Perl 5 as a target platform.

      The hell with paco, vote for Erudil!

Re: Pop quiz: find the bug
by Zaxo (Archbishop) on Jul 03, 2002 at 23:24 UTC
    $id == $FOO && ($tfoo += $amt) && last SWITCH; depends on the final value of $tfoo being non-zero. If zero, the switct falls through to the default and awards double credit :)
    To repair, have 'last' execute unconditionally with the comma operator:
    $id == $FOO && ($tfoo += $amt), last SWITCH;

    After Compline,

Re: Pop quiz: find the bug
by jjohn (Beadle) on Jul 04, 2002 at 14:06 UTC

    I'm posting this because it's a demonstration of an interesting feature in Perl that can cause strange bugs. ... It's a subtle bug and I'm not too embarrassed at missing it.

    The problem isn't that code contains bugs -- because it is the nature of humans to make mistakes. The cloud of shame that surrounds the occurrence of bugs in code is counterproductive. Some have the attitude that a coder is inferior for having made a mistake. This attitude is toxic to creating better software. Finding bugs should be a celebrated event. After all, you've just made your code more correct. So, I begin this long missive with: "For those about to debug, we salute you!"

    In this case, I think there's a semantic bug in the switch statement. As others have pointed out each "case" statement depends on all three conditions being true. In some rare cases where you want a particular case to execute, it won't because the middle expression will evaluate to a false value. The implication made in the post is that this is a Perl mis-feature. That isn't entirely fair. After all, Perl is faithfully obeying the rules of boolean logic. It would be more surprising if it didn't.

    Mark-Jason Dominus has given a talk called "Red Flags" several times. Instead of giving "anti-patterns" per se, he exhibits code that can be rewritten in a safer and often faster way. There appear to be at least two Red Flags in this bit of code.

    The first Red Flag is one that MJD calls "Families of Variables," which can be defined as the use of several scalars to hold related values. In many cases, it is better to store these values in a collection type, such as an array or hash, rather than complicate code with many rarely used scalars. Chromatic's solution addresses this Red Flag well. By storing $tcash, $taccount, $tcheck, $tgift, $tvoucher, $tcc_man_auth in one hash called %totals, the code becomes cleaner and these random scalars are coraled into one easy-to-handle place.

    I have noticed that newbies "phear" Perl hashes and I'm not entirely sure why. Hashes, of all data structures, operate most like the human mind -- using word associations to remember things. Hashes, regexes and CPAN are the three most important reasons to use Perl over other languages.

    The other Red Flag is the use of boolean operators for control statements. As this code has just demonstrated, boolean logic can be trickier than it first appears. Of course, most of us can see what's going on here by sitting down with pencil and paper and drawing truth tables, but that's work. Perl is about getting done with work faster than your Java friends and drinking a whole pint before they arrive at the bar (well, that's the reason I use Perl). So, I submit that a complex use of boolean operators for control statements is a Red Flag. When one comes across such code, ask yourself whether the code would be clearer with if statements. You'll enjoy debugging if statements much more than a series of ampersands, double pipes and parentheses. Of course, sometimes you really, really need the complex boolean statements. Run far away from that kind of code.

    Uncle Joe's Programming Tip #3: Tricky code makes debugging hard.

    Perhaps I'm being dense, but I think Chromatic's solution is overkill. Let's look at the (shorten) line:

    if (my $totalvar = $totals{ $id }) { $$totalvar += $amt; }

    It appears that %totals is a hash that has $id values as the key and scalar references as values. That seems weird. I think the idea of this code is to see if there is an amount of $id type in the hash (even if that value is zero), and add the new $amt to it. Because references are always true, this code won't get bogged down in the problems of the first code example. I think there is an easier way to write without references:

    if( defined $total{$id} ){ $total{$id} += $amt; }else{ $total{tocredit} += $amt; }

    Again, I think it's good to keep the account totals together. Therefore, I would stuff even the 'tocredit' values into this hash. I assume that %totals will be populated with code that will look something like:

    my $sth = $dbh->prepare('select * from accounts where id = ?'); $sth->execute($current_account_id); my $hr = $sth->fetchrow_hashref; # a naive way my %totals = %{$hr};

    This assumes that the $id value presented in the original code is one of the SQL column names in the accounts table.

    Originally I used 'exists' rather than 'defined,' but that makes populating the %total hash trickier. In fact we only care if the value is not undef.

    In this case, a switch statement isn't needed at all. Perl 6 will have a much cleaner syntax for creating switches (given, when). Until then, try to think about using hashes instead of Perl 5's fishy switches.


      if (defined $total {$id}) { $total {$id} += $amt; } else { $total {tocredit} += $amt; }
      Nice article, and a nice piece of code. However, there's still something in this piece of code that can be factored out.
      $total {exists $total {$id} ? $id : "tocredit"} += $amt;
      Or, in a more ALGOLish style:
      (exists $total {$id} ? $total {$id} : $total {tocredit}) += $amt;
      Note also the use of exists instead of defined; when using exists, we avoid having to initialize all the entries in the hash.


        Note also the use of exists instead of defined; when using exists, we avoid having to initialize all the entries in the hash.

        Originally, I thought exists was the correct operator and indeed it would be given your assumption that only keys with values need to be inserted into the hash. Since I'm a simple and lazy guy, I thought that the values would all be retrieved from an SQL table. If so, the most naive way to handle row data is simply to retrieve the data as a hash. The method of populating the %total hash is pure speculation on my part and surely one could carefully create the hash so that only those keys with values would be inserted. In this case (your case), your solution is superior. Your code may be a little terse for some (although I find the second form of your solution titillating -- using the trigraph operator to select the lvalue is uncommon (because I've never used ALGOL)).

        Thank you for your kind words about my post, but be advised that you are only encouraging me to post more. ;-)

      Perhaps I'm being dense, but I think [c]hromatic's solution is overkill.

      If I were writing (or re-writing) the entire code snippet from scratch, I'd agree. The point of replacing the switch statement was twofold. First, I don't like switch statements. I find them generally unperlish. Second, I wanted to replace just that code, in a local refactoring. Since it was a bugfix, I didn't want to change a bunch of other variables.

      That left me with the question, "How do I update the values of these variables based on identifying keys, while maintaining the variable names?" It was shorter than reassigning the variables from the hash values.

        It was shorter than reassigning the variables from the hash values.

        Ah! You have lifted the coins from my eyes. I didn't quite understand why you were using scalar refs. I believe your code assumes that the hash would be initialized like:

        %totals = ( 'tcash' => \$tcash, ... );

        This is reasonable if the individual scalars must exist. It would be preferable to create hash that contains the appropriate values obtained directly from the canonical source, which I imagine is probably an SQL table. So perhaps I was over-thinking the solution by trying to imagine where those several scalars came from at all.

        Still, it isn't an unpleasant way to while away the hours. ;-)

Re: Pop quiz: find the bug
by Anonymous Monk on Jul 04, 2002 at 03:01 UTC
    I prefer this:
    my %total; while ( my $data = $t_sth->fetchrow_arrayref ) { my ( $amt, $id ) = @$data; $total{$id} += $amt / PRECISION; } $tcash += delete $total{$CASH} || 0; $taccount += delete $total{$ACCOUNT} || 0; $tcheck += delete $total{$CHECK} || 0; $tgift += delete $total{$GIFT} || 0; $tvoucher += delete $total{$VOUCHER} || 0; $tcc_man_auth += delete $total{$CC_MAN_AUTH} || 0; $tcredit += $_ foreach values %total;
    Slightly longer, but much more straightforward. (Although I did need the || 0 bits to make it quiet under warnings.) And, of course, all of the variables following a naming pattern suggests that you would be better off using a hash...

    BTW I hope you learned that using short-circuting conditionals as a control structure is a bad idea.