Greetings Monks. Yet again I've been bitten by reference numification. Witness:

    # commit in 1000-row chunks
    if ($row and ($row % $COMMIT_CHUNK_SIZE == 0)) {
        $dbh->commit;
    }

That code is supposed to commit a transaction every $COMMIT_CHUNK_SIZE rows. It would work great if $row was the row count. Unfortunately for me, $row was actually an ARRAY ref, which meant that the commits never happened (or happened on every row, if the pointer in $row happened to be an even multiple of $COMMIT_CHUNK_SIZE). Even more unfortunately, this usually didn't matter because MySQL was able to handle quite a lot of inserts in a single transaction. That is, until it couldn't, and blew up with a buffer size error in the middle of an 8-hour job.
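To make the failure mode concrete, here is a minimal sketch. It is not the original job's code: `should_commit`, the loop bounds, and the chunk size of 1000 are illustrative assumptions. It shows that an ARRAY ref used in numeric context quietly yields its memory address, so the modulus test depends on where the array happens to live, while an explicit counter gives the intended behaviour.

```perl
use strict;
use warnings;

my $COMMIT_CHUNK_SIZE = 1000;    # illustrative value

# Hypothetical helper: true when a commit is due after $count rows.
sub should_commit {
    my ($count) = @_;
    return ( $count && $count % $COMMIT_CHUNK_SIZE == 0 ) ? 1 : 0;
}

my $row = [ 'a', 'b', 'c' ];     # an ARRAY ref, as in the bug described above

# In numeric context a reference yields its address, and no warning is issued.
my $as_number = $row + 0;
printf "row as a number: %d (0x%x)\n", $as_number, $as_number;

# Counting rows explicitly is what the commit test actually needs.
my @commits;
for my $line_num ( 1 .. 2500 ) {
    push @commits, $line_num if should_commit($line_num);
}
print "commits after rows: @commits\n";
```

The address printed will differ from run to run and machine to machine, which is why the original bug could go unnoticed for so long.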

Here's what I would have liked to see when that code ran:

   Warning: numification of a reference at line 100.

Intentional numification of a reference is so rare, and unintentional numification is so common, that I think this makes a heck of a lot of sense as a warning. If my head of steam lasts long enough I may work up a patch.

And may I also say, AAAAAAAAAAAAAAAAARRRGGGRHHH.

Thank you.

-sam

Re: The trap of reference numification
by xdg (Monsignor) on Nov 11, 2005 at 20:54 UTC

    Well, quick and dirty, and at the cost of a performance hit everywhere, you could overload numification etc. in UNIVERSAL to carp about it:

    WarnOver.pm

    package WarnOver;
    use strict;
    use warnings;

    package UNIVERSAL;
    use Scalar::Util 'refaddr';
    use Carp;
    use overload
        q{0+} => sub {
            carp "numifying reference " . overload::StrVal($_[0]);
            refaddr $_[0];
        },
        q{""} => sub {
            carp "stringifying reference " . overload::StrVal($_[0]);
            overload::StrVal($_[0]);
        },
        fallback => 1;

    1;

    test_warnover.pl

    use strict;
    use warnings;
    use lib ".";
    use WarnOver;

    my $foo = bless {}, "Foo";
    $\ = "\n";
    print $foo % 2;
    print "$foo";

    Result

    numifying reference Foo=HASH(0x225118) at checkwarn.pl line 10
    0
    stringifying reference Foo=HASH(0x225118) at checkwarn.pl line 11
    Foo=HASH(0x225118)

    Update: This was just a quick thought exercise in evil overloading. It of course breaks horribly for object comparisons, as discussed elsewhere in this thread.

    package Foo;
    use WarnOver;
    $foo = bless {};
    $bar = bless {};
    print "equal" if $foo == $bar; # two warnings!

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: The trap of reference numification
by QM (Parson) on Nov 12, 2005 at 04:50 UTC
    I agree. No one should ever accidentally do math on a ref, without getting a warning (maybe even an error).

    If the Perl Gods want to do this for some reason, give them a mechanism to turn it off:

    no warnings refnum;
    I would even go so far as to say comparison should be strings only -- what better way to reinforce the notion that math is not useful on refs (unless you're doing something naughty under the hood).

    And if you're still convinced you need to do math on refs, you can grab the digits out with a regex. At least then it will be painfully obvious for the next Perler.

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      I strongly disagree.

      sub is_same_ref_str { $_[0] eq $_[1] }
      sub is_same_ref_num { $_[0] == $_[1] }
      $_ = \$_;
      print +( "not ", "" )[ is_same_ref_str( $_, "$_" ) ], "the same reference\n";
      print +( "not ", "" )[ is_same_ref_num( $_, "$_" ) ], "the same reference\n";
      __END__
      the same reference
      not the same reference

      I use strict because I like not having to worry about stringified references. I compare them with == for the same reason; that protection is arguably weaker, but it’s still better than using eq.

      Of course, if you want to be really conscientious, you need to lug Scalar::Util into every script.

      Perl should have had eqref and neref operators to compare with reference semantics, the same way it has eq and ne to compare with string semantics vs == and != to compare with numeric semantics.
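For what it's worth, the wished-for operators can be approximated with Scalar::Util today. The following is only a sketch: `eqref` and `neref` are hypothetical function names borrowed from the paragraph above, not anything Perl or Scalar::Util provides. They compare with reference semantics by refusing non-references and comparing addresses via `refaddr`, which ignores any overloading.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical stand-in for an "eqref" operator:
# true only when both arguments are references to the same thing.
sub eqref {
    my ( $x, $y ) = @_;
    return 0 unless ref($x) && ref($y);
    return refaddr($x) == refaddr($y) ? 1 : 0;
}

# Hypothetical stand-in for "neref": the negation of eqref.
sub neref { return eqref(@_) ? 0 : 1 }

my $data  = { id => 1 };
my $alias = $data;          # second reference to the same hash
my $copy  = { %$data };     # different hash with the same contents

print "alias:       ", eqref( $data, $alias ),  "\n";
print "copy:        ", eqref( $data, $copy ),   "\n";
print "stringified: ", eqref( $data, "$data" ), "\n";
```

Unlike eq, this does not treat a stringified reference as equal to the reference it came from, and unlike ==, it never compares against a plain number.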

      Makeshifts last the longest.

Re: The trap of reference numification
by dragonchild (Archbishop) on Nov 11, 2005 at 19:09 UTC
    It's a pity that overloading creates such a large overhead. I can see great benefit in numifying $row to find the $row_count. Contextual::Return provides this, but at a significant cost to performance.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      It isn't that $row is bad in some way. It really was supposed to be an ARRAY ref. I just used the wrong variable. The one I wanted was $line_num.

      Also, this is hardly the first time I've been bit by numified references. I say we should solve the problem, not swat at the symptoms.

      -sam

        Sounds like you needed Hungarian notation. Not the evil Systems notation, but the good Apps notation. Refer to http://en.wikipedia.org/wiki/Hungarian_notation for more info on the difference. FWIW, the inventor invented the Apps version, but it was mutated into the Systems version due to a poor choice of words in the original whitepaper.

        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: The trap of reference numification
by demerphq (Chancellor) on Nov 12, 2005 at 11:22 UTC

    I partially agree. Any math operation (excluding ==) on a reference that doesn't have the reference numified by 0+$ref and doesn't have the operators in question overloaded in the package the ref is blessed into should warn. Anything else should not, as it allows the author to signal "yes I am doing this on purpose" and as 0+$ref is such a common idiom (albeit a moderately dangerous one due to overloading). Likewise you don't want warnings flying everywhere when you are using bigint objects with overloading.

    ---
    $world=~s/war/peace/g

      Any math operation (excluding ==) on a reference that doesn't have the reference numified by 0+$ref and doesn't have the operators in question overloaded in the package the ref is blessed into should warn.
      I disagree. That's contrary to Perl's automatic casting for programmer convenience. What's next, concatenation of numbers throws a warning, unless the number is surrounded with quotes? Use of an array as a number throws a warning, unless it's explicitly preceded with '0+'? Using a string as first argument of split throws a warning, because it isn't surrounded by slashes?

      If you want a language that will throw warnings if casting happens implicitly, use C.

      I don't think that the attitude of "oh, I made a programming boo-boo, let's add a warning that prevents me from making the boo-boo again" is very productive. It will only annoy people who legitimately use this feature (that has been around for over a decade!) - and it will make people not use use warnings;. Or to not use perl.

      Warnings are good. But a warning that triggers at the wrong moment is bad. Very bad. Warnings should never get in the way.

      Perl --((8:>*

        Well, Perl has a tendency to take the attitude that it should warn when you are most likely doing something stupid, and not warn when you probably aren't. Thus you see things like "0 but true" NOT warning:

        C:\Temp>perl -we"print 0+'0 but true'"
        0
        C:\Temp>perl -we"print 0+'0 and true'"
        Argument "0 and true" isn't numeric in addition (+) at -e line 1.
        0

        I can't think of any remotely sane reason why one would do a string or numeric comparison other than '==' or 'eq' on a ref (OK, I lie, I can think of one, but it's pretty sucky: you can sort references into an arbitrary but repeatable order using reference comparison), so it doesn't seem unreasonable to me to warn in these situations. Whereas 0+ implies the decision was intentional and not accidental, overloading implies that the operation is specifically allowed, and '==' is the normal way of doing equivalence tests on refs, so you don't want to warn in those situations.

        What would this really mean for the programmer? Well, it means that when they want to violate the encapsulation that references provide and use an arbitrary value (the memory location of the ref) for something it probably shouldn't be used for, they need to signal to the interpreter that they would like to blow their foot off, by using the 0+ disambiguation. IMO this would probably catch a few bugs, and maybe break some code that uses references as hash keys without using the 0+ disambiguation, something that could itself be special-cased, or probably better just be worked around with a 0+.

        Anyway, I doubt any of this will ever happen, but I thought I would express what I consider to be the way it might make sense.

        ---
        $world=~s/war/peace/g

        $ perl -wle'print "99 bottles" + 1'
        Argument "99 bottles" isn't numeric in addition (+) at -e line 1.
        100
        

        Makeshifts last the longest.

Re: The trap of reference numification
by itub (Priest) on Nov 11, 2005 at 19:12 UTC
    I don't know if it's that rare; I've seen it used quite a bit to generate an object ID of some sort (stringification is also used for that). If you add that as a warning, many currently warning-free programs would start spewing lots of warnings, which might be annoying... But I agree that in principle such a warning would be useful, and I imagine the same drawbacks apply whenever a new warning is added.
      In my experience object IDs almost always use the stringified reference as a hash key (the usual way for inside-out objects.) Or Scalar::Util::refaddr is used (as in Class::Std and PBP.)

      refaddr returns the numified reference, but Scalar::Util already turns warnings off when munging that. (And SU does it in a rather bizarre way with a regex and hex instead of using direct numification for reasons that remain mysterious to me.)

      I don't think there are many common uses of numified references outside those two contexts.

      Update: Actually it looks like the issue with refaddr is fixed in recent versions. It uses int now.

        And SU does it in a rather bizarre way with a regex and hex instead of using direct numification for reasons that remain mysterious to me.

        The answer to that is simple: overloading. Up until later perls (and even in them I'm not sure) there was no way to bypass overloaded numification. Thus 0+$ref is inherently dangerous on a blessed ref. However, there is and always has been a way to bypass overloaded stringification, and the result conveniently contains the reference address in hex. Thus the only generally safe pure-Perl way to get the address of a ref is via this technique.

        Having said that, SU doesn't use the pure-Perl code except on older perl builds on OSes/machines that don't have XS installed. The XS code it uses for refaddr() is much more efficient and bypasses all of these problems.
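A rough sketch of the technique described above, for illustration only. `pp_refaddr` and the `Counter` class are hypothetical names, and this is not Scalar::Util's actual source. It takes the un-overloaded string form from `overload::StrVal`, pulls the hex address out with a regex, and converts it with `hex`, then compares the result with the overloaded `0+` value and with `refaddr`.

```perl
use strict;
use warnings;
use overload ();                  # loads overload::StrVal without importing
use Scalar::Util qw(refaddr);

# Hypothetical class whose numification is overloaded,
# so 0+$obj no longer returns the address.
{
    package Counter;
    use overload '0+' => sub { $_[0]{count} }, fallback => 1;
    sub new { my ( $class, $n ) = @_; return bless { count => $n }, $class }
}

# Pure-Perl address lookup: parse the hex address out of the
# un-overloaded string form, e.g. "Counter=HASH(0x225118)".
sub pp_refaddr {
    my ($ref) = @_;
    return unless ref($ref);
    no warnings 'portable';       # 64-bit addresses exceed 0xffffffff
    my ($hex) = overload::StrVal($ref) =~ /\(0x([0-9a-fA-F]+)\)\z/;
    return hex($hex);
}

my $obj = Counter->new(42);
print "0+\$obj:     ", 0 + $obj,         "\n";   # overloaded value, not an address
print "pp_refaddr: ", pp_refaddr($obj), "\n";
print "refaddr:    ", refaddr($obj),    "\n";
```

The last two lines should print the same number, while the first prints whatever the class's `0+` handler returns.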

        ---
        $world=~s/war/peace/g

        Comparisons between objects. Depending on my mood, I'll use eq or == interchangeably. Before we go ahead and do this, there needs to be considerable discussion. This will end up being a bigger change than expected.

        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: The trap of reference numification
by graff (Chancellor) on Nov 13, 2005 at 00:15 UTC
    Um, slightly off-topic here, but when you said:
    ... this usually didn't matter because MySQL was able to handle quite a lot of inserts in a single transaction. That is, until it couldn't, and blew up with a buffer size error in the middle of an 8-hour job.

    I had to wonder: are you sure you want to be doing that many inserts via DBI? If, instead, you could save the insertion data to a file, and when the file is complete and closed, you execute a single "LOAD DATA INFILE" statement, not only would you vastly reduce the unpleasant likelihood of bad things happening in the middle of database modifications, but also a DBI process that currently takes 8 hours might end up taking noticeably less time.

    (Of course, I could envision situations where other database things need to be done that depend on, and must be interleaved with, a sequence of inserts. But in a case like that, I'd still hope to find a way to refactor the task so that inserts can be done in bulk with the db-server's native data-import tools. And then there might be the problem of permissions, if the server is running remotely and being managed by others, in which case some diplomacy might be worth trying...)

      I had to wonder: are you sure you want to be doing that many inserts via DBI?

      Yes, I am. But since you seem to care, you'll be happy to know that I reduced the runtime to less than an hour through a combination of multi-value inserts and factoring out some unnecessary Class::DBI usage. I'm sure it could go faster still but I think I'm at a point where it wouldn't be worth the development time.
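For readers unfamiliar with multi-value inserts, here is a minimal sketch of the idea. It is not the code from this job: `build_multi_insert`, the `people` table, and its columns are made up for illustration. It builds one INSERT statement with a placeholder tuple per row, so many rows travel to the server in a single statement.

```perl
use strict;
use warnings;

# Hypothetical helper: build one multi-row INSERT and its bind values.
# $columns is an array ref of column names; $rows is an array ref of
# array refs, one per row, in the same column order.
sub build_multi_insert {
    my ( $table, $columns, $rows ) = @_;
    my $tuple = '(' . join( ', ', ('?') x @$columns ) . ')';
    my $sql   = sprintf 'INSERT INTO %s (%s) VALUES %s',
        $table,
        join( ', ', @$columns ),
        join( ', ', ($tuple) x @$rows );
    my @bind = map { @$_ } @$rows;    # flatten rows into one bind list
    return ( $sql, @bind );
}

my ( $sql, @bind ) = build_multi_insert(
    'people',                          # made-up table name
    [qw(name age)],
    [ [ 'alice', 30 ], [ 'bob', 25 ] ],
);
print "$sql\n";
print "bind values: @bind\n";

# With a connected DBI handle this would be executed as:
#   $dbh->do( $sql, undef, @bind );
```

How many rows to pack into each statement is a tuning question that depends on the server's packet and buffer limits.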

      -sam