Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

So I have a simple script that rolls through a text file and loads anything it recognizes into a database like so:
if (/^\s*(\S+)\s+(\S+)\s+(.+\S)\s*$/) { $dbh->do('replace into reference_code values (?,?,?)',undef,$1,$ +2,$3) or die $dbh->errstr ;
The die may seem excessive, but I didn't expect any problems. Turns out I was wrong. The second and third column are marked as 'not null' in the database. This code died on the first line of the text file, claiming that the third column was null. I assumed I screwed up the regex, so added 1,2, and 3 to the die just to verify like so:
if (/^\s*(\S+)\s+(\S+)\s+(.+\S)\s*$/) { $dbh->do('replace into reference_code values (?,?,?)',undef,$1,$ +2,$3) or die "[$1] [$2] [$3] [". $dbh->errstr ."]";
and while it died with the same error, all three columns had a fair amount of data. Thinking something may be going wrong with the error string, I decided to print 1,2,and 3 just before each execution, just to make sure it was trying to insert what I thought it was:
if (/^\s*(\S+)\s+(\S+)\s+(.+\S)\s*$/) { print "[$1] [$2] [$3]\n"; $dbh->do('replace into reference_code values (?,?,?)',undef,$1,$ +2,$3) or die "[$1] [$2] [$3] [". $dbh->errstr ."]";
This ran to completion without error. Wtf? If I comment out the print, it once again errors right off the bat saying the third column is null, and if I uncomment it again, it runs fine. ??????

Replies are listed 'Best First'.
Re: Bitten by the lazy execution bug?
by mwah (Hermit) on Dec 08, 2007 at 01:06 UTC
    Wtf? If I comment out the print, it once again errors right off the bat saying the third column is null, and if I uncomment it again, it runs fine?

    the $1..$3 aren't localized on entering a sub, therefore any sub that uses a regular expression somewhere *might* invalidate the references to the special regex capture variables, like:

    ... $_ = 'the small print'; if( /^\s*(\S+)\s+(\S+)\s+(.+\S)\s*$/ ) { sub1( 'replace into reference_code values (?,?,?)', $1, $2, $3 ) } sub sub1 { my $p0 = shift; # try to comment this line out and get your lesson print "looks like a statement somewhere\n" if $p0 =~ /\w+/; my ($p1, $p2, $p3) = @_; print "statement parameters: $p1, $p2, $p3\n"; } ...

    Are you sure your program really did run without errors - only by inserting the "print" line?

    BTW, interpolation of $1..$3 in the sub call (as has been suggested) also does the trick (by instantiating anonymous copies of $1..$3):

    ... sub1( 'replace into reference_code values (?,?,?)', "$1", "$2", "$3" +) ...

    Regards

    mwa

      That is bizarre. I would have thought the values would have been passed in "by value".

        To quote from perldoc perlsub:

        The array @_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable). If an argument is an array or hash element which did not exist when the function was called, that element is created only when (and if) it is modified or a reference to it is taken.

        Thus the behavior of this code:

        #!/usr/bin/perl -l my @a = 1..3; sub inc { $_++ for @_; } inc(@a); print for @a; inc('hello'); __DATA__ 2 3 4 Modification of a read-only value attempted at - line 6.
Re: Bitten by the lazy execution bug?
by samtregar (Abbot) on Dec 07, 2007 at 23:03 UTC
    Does assigning $1, $2 and $3 to variables also fix the problem? I've had enough problems with the regex match vars that I always copy them to variables immediately after the regex.

    Also, what version of Perl is this? I remember having trouble with this kind of thing in v5.8.0 especially. You might try upgrading if you're not running the latest and greatest.

    -sam

      I appear to be running v5.8.4 built for i386-linux-thread-multi
Re: Bitten by the lazy execution bug?
by Jenda (Abbot) on Dec 08, 2007 at 01:39 UTC

    Are you sure the table only has the three columns you expect? In the right order? And will always have? Really?

    I would not recomend using the "INSERT INTO TableName VALUES (...)" (or REPLACE INTO) form, specify the columns explicitely! You can never know when will someone reorder the columns or add some more and you end up with an error (if lucky) or invalid data.

      Actually, this is one of the steps I took ( I thought the columns may have been getting swaped around ), so I actually have been using the "explicit column name" version of insert/replace, but that also had no affect on the execution of the code. The only thing that seems to alter whether it runs or not is using the $3 variable prior to the do call.
Re: Bitten by the lazy execution bug?
by parv (Parson) on Dec 07, 2007 at 23:45 UTC
    Shot in the dark: when you pass captured variables, try quoting them: $dbh->do( ... , "$1" , "$2" , "$3" ).
      That worked. Also, changing the code to this works:
      if (/^\s*(\S+)\s+(\S+)\s+(.+\S)\s*$/) { my $x = $3; $dbh->do('replace into reference_code values (?,?,?)',undef,$1,$ +2,$3) or die "[$1] [$2] [$3] [". $dbh->errstr ."]";
      How freaky is that? Notice 1,2,and 3 in the do call are unchanged. Take out the "my" line, instant error (and the db is unchanged). Leave it in, the program runs and the table is updated accordingly.
Re: Bitten by the lazy execution bug?
by Gangabass (Vicar) on Dec 08, 2007 at 05:01 UTC

    Can you show your replace_code table CREATE statement?

    Also did you try to explicit define colunms? Like so:

    REPLACE INTO reference_code (col_name_1, col_name_2, col_name_3) VALUES (?, ?, ?)
      I'm glad you asked that, because it forced me to take a hard look at the full create statement (as given by show create table reference_code (Did I mention I'm using a mysql db?)):
      CREATE TABLE `reference_code` ( `processor` enum('PTP','GNC') default NULL, `code` varchar(255) NOT NULL default '', `explanation` text NOT NULL )
      (minus indexes and other miscellanea) It's interesting that the two columns that have defaults specified don't need their respective variables ( $1 and $2) messed with. I tried recreating the table with a default for the explanation column, but apparently blob's can't have defaults as the default option was silently dropped. I also checked to see if, after a successful run of the script (using the  my $x=$3; method), any "processor" values were null or any "code" values were '', and they weren't.
Re: Bitten by the lazy execution bug?
by robin (Chaplain) on Dec 08, 2007 at 22:07 UTC
    What you're seeing is undoubtedly a bug. It obviously should not make any difference whether the variable has been used in a string.

    It's quite easy to introduce bugs of this sort when writing XS code, if you're not careful. Magical scalars (such as $1) sometimes need special treatment. I would therefore guess that this is a bug in the DBI driver you're using.

    Out of curiosity, I investigated this guess a little. You didn't say which driver you're using, but you mentioned that you're using MySQL, so I had a look at the DBD::mysql module on CPAN. In the ChangeLog I see the entry:

     * Make sure to handle "magical" values in a couple of places. (Bug #20104)

    listed under the changes for version 4.001. So my guess would be that you're using an older version, and that you will find this bug fixed in the latest version of the driver module.

Re: Bitten by the lazy execution bug?
by dragonchild (Archbishop) on Dec 09, 2007 at 03:19 UTC
    if ( my @vals = /<regex>/ ) { $dbh->do( $sql, undef, @vals ) or die " ... "; }

    That's much safer because it avoids all the crap about magic vars.


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Bitten by the lazy execution bug?
by antirice (Priest) on Dec 09, 2007 at 07:03 UTC

    Apparently $dbh->do(...) uses a regex somewhere prior to actually inserting your data. Since $1, $2, and $3 are passed by reference, trying to match anything with the regex engine would reset the values. In your fix, since you're interpolating the values, you get what you originally wanted.