johngg has asked for the wisdom of the Perl Monks concerning the following question:

I was looking at this node before Christmas and decided to benchmark the various solutions. However, I struggled to get pat_mc's algorithm to behave in a subroutine: it would produce a result when first invoked but throw a "Can't use an undefined value as an ARRAY reference at ..." error when invoked a second time. Yet I can't see an array reference anywhere in the code.

I had changed the code slightly so that the routine would work on a copy of the benchmark data, preserving it rather than consuming it. I used a package array rather than a lexical one to avoid "Variable ... will not stay shared ..." warnings and, if it ran, null results for the second and subsequent invocations. Here is a cut-down version of the benchmark code that demonstrates the problem, using pat_mc's and almut's routines to show the results:

use strict;
use warnings;

my @words = qw{ cooling rooting hooting looking doormat cooking cookies noodles };

print qq{pat_mc : @{ [ pat_mc() ] }\n};
print qq{almut : @{ [ almut() ] }\n};
print qq{pat_mc : @{ [ pat_mc() ] }\n};
print qq{almut : @{ [ almut() ] }\n};

sub almut {
    my $w1 = $words[0];
    my $and = "\xff" x length($w1);
    my $or = "\0" x length($w1);
    for my $w (@words) {
        $and &= $w;
        $or |= $w;
    }
    my $xor = $and ^ $or;
    $xor =~ tr/\0/\xff/c;
    my $mask = ~$xor;
    my $common = $w1 & $mask;
    $common =~ tr/\0/-/;
    return $common;
}

sub pat_mc {
    our @common_letters;
    our @wordsCopy = @words;
    my $reference = shift @wordsCopy;
    () = $reference =~ /(.)(?{
        my $letter = $1;
        my $position = $-[0];
        my $bolean = 1;
        for ( @wordsCopy ) {
            if ( substr( $_, $position, 1 ) ne $letter ) {
                $bolean = 0;
                last
            }
        }
        $common_letters[ $position ] = $letter if ( $bolean );
    })/gx;
    return join '', map { $common_letters[ $_ ] || '-' } 0 .. length( $reference ) - 1;
}

The output.

pat_mc : -oo----
almut : -oo----
Can't use an undefined value as an ARRAY reference at ./spw731537C line 48.

I returned to the problem today and added a couple of print statements to the errant routine to try to confirm where it was going wrong. Bizarrely, the routine then started working as expected. It seems that adding a statement between the two package array declarations stops the error occurring.

...
sub pat_mc {
    our @common_letters;
    my $dummy = 0;
    our @wordsCopy = @words;
    my $reference = shift @wordsCopy;
    () = $reference =~ /(.)(?{
        my $letter = $1;
        my $position = $-[0];
        my $bolean = 1;
        for ( @wordsCopy ) {
            if ( substr( $_, $position, 1 ) ne $letter ) {
                $bolean = 0;
                last
            }
        }
        $common_letters[ $position ] = $letter if ( $bolean );
    })/gx;
    return join '', map { $common_letters[ $_ ] || '-' } 0 .. length( $reference ) - 1;
}

The output again.

pat_mc : -oo----
almut : -oo----
pat_mc : -oo----
almut : -oo----

I can't see what could be causing this. The output of perl -MO=Deparse -e '...' looks identical other than the my $dummy = 0; statement being where you'd expect. I'm hoping that someone will be able to throw some light on what is going on.

Cheers,

JohnGG

P.S. The benchmark results, if anyone's interested, with almut's method being the fastest by a considerable margin. oko1 and I shared the wooden spoon :-(

almut : -oo----
johngg : -oo----
oko1 : -oo----
pat_mc : -oo----
         Rate   oko1 johngg pat_mc  almut
oko1    223/s     --   -13%   -91%   -98%
johngg  256/s    15%     --   -89%   -97%
pat_mc 2416/s   984%   843%     --   -75%
almut  9772/s  4284%  3714%   304%     --

P.P.S. Just as I was about to post this I wondered whether initialising the @common_letters array rather than just declaring it would make a difference. Sure enough

...
sub pat_mc {
    our @common_letters = ();
    our @wordsCopy = @words;
    my $reference = shift @wordsCopy;
    () = $reference =~ /(.)(?{
        my $letter = $1;
        my $position = $-[0];
        my $bolean = 1;
        for ( @wordsCopy ) {
            if ( substr( $_, $position, 1 ) ne $letter ) {
                $bolean = 0;
                last
            }
        }
        $common_letters[ $position ] = $letter if ( $bolean );
    })/gx;
    return join '', map { $common_letters[ $_ ] || '-' } 0 .. length( $reference ) - 1;
}

produces

pat_mc : -oo----
almut : -oo----
pat_mc : -oo----
almut : -oo----

Why would that be?

Replies are listed 'Best First'.
Re: Strange "undefined value as an ARRAY reference" error
by ikegami (Patriarch) on Jan 08, 2009 at 02:00 UTC
    • our @common_letters = (); our @wordsCopy = @words;

      should be

      local our @common_letters; local our @wordsCopy = @words;
    • Calling subroutines is expensive. I'd try replacing

      local our @common_letters;
      local our @wordsCopy = @words;
      my $reference = shift @wordsCopy;
      $reference =~ /(.)(?{ ... })/gx;

      with

      my @common_letters;
      my @wordsCopy = @words;
      my $reference = shift @wordsCopy;
      while ($reference =~ /(.)/g) {
          ...
      }

      There's also

      for my $c($reference =~ /./g) # uses $c instead of $1

      and

      for my $c (split(//, $reference)) # uses $c instead of $1
    • The "() =" is useless.

    • What's with the use of $bolean? Aside from the fact that it should be spelled "boolean",

      my $bolean = 1;
      for ( @wordsCopy ) {
          if ( substr( $_, $position, 1 ) ne $letter ) {
              $bolean = 0;
              last
          }
      }
      $common_letters[ $position ] = $letter if ( $bolean );

      can be written as

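      # Assume the letter is common, then withdraw it on the first mismatch.
      $common_letters[ $position ] = $letter;
      for ( @wordsCopy ) {
          if ( substr( $_, $position, 1 ) ne $letter ) {
              $common_letters[ $position ] = undef;
              last
          }
      }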
    • Seems to me

      my @wordsCopy = @words; my $reference = shift @wordsCopy;

      is a roundabout way of doing

      my ($reference, @wordsCopy) = @words;
    • It might be faster to split the words into arrays at the top of the function and then use array lookups instead of substr.

      for ( @splitWord ) {
          if ( $_->[$position] ne $letter ) {
              $bolean = 0;
              last
          }
      }

      I'd try

      sub ikegami1 {
          my ($ref, @wordsCopy) = @words;
          # Edit a copy: modifying $ref itself would reset pos() and restart the //g scan.
          my $result = $ref;
          while ($ref =~ /(.)/g) {
              my $letter = $1;
              my $pos = $-[0];
              for ( @wordsCopy ) {
                  if ( substr( $_, $pos, 1 ) ne $letter ) {
                      substr( $result, $pos, 1, '-' );
                      last;
                  }
              }
          }
          return $result;
      }

      sub ikegami2 {
          my ($ref, @wordsCopy) = @words;
          for my $pos (0 .. length($ref) - 1) {
              my $letter = substr( $ref, $pos, 1 );
              for ( @wordsCopy ) {
                  if ( substr( $_, $pos, 1 ) ne $letter ) {
                      substr( $ref, $pos, 1, '-' );
                      last;
                  }
              }
          }
          return $ref;
      }

      sub ikegami3 {
          my @wordsCopy;
          push @wordsCopy, [ /./g ] for @words;
          my $ref = pop(@wordsCopy);
          for my $pos (0..$#$ref) {
              my $letter = $ref->[$pos];
              for ( @wordsCopy ) {
                  if ( $_->[$pos] ne $letter ) {
                      $ref->[$pos] = '-';
                      last;
                  }
              }
          }
          return join '', @$ref;
      }

      These are just attempted improvements on your method. It's not going to beat the bit method.
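
      One way to check that claim is cmpthese from the core Benchmark module. The sketch below is only an illustration, not the benchmark used in this thread: loop_method is a hypothetical name for a plain substr loop in the spirit of the routines above, bit_method is almut's routine from the question under a new name, and no timings are quoted because they depend on the machine and the perl version.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

my @words = qw{ cooling rooting hooting looking doormat cooking cookies noodles };

# Hypothetical name: a plain substr loop over a copy of the first word.
sub loop_method {
    my ( $ref, @others ) = @words;
    for my $pos ( 0 .. length( $ref ) - 1 ) {
        my $letter = substr( $ref, $pos, 1 );
        for ( @others ) {
            if ( substr( $_, $pos, 1 ) ne $letter ) {
                substr( $ref, $pos, 1, '-' );    # mark the column as not common
                last;
            }
        }
    }
    return $ref;
}

# almut's routine from the question, renamed for this sketch.
sub bit_method {
    my $w1  = $words[0];
    my $and = "\xff" x length( $w1 );
    my $or  = "\0" x length( $w1 );
    for my $w ( @words ) {
        $and &= $w;
        $or  |= $w;
    }
    my $xor = $and ^ $or;          # "\0" wherever every word has the same character
    $xor =~ tr/\0/\xff/c;          # turn every differing column into "\xff"
    my $mask   = ~$xor;            # "\xff" for common columns, "\0" otherwise
    my $common = $w1 & $mask;
    $common =~ tr/\0/-/;
    return $common;
}

# Check that both routines agree before timing them.
print "loop_method : ", loop_method(), "\n";
print "bit_method  : ", bit_method(),  "\n";

# A negative count asks Benchmark to run each sub for at least that many CPU seconds.
cmpthese( -1, {
    loop_method => \&loop_method,
    bit_method  => \&bit_method,
} );
```

      Printing both results first guards against timing a routine that is fast only because it is wrong.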

      A little humor

      What's with the use of $bolean? Aside from the fact that it should be spelled "boolean"

      I think it is pronounced b-oh-l-ee-n (rhymes with Go Lean). It is a breakfast cereal consumed in Hazard County.

      and yes, it is very little humor ;-)

      --MidLifeXis

Re: Strange "undefined value as an ARRAY reference" error
by jdporter (Paladin) on Jan 08, 2009 at 01:10 UTC
    I haven't investigated it fully, but...
    I used a package rather than lexical array to avoid "Variable ... will not stay shared ..." warnings

    You shouldn't have this problem if your code is as you posted. Please try it using my. Results will be different, and examining the differences should be illuminating.

    I don't quite understand why it is so, but there seems to be a problem with the following syntax when the sub returns undef:

    print qq{pat_mc : @{ [ pat_mc() ] }\n};
    Lastly: You should definitely be checking $reference for a valid value before subjecting it to any further operations. If you have pat_mc throw an explicit error (die) when $reference is undef, you'll probably get a much more useful diagnostic than letting it fail inside a bizarro regex code pattern. :-)
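
    A minimal sketch of such a guard, assuming a hypothetical helper (first_word_or_die and its error text are made up for illustration and are not part of the posted code):

```perl
use strict;
use warnings;

# Hypothetical helper: split a word list into the reference word and the rest,
# dying with a clear message if there is no reference word to work with.
sub first_word_or_die {
    my @candidates = @_;
    my $reference  = shift @candidates;
    die "No reference word: the word list is empty\n"
        unless defined $reference;
    return ( $reference, @candidates );
}

my ( $reference, @rest ) = first_word_or_die( qw{ cooling rooting hooting } );
print "reference: $reference, others: @rest\n";

# An empty list now fails early, with a message that names the real problem.
my $ok = eval { first_word_or_die(); 1 };
print "empty list: $@" unless $ok;
```

    The point is only that the failure is reported where the bad value first appears, rather than from inside the regex code block.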

    Between the mind which plans and the hands which build, there must be a mediator... and this mediator must be the heart.
      The OP is right. If you use a variable in a Perl regexp code block, and that variable is declared outside the regexp, it has to be a package variable. Regexps act as closures when they are compiled.
      sub foo {
          my ($x) = @_;
          '' =~ /(?{ print "$x\n" })/;
      }
      foo('a');
      foo('b');

      Variable "$x" will not stay shared at (re_eval 1) line 2.
      a
      a

      The warning was added in 5.10, but the problem has always existed.

      However, the OP should have localized the variables. Bonus, this initializes @common_letters to ().

      local our @common_letters; local our @wordsCopy = @words;
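
      Put together, the routine from the question with those two declarations localized might look like the sketch below. pat_mc_localized is a hypothetical name, and the only other change from the posted code is the spelling of $boolean; it has not been run on the perl version used in the original post.

```perl
use strict;
use warnings;

my @words = qw{ cooling rooting hooting looking doormat cooking cookies noodles };

sub pat_mc_localized {    # hypothetical name, not from the thread
    # local gives each call a fresh, empty @common_letters and restores the
    # previous contents of both package arrays when the sub returns.
    local our @common_letters;
    local our @wordsCopy = @words;
    my $reference = shift @wordsCopy;

    # List context ("() =") makes the /g match visit every character.
    () = $reference =~ /(.)(?{
        my $letter   = $1;
        my $position = $-[0];
        my $boolean  = 1;
        for ( @wordsCopy ) {
            if ( substr( $_, $position, 1 ) ne $letter ) {
                $boolean = 0;
                last;
            }
        }
        $common_letters[ $position ] = $letter if $boolean;
    })/gx;

    return join '', map { $common_letters[ $_ ] || '-' } 0 .. length( $reference ) - 1;
}

# Call it twice, as in the original test case.
print "first  : ", pat_mc_localized(), "\n";
print "second : ", pat_mc_localized(), "\n";
```

      With the eight words above, both calls should give -oo----, since nothing from the first call is left behind in @common_letters.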
Re: Strange "undefined value as an ARRAY reference" error
by Anonymous Monk on Jan 08, 2009 at 00:07 UTC
    What line number?