I was looking at this node before Christmas and decided to benchmark the various solutions. However, I struggled to get pat_mc's algorithm to behave in a subroutine: it would produce a result when first invoked but throw a "Can't use an undefined value as an ARRAY reference at ..." error when invoked a second time, even though I can't see an array reference anywhere in the code.
I had changed the code slightly so that the routine would work on a copy of the benchmark data, preserving the original rather than consuming it. I used package arrays rather than lexical ones to avoid "Variable ... will not stay shared ..." warnings and, if it ran at all, null results for the second and subsequent invocations. Here is a cut-down version of the benchmark code that demonstrates the problem, using pat_mc's and almut's routines to show the results:
    use strict;
    use warnings;

    my @words = qw{
        cooling rooting hooting looking
        doormat cooking cookies noodles
    };

    print qq{pat_mc : @{ [ pat_mc() ] }\n};
    print qq{almut : @{ [ almut() ] }\n};
    print qq{pat_mc : @{ [ pat_mc() ] }\n};
    print qq{almut : @{ [ almut() ] }\n};

    sub almut
    {
        my $w1 = $words[0];
        my $and = "\xff" x length($w1);
        my $or = "\0" x length($w1);
        for my $w (@words) {
            $and &= $w;
            $or |= $w;
        }
        my $xor = $and ^ $or;
        $xor =~ tr/\0/\xff/c;
        my $mask = ~$xor;
        my $common = $w1 & $mask;
        $common =~ tr/\0/-/;
        return $common;
    }

    sub pat_mc
    {
        our @common_letters;
        our @wordsCopy = @words;
        my $reference = shift @wordsCopy;

        () = $reference =~ /(.)(?{
            my $letter = $1;
            my $position = $-[0];
            my $bolean = 1;
            for ( @wordsCopy ) {
                if ( substr( $_, $position, 1 ) ne $letter ) {
                    $bolean = 0;
                    last
                }
            }
            $common_letters[ $position ] = $letter if ( $bolean );
        })/gx;

        return join '',
            map { $common_letters[ $_ ] || '-' }
            0 .. length( $reference ) - 1;
    }
The output.
    pat_mc : -oo----
    almut : -oo----
    Can't use an undefined value as an ARRAY reference at ./spw731537C line 48.
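For anyone who hasn't met the "will not stay shared" warning mentioned above, here is a minimal sketch of the classic way to provoke it, using a named sub nested inside another. The names outer and inner are made up for illustration. My understanding is that older perls treated lexicals inside (?{ ... }) blocks in a similar compile-once fashion, but that is an assumption on my part and it does not by itself explain the ARRAY reference error.

```perl
use strict;
use warnings;

sub outer
{
    my $value = shift;

    # A named sub is compiled only once, so it binds to the first
    # instance of $value. With warnings enabled perl reports:
    #   Variable "$value" will not stay shared
    sub inner { return $value }

    return inner();
}

print outer( 'first' ), "\n";
print outer( 'second' ), "\n";
```

Both calls print "first": the second call gets a fresh $value, but inner is still looking at the one from the first call. That matches the "works once, then goes wrong" pattern I was trying to avoid by switching to package arrays.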
I returned to the problem today and added a couple of print statements to the errant routine to try to confirm where it was going wrong. Bizarrely, the routine then started working as expected. It seems that adding a statement between the two package array declarations stops the error occurring.
    ...
    sub pat_mc
    {
        our @common_letters;
        my $dummy = 0;
        our @wordsCopy = @words;
        my $reference = shift @wordsCopy;

        () = $reference =~ /(.)(?{
            my $letter = $1;
            my $position = $-[0];
            my $bolean = 1;
            for ( @wordsCopy ) {
                if ( substr( $_, $position, 1 ) ne $letter ) {
                    $bolean = 0;
                    last
                }
            }
            $common_letters[ $position ] = $letter if ( $bolean );
        })/gx;

        return join '',
            map { $common_letters[ $_ ] || '-' }
            0 .. length( $reference ) - 1;
    }
The output again.
    pat_mc : -oo----
    almut : -oo----
    pat_mc : -oo----
    almut : -oo----
I can't see what could be causing this. The output of perl -MO=Deparse -e '...' looks identical other than the my $dummy = 0; statement being where you'd expect. I'm hoping that someone will be able to throw some light on what is going on.
Cheers,
JohnGG
P.S. The benchmark results, if anyone's interested. almut's method is the fastest by a considerable margin. oko1 and I shared the wooden spoon :-(
    almut : -oo----
    johngg : -oo----
    oko1 : -oo----
    pat_mc : -oo----
             Rate   oko1 johngg pat_mc  almut
    oko1    223/s     --   -13%   -91%   -98%
    johngg  256/s    15%     --   -89%   -97%
    pat_mc 2416/s   984%   843%     --   -75%
    almut  9772/s  4284%  3714%   304%     --
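A chart in that layout comes from the cmpthese routine in the core Benchmark module. The following is only a rough sketch of such a harness, not the full script I ran. It pits almut's routine, as posted above, against a plain nested loop called column_scan, which is a hypothetical stand-in and not one of the four entries timed above.

```perl
use strict;
use warnings;
use Benchmark qw{ cmpthese };

my @words = qw{
    cooling rooting hooting looking
    doormat cooking cookies noodles
};

# almut's bitmask routine, as posted above.
sub almut
{
    my $w1 = $words[0];
    my $and = "\xff" x length($w1);
    my $or = "\0" x length($w1);
    for my $w (@words) {
        $and &= $w;
        $or |= $w;
    }
    my $xor = $and ^ $or;
    $xor =~ tr/\0/\xff/c;
    my $mask = ~$xor;
    my $common = $w1 & $mask;
    $common =~ tr/\0/-/;
    return $common;
}

# Hypothetical comparison routine: check each column of the first
# word against the same column of every other word.
sub column_scan
{
    my ( $reference, @others ) = @words;
    my $common = '';
    for my $position ( 0 .. length( $reference ) - 1 ) {
        my $letter = substr $reference, $position, 1;
        my $shared = 1;
        for my $word ( @others ) {
            next if substr( $word, $position, 1 ) eq $letter;
            $shared = 0;
            last;
        }
        $common .= $shared ? $letter : '-';
    }
    return $common;
}

# Confirm both routines agree before timing them.
print qq{almut : @{ [ almut() ] }\n};
print qq{column_scan : @{ [ column_scan() ] }\n};

# A negative count asks Benchmark to run each routine for at least
# that many CPU seconds before printing the comparison chart.
cmpthese( -1, {
    almut       => \&almut,
    column_scan => \&column_scan,
} );
```

Because column_scan keeps all its state in lexicals and uses no (?{ ... }) block, it can be called any number of times, which is the property I was after when wrapping pat_mc's code in a subroutine.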
P.P.S. Just as I was about to post this I wondered whether initialising the @common_letters array rather than just declaring it would make a difference. Sure enough
    ...
    sub pat_mc
    {
        our @common_letters = ();
        our @wordsCopy = @words;
        my $reference = shift @wordsCopy;

        () = $reference =~ /(.)(?{
            my $letter = $1;
            my $position = $-[0];
            my $bolean = 1;
            for ( @wordsCopy ) {
                if ( substr( $_, $position, 1 ) ne $letter ) {
                    $bolean = 0;
                    last
                }
            }
            $common_letters[ $position ] = $letter if ( $bolean );
        })/gx;

        return join '',
            map { $common_letters[ $_ ] || '-' }
            0 .. length( $reference ) - 1;
    }
produces
    pat_mc : -oo----
    almut : -oo----
    pat_mc : -oo----
    almut : -oo----
Why would that be?
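One difference between the two forms that I am sure of, though it may have nothing to do with the error: a bare "our @array;" only gives the sub a name for the package array and leaves its contents alone, whereas "our @array = ();" empties it on every call. A small sketch, with routine and array names invented for illustration:

```perl
use strict;
use warnings;

# Declaration only: the package array keeps its contents between calls.
sub declare_only
{
    our @kept;
    push @kept, shift;
    return scalar @kept;
}

# Declaration plus assignment: the array is emptied on every call.
sub declare_and_clear
{
    our @cleared = ();
    push @cleared, shift;
    return scalar @cleared;
}

print qq{declare_only : @{ [ map { declare_only( $_ ) } qw{ a b c } ] }\n};
print qq{declare_and_clear : @{ [ map { declare_and_clear( $_ ) } qw{ a b c } ] }\n};
```

The first line prints 1 2 3 and the second 1 1 1. In pat_mc that means letters stored in @common_letters by one call would still be there on the next call unless the array is cleared, so the initialised version is the safer one whatever the cause of the error turns out to be.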
In reply to Strange "undefined value as an ARRAY reference" error by johngg