Need a replacement method for older version of perl

vivomancer has asked for the wisdom of the Perl Monks concerning the following question:

The following code works on my current version of perl. The array @taxR could have 4,000 entries, and there will be about 5 million different values of $curEntry. The purpose is to check whether the taxonomic code of the protein currently being read is included in an array of allowed taxonomic codes.

my @taxR = ("PLRV1", "PMTVS", "PVXHB");
my $curEntry = "PMTVS";
if($curEntry ~~ @taxR){
   print "do rest of stuff";
}

With this code my entire program takes about 20 seconds to run on my test data set and 30 minutes on the real thing. I've tried this:

use List::Util qw(first);   # first() is not a builtin; it comes from List::Util
my @taxR = ("PLRV1", "PMTVS", "PVXHB");
my $curEntry = "PMTVS";
if( first { $_ eq $curEntry } @taxR ){
   print "do rest of stuff";
}

but with it the test data takes 3 minutes to run, so the real set would take unusably long. Our draconian IT guys will never agree to upgrade perl on the Macs from version 5.8.8, so I was hoping you could help me find a replacement method that doesn't take a million years to run.


Re: Need a replacement method for older version of perl by toolic (Bishop) on Jun 26, 2012 at 21:03 UTC
Hash look-ups will probably speed things up:

use warnings;
use strict;

my @taxR = ("PLRV1", "PMTVS", "PVXHB");
my %taxRh = map { $_ => 1 } @taxR;
my $curEntry = "PMTVS";
if (exists $taxRh{$curEntry}) {
    print "do rest of stuff";
}
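
A minimal sketch of how the hash look-up suggested above might sit inside the main loop. The `@entries` list is a hypothetical stand-in for the roughly 5 million entries read from the real data, and the names `%allowed` and `$kept` are illustrative only. The point is that the hash is built once, outside the loop, so each entry costs one look-up instead of a scan of `@taxR`. Nothing here needs a perl newer than 5.8.8.

```perl
use strict;
use warnings;

# Allow-list from the question; the real one could hold about 4,000 codes.
my @taxR = ("PLRV1", "PMTVS", "PVXHB");

# Build the look-up hash once, before the main loop.
my %allowed;
@allowed{@taxR} = ();    # hash slice: the keys exist, the values are undef

# Hypothetical stand-in for the entries read from the data file.
my @entries = ("PMTVS", "XXXXX", "PLRV1", "PMTVS");

my $kept = 0;
for my $curEntry (@entries) {
    next unless exists $allowed{$curEntry};    # one hash look-up per entry
    $kept++;                                   # "do rest of stuff" goes here
}

print "kept $kept of ", scalar(@entries), " entries\n";
# prints: kept 3 of 4 entries
```

Using a hash slice with `exists` avoids storing a value per key; `map { $_ => 1 }` as in the reply works just as well and also allows a plain truth test.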
Re^2: Need a replacement method for older version of perl by vivomancer (Initiate) on Jun 26, 2012 at 23:25 UTC
Thanks, that's much more logical than what I was doing. Though it turns out that wasn't what was causing my program to break. I'm going to make a new thread, since it's going to be different enough from my topic title.
Re: Need a replacement method for older version of perl by tobyink (Canon) on Jun 26, 2012 at 21:06 UTC
Using `first` might not be a great idea. If `$curEntry` is the empty string `""`, and `@taxR` does actually contain an empty string (so you should expect a match), then `first` will end up returning false. The `any` function from List::MoreUtils is probably a better choice. As to your question, you might get better performance from a hash:

my @taxR = ("PLRV1", "PMTVS", "PVXHB");

# Copy the array into a hash.
# Make sure you only do this once.
my %taxR = map {$_ => 1} @taxR;

my $curEntry = "PMTVS";
if (exists $taxR{$curEntry}) {
    print "do rest of stuff";
}
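
To illustrate the `first` pitfall described above, here is a small sketch. It does not use `any` from List::MoreUtils, since that module is not in core and may not be installed on a 5.8.8 machine; instead it shows two core-only workarounds, which are a substitution for the reply's suggestion. The variable names are illustrative.

```perl
use strict;
use warnings;
use List::Util qw(first);    # core module

my @list   = ("PLRV1", "", "PVXHB");    # contains an empty string
my $needle = "";

# first returns the matching element itself, here "", which is false.
my $naive = first { $_ eq $needle } @list;
print "naive test: ", ($naive ? "found" : "not found"), "\n";

# Testing definedness distinguishes "" (a match) from undef (no match).
my $found = defined( first { $_ eq $needle } @list );
print "defined test: ", ($found ? "found" : "not found"), "\n";

# grep in scalar context returns the number of matches, but always scans the whole list.
my $count = grep { $_ eq $needle } @list;
print "grep test: ", ($count ? "found" : "not found"), "\n";

# prints:
# naive test: not found
# defined test: found
# grep test: found
```

The `defined` workaround is only safe when the list itself cannot contain undef; for string codes like the ones in the question that should hold.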
Re^2: Need a replacement method for older version of perl by vivomancer (Initiate) on Jun 26, 2012 at 23:30 UTC
Thank you, that's much more logical than what I was doing. Though it turns out that wasn't what was causing my program to break. I'm going to make a new thread, since it's going to be different enough from my topic title.
Re: Need a replacement method for older version of perl by kcott (Archbishop) on Jun 27, 2012 at 07:09 UTC
In this sort of situation, you'll want to aim for the code inside the 5,000,000 iterations to be as minimal as possible. I ran a few commandline tests comparing the smartmatch with a regex. The regex was 5-10 times faster. Here's a typical run:

$ time perl -Mstrict -Mwarnings -E '
    my @x = ((q{AXXX}) x 4000, qw{BXXX CXXX});
    my $y = q{BXXX};
    my $c = 0;
    for (1 .. 5000) {
        $y ~~ @x && ++$c;
    }
    say qq{count=$c};
'
count=5000

real    0m1.128s
user    0m1.122s
sys     0m0.004s

$ time perl -Mstrict -Mwarnings -e '
    my @x = ((q{AXXX}) x 4000, qw{BXXX CXXX});
    my $y = q{BXXX};
    my $c = 0;
    my $z = join q{\|} => @x;
    for (1 .. 5000) {
        $z =~ m{\b$y\b} && ++$c;
    }
    print qq{count=$c\n};
'
count=5000

real    0m0.142s
user    0m0.138s
sys     0m0.003s

See also: Benchmark

-- Ken
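
As a rough sketch of how the core Benchmark module mentioned above could compare the approaches from this thread on the same data, all of which run on perl 5.8.8: the test data is taken from the commandline runs above, but the labels are made up for illustration, and the regex variant differs slightly from the original (it joins on a plain `|` and quotes the needle with `\Q...\E`). No timings are shown because they depend on the machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(first);

# Same test data as the commandline runs: the match sits near the end.
my @x = ((q{AXXX}) x 4000, qw{BXXX CXXX});
my $y = q{BXXX};

# Structures that are built once, outside the timed code.
my %h = map { $_ => 1 } @x;
my $z = join q{|}, @x;

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    first_scan => sub { my $hit = defined( first { $_ eq $y } @x ); $hit },
    grep_scan  => sub { my $hit = grep { $_ eq $y } @x; $hit },
    regex      => sub { my $hit = $z =~ m{\b\Q$y\E\b}; $hit },
    hash       => sub { my $hit = exists $h{$y}; $hit },
});
```

`cmpthese` prints a table of rates and relative percentages, which makes it easier to compare several candidates at once than separate `time` runs.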
Re: Need a replacement method for older version of perl by Anonymous Monk on Jun 27, 2012 at 00:13 UTC
It is very easy in Perl to write a little bit of code that does something in a very inefficient way. What you are asking the computer to do is to iterate sequentially through up to 4,000 records, 5 million times, which is up to 20 billion comparisons. You do the math. What is plowing you under the ground is virtual-memory overhead,