in reply to Too much SQL not enough perl

You could solve the problem with grep or the approach you outlined in your question, but if you want it to behave the way IN() does in SQL, you should use a hash for performance reasons:

#!/usr/bin/perl -w
use strict;
use Benchmark qw(:all);

my @choices = qw|a b c|;
my %choices = map { $_ => 1 } @choices;    # lookup hash keyed on each choice
my @cData   = <DATA>;

timethese( 100000, {
    'Damn Slow'   => \&parseData1,
    'Much Better' => \&parseData2,
});

# linear scan: grep walks the whole @choices list for every item
sub parseData1 {
    foreach (@cData) {
        chomp;
        my @items = split /,/, $_;    # slow way
        foreach my $item (@items) {
            if ( grep { $item eq $_ } @choices ) {
                #print qq|FOUND $item|;
            }
        }
    }
}

# hash lookup: a single key check per item
sub parseData2 {
    foreach (@cData) {
        chomp;
        foreach my $item ( split /,/, $_ ) {
            if ( $choices{$item} ) {
                #print qq|FOUND $item\n|;
            }
        }
    }
}

__DATA__
z,t,m,u,a,b,c
s,t,l,m,z,a,s
c,b,a,m,u,t,n
k,l,t,s,z,r,t

Produces:

Benchmark: timing 100000 iterations of Damn Slow, Much Better...
  Damn Slow:  9 wallclock secs ( 8.81 usr +  0.00 sys =  8.81 CPU) @ 11348.16/s (n=100000)
Much Better:  4 wallclock secs ( 3.88 usr +  0.00 sys =  3.88 CPU) @ 25806.45/s (n=100000)

Celebrate Intellectual Diversity

Re^2: Too much SQL not enough perl
by EvanCarroll (Chaplain) on Oct 10, 2005 at 04:29 UTC
    I do not think it is fair that you load up your hash before the benchmarking starts; that skews the results.

    Nor is it fair that in the 'slow' one you use a temporary array:
    my @items = split /,/, $_;
    foreach my $item (@items) {

    vs
    foreach my $item (split /,/, $_) {


    Also, if ($choices{$item}) should probably be if (exists $choices{$item}), or it will misbehave whenever a key's value is 0, an empty string, or undef.
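
    A quick illustration of the difference, using a hypothetical %choices whose values can be false (not the hash benchmarked above, where every value is 1):

    my %choices = ( a => 1, b => 0, c => undef );
    print "truthy test: ", (        $choices{b} ? "found" : "missed" ), "\n";   # missed -- 0 is false
    print "exists test: ", ( exists $choices{b} ? "found" : "missed" ), "\n";   # found  -- the key is there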

    UPDATE:

    Nor is grep a good idea; quoting a passage I remember reading in perldoc perlfaq (perldoc -q unique):
    These are slow (grep) (checks every element even if the first matches), inefficient (same reason), and potentially buggy
    But, granted, it still says:
    Hearing the word "in" is an indication that you probably should have used a hash, not a list or array, to store your data. Hashes are designed to answer this question quickly and efficiently. Arrays aren’t.
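
    (The "potentially buggy" part mostly refers to testing membership with a pattern match instead of string equality; a small sketch of my own, not taken from the benchmark above:)

    my @choices = qw(ab bc cd);
    my $item    = 'a';
    print "pattern grep matched\n" if grep { /$item/ }     @choices;   # fires: 'ab' =~ /a/ is a false positive
    print "eq grep matched\n"      if grep { $_ eq $item } @choices;   # never fires: no exact 'a' in @choices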


    Evan Carroll
    www.EvanCarroll.com
      Note that with a little skullduggery, you can make grep short-circuit. Granted, it's still much more clear to use List::Util 'first', and for searching the same candidate list many times, it's more efficient to use a hash.
      my @candidates = qw(z y a b c a d a e a f);
      foreach my $question ('a', 'm') {
          if ( do {{; grep { $_ eq $question ? do { print "Match\n"; last } : 0 } @candidates }} ) {
              print "Found $question\n";
          }
      }
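
      For comparison, the List::Util approach mentioned above would look something like this (a minimal sketch reusing the same @candidates):

      use List::Util 'first';

      my @candidates = qw(z y a b c a d a e a f);
      foreach my $question ('a', 'm') {
          if ( defined( first { $_ eq $question } @candidates ) ) {
              print "Found $question\n";
          }
      }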

      Caution: Contents may have been coded under pressure.

        Of course that really just uses grep for its side effect, and you could, a little less twisted-brainedly, say something like

        foreach my $question ('a', 'm') {
            if ( sub { $_ eq $question and return 1 for @candidates; 0; }->() ) {
                print "Found $question\n";
            }
        }

        which is basically an inlined implementation of List::Util’s first.

        Makeshifts last the longest.

      Nor is it fair that in the 'slow' one you use a temporary array...

      EvanCarroll is right about the slight differences between the routines, so I commented a few things out and made both subs use the same temporary array (note: I also increased the number of elements to look for and the number of iterations, for more meaningful results):

      #!/usr/bin/perl -w
      use strict;
      use Benchmark qw(:all);

      my @choices = qw|a b c z m p t l c f g|;
      my %choices = map { $_ => 1 } @choices;
      my @cData   = <DATA>;

      timethese( 50_000, {
          'Damn Slow'   => \&parseData1,
          'Much Better' => \&parseData2,
      });

      sub parseData1 {
          foreach (@cData) {
              chomp;
              my @items = split /,/, $_;    # slow way
              foreach my $item (@items) {
                  if ( grep { $item eq $_ } @choices ) {
                      #print qq|FOUND $item|;
                  }
              }
          }
      }

      sub parseData2 {
          foreach (@cData) {
              chomp;
              # foreach my $item (split /,/, $_) {
              my @items = split /,/, $_;    # slow way
              foreach my $item (@items) {
                  if ( $choices{$item} ) {
                      #print qq|FOUND $item\n|;
                  }
              }
          }
      }

      __DATA__
      z,t,m,u,a,b,c
      s,t,l,m,z,a,s
      c,b,a,m,u,t,n
      k,l,t,s,z,r,t

      Which still produces:

      perl seediff.pl
      Benchmark: timing 50000 iterations of Damn Slow, Much Better...
        Damn Slow:  6 wallclock secs ( 6.12 usr +  0.00 sys =  6.12 CPU) @  8172.61/s (n=50000)
      Much Better:  3 wallclock secs ( 2.96 usr +  0.00 sys =  2.96 CPU) @ 16874.79/s (n=50000)

      I think the above changes were 'fair', since the performance of something like SQL's IN() in an actual database should not degrade noticeably as the number of elements grows.
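
      To see that scaling behaviour directly, here is a rough sketch of my own (not from the thread) comparing a small choice list against a much larger one; the hash lookups stay roughly constant while the grep version slows down as the list grows:

      use strict;
      use warnings;
      use Benchmark qw(cmpthese);

      my @small = ('a' .. 'e');      #   5 choices
      my @large = ('aa' .. 'zz');    # 676 choices
      my %small = map { $_ => 1 } @small;
      my %large = map { $_ => 1 } @large;
      my @items = qw(q a zz m c);

      cmpthese( -1, {
          'hash,   5 keys'  => sub { for my $i (@items) { my $hit = exists $small{$i} } },
          'hash, 676 keys'  => sub { for my $i (@items) { my $hit = exists $large{$i} } },
          'grep,   5 elems' => sub { for my $i (@items) { my $hit = grep { $i eq $_ } @small } },
          'grep, 676 elems' => sub { for my $i (@items) { my $hit = grep { $i eq $_ } @large } },
      });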

      Celebrate Intellectual Diversity