Re: (Efficiently) comparing arrays
by gmax (Abbot) on Apr 04, 2002 at 08:09 UTC
|
Sometimes, the most naive approach is the best rewarding. Talking about speed, the FAQ approach, i.e. just iterating through the array
items and returning at the first that doesn't match seems to be the fastest solution.
grep doesn't look good, because it will continue iterating even if the difference is in the first item.
#!/usr/bin/perl -w
use strict;
use Data::Compare;
use Benchmark;
sub aeq { # Zaxo
my ($first, $second, @comp) = @_;
return 0 if @{$first} != @{$second};
@comp = grep {$first->[$_] ne $second->[$_]} 0..$#$first;
return not @comp;
}
sub compare_arrays { # from FAQ
my ($first, $second) = @_;
return 0 unless @$first == @$second;
for (my $i = 0; $i < @$first; $i++) {
return 0 if $first->[$i] ne $second->[$i];
}
return 1;
}
my @first = (qw(a b c d e f), (1..200)) ;
my @second= (qw(b a c d e f), (1..200)) ;
timethese (20_000,
{
'grep' => sub {my $diff = aeq(\@first,\@second)},
'faq' => sub {my $diff = compare_arrays(\@first,\@second)},
'DComp' =>sub {my $diff = Compare(\@first,\@second)}
});
__END__
Benchmark: timing 20000 iterations of DComp, grep, naif...
DComp: 2 wallclock secs ( 1.42 usr + 0.00 sys = 1.42 CPU)
grep: 13 wallclock secs (13.13 usr + 0.00 sys = 13.13 CPU)
faq: 1 wallclock secs ( 0.55 usr + 0.00 sys = 0.55 CPU)
update
That was the worst case for grep. However, in the best case, i.e. when the difference is at the end of the array, grep is the fastest approach, as you would have expected.
As a personal choice, I would use the FAQ approach, since it can guarantee acceptable results in both extreme cases.
my @first = ((1..200), qw(a b c d e f)) ;
my @second= ((1..200), qw(b a c d e f)) ;
Benchmark: timing 2000 iterations of DComp, faq, grep...
DComp: 12 wallclock secs (11.52 usr + 0.00 sys = 11.52 CPU)
faq: 2 wallclock secs ( 1.48 usr + 0.00 sys = 1.48 CPU)
grep: 1 wallclock secs ( 1.35 usr + 0.00 sys = 1.35 CPU)
update (2)
MeowChow's improvement over the FAQ's algorithm has the best performance in both extreme cases! Good shot!
_ _ _ _
(_|| | |(_|><
_|
| [reply] [d/l] [select] |
|
You can get about a 30% performance improvement over the original naive code with the following:
| | my ($first, $second) = @_;
return 0 if @$first != @$second;
my $i = 0;
$second->[$i++] ne $_ && return 0 for @$first;
return 1;
|
MeowChow
s aamecha.s a..a\u$&owag.print | [reply] [d/l] |
|
Heh, wow. I guess simplicity is bliss sometimes. I took the liberty of using your benchmarking program, and added my function to compare to faq. The function i had come up with and was using was
sub maptest {
my ($low, $high) = @_;
my $i=0;
return 0 if @$low != @$high;
map{return 0 if $_ ne $low->[$i++]}@$high;
return 1;
}
__END__
with
my @first = ((1..200), qw(a b c d e f)) ;
my @second= ((1..200), qw(b a c d e f)) ;
Benchmark: timing 20000 iterations of faq, map...
faq: 8 wallclock secs ( 8.44 usr + 0.00 sys = 8.44 CPU)
map: 10 wallclock secs ( 9.64 usr + 0.00 sys = 9.64 CPU)
Needless to say they were nearly identical for a difference at the start of the array.
Thanks for all the help! | [reply] [d/l] |
|
The only reason that this is slower than the faq answer, or the solution i just posted, is that you are using map in void context. This is an efficiency no-no.
update: demerphq is also correct, removing the lexical context indeed contributes some of the performance increase. Since this is a one-time performance penalty, its effect will be greater relative to smaller arrays.
MeowChow
s aamecha.s a..a\u$&owag.print | [reply] |
|
Re: comparing arrays
by Chmrr (Vicar) on Apr 04, 2002 at 04:55 UTC
|
This is a faq. The answer could also be found by typing perldoc -q equal at a prompt.
Share and enjoy! :)
Update: I should probably also mention / pimp the good node on How to RTFM. Don't read too much harshness into the F, though.
perl -pe '"I lo*`+$^X$\"$]!$/"=~m%(.*)%s;$_=$1;y^`+*^e v^#$&V"+@( NO CARRIER'
| [reply] [d/l] |
|
| [reply] |
|
Thank you. to start, i didn't see the faq. That helps. Also, the question was one of efficency, not of just simply doing it, since the arrays i'm dealing with have a tendency to become quite large. I originally had something crude like
$i=0;
foreach(@a) {
if ($a[$i]!=$b[$i]) { return 0;}
$i++; }
return 1;
So its more than just RTFMing. Althought i can't complain...having more FMs never hurts ^_^ | [reply] [d/l] |
Re: comparing arrays
by gav^ (Curate) on Apr 04, 2002 at 05:06 UTC
|
Most efficient in typing and testing needed :)
use Data::Compare;
if (Compare(\@a, \@b)) {
# equal
}
gav^ | [reply] [d/l] |
Re: comparing arrays
by Zaxo (Archbishop) on Apr 04, 2002 at 05:09 UTC
|
Here is an equality function for arrays of strings, takes references to two arrays as argument:
sub aeq {
my ($first, $second, @comp) = @_;
return 0 if @{$first} != @{$second};
@comp = grep {$first->[$_] ne $second->[$_]} 0..$#$first;
return not @comp;
}
We check to see that the lengths are the same, return false ( not equal) if they are not. Then we use grep over the indexes to scan for unequal elements, putting the index in another array for each mismatch. Finally we return the negation of the mismath array's length as result; no mismatches gave a zero length array.
Update: Thanks to gmax for spotting my blunder in obtaining the last index. Fixed. For kappa's concern, I was thinking more of avoiding a run off the end of $second if it was shorter. For efficiency, I'd write this in terms of a flipflop, .. in scalar context. On reflection, I'd either do that, or in list context return an array of indexes that match: sub aeq {
my ($first, $second, @comp) = @_;
return 0 if @{$first} != @{$second};
@comp = grep {$first->[$_] eq $second->[$_]} 0..$#$first;
wantarray ?
@comp :
@comp == @$first;
}
The choice is, as you say, between efficiency and general utility. Pick efficiency if you never need to filter for matches.
U2 Bah, I'm suffering from a little brain damage, repaired the newer code.
After Compline, Zaxo | [reply] [d/l] [select] |
|
And as usual, if you are fine with the first match (and you are in this case) and think of efficiency (and you do, as the comparison of lengths shows us), then why do you use grep and not a looping construct?
Does it have any advantages?
| [reply] |
Re: comparing arrays
by one4k4 (Hermit) on Apr 04, 2002 at 14:29 UTC
|
I know everybody is all concerned about speed, and how efficent things are in this thread, so I'm not 100% sure how good this snippet is, but hey, tmtowtdi.
#!/usr/bin/perl -w
use strict;
my @r = ('j','r','t','string value','numeric');
my @x = ('x','r','t','string value','numeric');
print join('',@r)," :array 1\n";
print join('',@x)," :array 2\n";
print "\n";
print "arrays match\n" if ((join('',@r))eq(join('',@x)));
_14k4 - perlmonks@poorheart.com (www.poorheart.com) | [reply] [d/l] |
|
IMHO, better to join them with a character that you know doesn't appear in the array's contents, e.g. \0 or pipe or something. Otherwise - unless I'm misunderstanding your code - it'll think ('ab', 'c') equals ('a', 'bc').
Cheers,
andy.
| [reply] |
|
| [reply] |
|
|
|
|