Re^2: Match speed of R in array processing

I haven't tried it yet, but I would usually do it like this:

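my @m;
foreach my $i (@Array_1) {
    # grep in list context returns the matching indexes (scalar context
    # would only give a count); eq compares strings, == compares numbers.
    push @m, grep { $Array_2[$_] eq $i } 0 .. $#Array_2;
}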

Once I have the indexes, I just delete the elements at those indexes from arrays 2 and 3. I haven't used Perl for a while, but as far as I remember, I was shocked at how fast R does that sort of thing. That is probably down to my not knowing Perl well enough.
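For what it's worth, here is a minimal sketch of the whole approach described above, including the deletion step. The sample arrays are made up for illustration, and it assumes the values in @Array_1 are unique (a repeated value would collect the same index twice and delete too much).

```perl
use strict;
use warnings;

# Illustrative sample data (an assumption, not from the original question).
my @Array_1 = ("a1", "a2", "a3", "a4", "a5", "a6");
my @Array_2 = ("a1", "b2", "c3", "a4", "f5", "a6");
my @Array_3 = ("1", "2", "3", "4", "5", "6");

# Collect every index of @Array_2 whose value also appears in @Array_1.
my @m;
foreach my $i (@Array_1) {
    push @m, grep { $Array_2[$_] eq $i } 0 .. $#Array_2;
}

# Delete from the highest index down, so earlier removals do not
# shift the positions that are still waiting to be removed.
for my $idx (sort { $b <=> $a } @m) {
    splice @Array_2, $idx, 1;
    splice @Array_3, $idx, 1;
}

print "@Array_2\n";   # prints "b2 c3 f5"
print "@Array_3\n";   # prints "2 3 5"
```

The nested scan is still the slow part: every element of @Array_1 triggers a full pass over @Array_2.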


Re^3: Match speed of R in array processing
by moritz (Cardinal) on Mar 28, 2012 at 16:08 UTC

One problem with your code is that it scales as O(m * n), where m == scalar @Array_1 and n == scalar @Array_2, because every element of @Array_1 triggers a full scan of @Array_2.

Here's a solution that runs in O(m + n) instead, which should be much faster for large arrays:

use strict;
use warnings;
use 5.010; # only needed for say()

my @Array_1 = ("a1","a2","a3","a4","a5","a6");
my @Array_2 = ("a1","b2","c3","a4","f5","a6");
my @Array_3 = ("1","2","3","4","5","6");

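# Build a lookup hash keyed by the values of @Array_2; only key existence matters.
my %seen;
@seen{@Array_2} = undef;
# Keep the indexes of @Array_1 whose value also occurs in @Array_2.
my @idx = grep exists $seen{$Array_1[$_]}, 0..$#Array_1;
# Apply the same index list to both arrays so they stay aligned.
@Array_1 = @Array_1[@idx];
@Array_3 = @Array_3[@idx];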

say "@Array_1";
say "@Array_3";

There are several other ways to write that same algorithm (for example you could use splice to delete array elements one by one in-place, or push onto two new arrays in parallel), but only a benchmark shows which one is fastest.
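As a rough sketch of what such a benchmark might look like, the following compares the slice approach with the push-onto-two-new-arrays variant, using cmpthese from the core Benchmark module. The data set (10,000 made-up keys, every third one matching), the sub names by_slice and by_push, and the iteration count are all assumptions chosen for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 10_000 keys, every third one also occurs in @Array_2.
my @Array_1 = map { "k$_" } 1 .. 10_000;
my @Array_2 = map { "k$_" } grep { $_ % 3 == 0 } 1 .. 10_000;
my @Array_3 = 1 .. 10_000;

# Lookup hash shared by both variants; only key existence matters.
my %seen;
@seen{@Array_2} = ();

# Variant A: collect the matching indexes, then take two array slices.
sub by_slice {
    my @idx = grep { exists $seen{ $Array_1[$_] } } 0 .. $#Array_1;
    return ( [ @Array_1[@idx] ], [ @Array_3[@idx] ] );
}

# Variant B: walk the indexes once and push onto two new arrays in parallel.
sub by_push {
    my (@keep_1, @keep_3);
    for my $i (0 .. $#Array_1) {
        next unless exists $seen{ $Array_1[$i] };
        push @keep_1, $Array_1[$i];
        push @keep_3, $Array_3[$i];
    }
    return ( \@keep_1, \@keep_3 );
}

# Run each variant 200 times and print a comparison table.
cmpthese(200, { slice => \&by_slice, push => \&by_push });
```

The numbers will vary with the perl version, the array sizes, and the fraction of elements that match, so it is worth running this against data shaped like your own.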

