Re: Matching arrays

Using a join is much faster.

update (1). Provided that the arrays are both sorted and contiguous. Thanks robartes for pointing it out.

update (2). A modified routine, also using join solves both the above problems. (see below)

#!/usr/bin/perl -w
use strict;
use Benchmark qw(cmpthese timethese);

my $num_elem = 10_000;
my $count = 500;

my @elements = map {$_} (1..$num_elem), (qw(10001 10002 10003 10004 10
+005));
my @search = qw( 10001 10002 10003 );

# ------------------------------------------
sub match_arrays{ # gmax
    my ($elem, $find) = @_;
    my $str_find = join "/", @$find;
    return join('/',@$elem) =~ /$str_find/;
}
#-------------------------------------------

#print match_arrays(\@elements, \@search),"\n";

sub contains { # robartes
    my $array=shift;
    my $container=shift;
    my %container=map { $_ => undef } @$container;
    foreach (@$array) {
        return 0 unless (exists $container{$_});
    }
    return 1;
}

sub in_ordered_set { # demerphq
    my ($s,$a)=@_;
    return 1 unless @$s;
    return 0 unless @$a;
    my ($i,$j)=(0,0);
    while ($i<@$s) {
      while ($j<@$a) {
        last if $a->[$j] == $s->[$i];
        return 0 if $a->[$j]>$s->[$i];
        $j++;
      }
      $j++;
      $i++;
    }
    return 1
}

my $result = timethese($count, {
        'join' => sub {match_arrays(\@elements, \@search)},
        'hash' => sub {contains (\@search, \@elements)},
        'set' => sub {in_ordered_set (\@search, \@elements)}
});
cmpthese ($result);

__END__
Array elements: 1000
Benchmark: timing 2000 iterations of hash, join, set...
      hash:  4 wallclock secs ( 4.34 usr +  0.02 sys =  4.36 CPU)
      join:  0 wallclock secs ( 0.47 usr +  0.00 sys =  0.47 CPU)
       set:  4 wallclock secs ( 3.64 usr +  0.00 sys =  3.64 CPU)
       Rate hash  set join
hash  459/s   -- -17% -89%
set   549/s  20%   -- -87%
join 4255/s 828% 674%   --

Array elements: 10000

Benchmark: timing 500 iterations of hash, join, set...
      hash: 15 wallclock secs (14.72 usr +  0.06 sys = 14.78 CPU)
      join:  1 wallclock secs ( 1.26 usr +  0.01 sys =  1.27 CPU)
       set: 10 wallclock secs ( 9.04 usr +  0.02 sys =  9.06 CPU)
       Rate  hash   set  join
hash 33.8/s    --  -39%  -91%
set  55.2/s   63%    --  -86%
join  394/s 1064%  613%    --
[download]

update (2). This modified sub can also find a match when the arrays are not sorted, with almost the same efficiency.

sub match_arrays2{ # gmax
    my ($elem, $find) = @_;
    my @regex_find = map {qr /#$_#/} @$find;
    my @match =  ();
    my $elem_join = '#' .join('#',@$elem) . "#";
    for (@regex_find) {
        push @match, undef if  $elem_join =~ /$_/  ;
    }
    return @match == @$find;
}
__END__
Benchmark: timing 500 iterations of hash, join, join2, set...
      hash: 15 wallclock secs (14.66 usr +  0.06 sys = 14.72 CPU) 
      join:  1 wallclock secs ( 1.27 usr +  0.00 sys =  1.27 CPU) 
     join2:  2 wallclock secs ( 1.27 usr +  0.00 sys =  1.27 CPU) 
       set: 10 wallclock secs ( 9.89 usr +  0.04 sys =  9.93 CPU) 
        Rate  hash   set  join join2
hash  34.0/s    --  -33%  -91%  -91%
set   50.4/s   48%    --  -87%  -87%
join   394/s 1059%  682%    --    0%
join2  394/s 1059%  682%    0%    --
[download]

 _  _ _  _  
(_|| | |(_|><
 _|
[download]

Comment on Re: Matching arrays Select or Download Code