wertert has asked for the wisdom of the Perl Monks concerning the following question:

I am using perl grep() to perform a fast lookup on a basic table made up of a array of arrays ie
$table=[[1,2,3],[a,b,d],[A,D,C]];
the current lookup does a
@answers = grep { $_->[0] =~ /a/i && $_->[2] =~ /C/} @$table;
I am trying to think of a way to make the search more flexible I can search for any number of array elements. I've tried calling a function
%search_elements=(0=>a,2=>C); @answers = grep { scan($_,\%search_elements) } @$table; sub scan { my($row,$refToArgsHash)=@_; foreach (keys %$refToArgsHash) { unless($row->[$_] =~ /$refToArgsHash->{$_}/i) { return 0; #fail } } return 1; # ok all match }
But this is really slow. Does anyone think of a way of making this both flexible and fast ? or suggest a new direction to try. Many Thanks. Wertert

Replies are listed 'Best First'.
Re: grep flex
by BrowserUk (Patriarch) on Dec 06, 2002 at 15:14 UTC

    One possibility is to use a HoA'a instead of your AoA's for your table. That would make the lookup process much quicker than a linear search.

    #! perl -slw use strict; use Data::Dumper; my $table=[[1,2,2,3,2,1],[qw(a b c d e a)],[qw(A D D C B A)]]; #! Convert your AoA's to a HoA's. #! Use the values from the arrays as the key and store the indices in +the array #! You could just store the index as the value if the keys are unique. #! Better to build it this way in the first place though. my %table; push @{$table{"0:$table->[0][$_] 1:$table->[1][$_] 2:$table->[2][$_]"} +}, $_ for 0..$#{$table->[0]}; print Dumper(\%table); #! To find all entries which match on all three elements my @elements = qw(0:1 1:a 2:A); my @matches = @{$table{"@elements"}} if exists $table{"@elements"}; print "Element ids: @matches contained elements: @elements"; #! To find elements that match on a subset @matches = (); /0:2/ and /2:D/ and push @matches, @{$table{$_}} for keys %table; print "Element ids @matches contain elements: 0:2 2:D";

    Gives

    c:\test>218070 $VAR1 = { '0:2 1:e 2:B' => [ 4 ], '0:2 1:c 2:D' => [ 2 ], '0:1 1:a 2:A' => [ 0, 5 ], '0:2 1:b 2:D' => [ 1 ], '0:3 1:d 2:C' => [ 3 ] }; 0:1 1:a 2:A Element ids: 0 5 contained elements: 0:1 1:a 2:A Element ids 2 1 contain elements: 0:2 2:D c:\test>

    The caveat is that if your table elements contain 'n:...' sequences, then you might need to use a different fields specifier, but that wouldn't be too hard. Wrapping the lookup into a function that build the search keys from a list supplied would neaten the whole thing up a bit.


    Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
    Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
    Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
    Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.

Re: grep flex
by fruiture (Curate) on Dec 06, 2002 at 14:02 UTC

    one method to increase speed is to avoid hashing. This doesn't mean to create unfelxible datastructures, but in special cases hashing can be skipped for speed.

    Second is: precompile regular expressions via qr//.

    Third: Why pass $_ as argument? $_ is global. As long as scan() isn't used in different places...

    my $scan = [0,qr/a/,2,qr/C/]; my @filtered = grep scan $scan, @$table sub scan { my $aref = shift; my $i = 0; while( ++$i < @$aref ){ return unless $_->[$i-1] =~ $aref->[$i++] } 1 }

    This code is untested.

    --
    http://fruiture.de
Re: grep flex
by slife (Scribe) on Dec 06, 2002 at 14:52 UTC

    Wertert

    I note that in your sub 'scan' above, you use the /i modifier on every element. Hence, this:

    my $table = [ [qw(1 2 3)], [qw(a b d)], [qw(A D C)]] ; my $search_pattern = { 0=> 'a', 2 => 'c', }; sub findit { my $row = shift; my $pattern = shift; my $found = 0; my $key = lc(join '' => @{$row}[keys %{$pattern}]); $found = 1 if $key eq (join '' => values %{$pattern}); return $found; } my @found = grep {findit $_, $search_pattern, } @{$table};

    Note this requires the order of elements in the search pattern hash to match the order in the table, i.e. you couldn't do { 2=>'C', 0=>'a'} without sorting on the keys.