Ppeoc has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I have a huge array of keywords in array 1(around 150). I have two more arrays that contain elements of the 1st array (array 2 and 3). What is the best way to search for elements of array 1 in array 2 and 3. This is what I am doing right now.
use strict; use warnings; my @choices = qw/Apple Banana Tiger Lion Cat Turtle/; # Big array with + 150 elements my @Single = qw/Apple Turtle/; my @Double = qw/Lion/; foreach(@choices){ my $field = $_; if (grep {$_ eq $field} @Single) { $flag =1; } elsif (grep {$_ eq $field} @Double) { $flag =2; } else{ $flag =0; } }
Basically I would like to assign a flag value to each element of choices. Depending on the value of flag, each element of @choices will be parsed separately. Would it be more efficient to save choices as a hash? Any inputs would be appreciated. Thanks! :)

Replies are listed 'Best First'.
Re: The most efficient way for searching for an element in an array?
by BillKSmith (Monsignor) on Jan 07, 2016 at 04:50 UTC
    I would copy @Single and @Double into hashes. Use exists rather than grep.
    use strict; use warnings; my @choices = qw/Apple Banana Tiger Lion Cat Turtle/; my @Single = qw/Apple Turtle/; my @Double = qw/Lion/; my %Single_hash = map {$_ => 1} @Single; my %Double_hash = map {$_ => 1} @Double; foreach my $choice (@choices) { my $flag = exists $Single_hash{$choice} ? 1 : exists $Double_hash{$choice} ? 2 : 0 ; print "$choice => $flag\n"; }
    Bill
        In fact the usefull discussion suggested by Athanasius tell us to use the smart match operator ~~

        My benchmarks (tell if i'm wrong) tell us the same: the smart match operator is the fastest.
        use strict; use warnings; use Benchmark qw(cmpthese timethese); use 5.010; use List::Util qw(any); my $search = $ARGV[0]||'perlaceo'; my $file = $ARGV[1]||'enwiktionary-latest-all-titles-in-ns0'; die unless $file; my @lines; open my $fh,'<',$file or die "Unable to open $file"; @lines = <$fh>; print "\nFILE: $file\nARRAY ELEMENTS: ", scalar @lines," (read in ",(time - $^T), " seconds).\n\n"; my $iterations = -3; ###################################################################### +########## my $name1 = "Plain array lookup"; # my $code1 = <<'END_CODE1'; for(@lines){ last if $_ eq $search } END_CODE1 ###################################################################### +########## my $name2 = "List::Util any "; # my $code2 = <<'END_CODE2'; any { $_ eq $search } @lines; END_CODE2 ###################################################################### +########## my $name3 = "smart match ~~ "; # my $code3 = <<'END_CODE3'; $search ~~ @lines; END_CODE3 ###################################################################### +########## my $results = timethese($iterations, { $name1 => $code1, $name2 => $code2, $name3 => $code3 } ) ; print "\n\n"; cmpthese( $results );
        here my results:

        FILE: cat.txt ARRAY ELEMENTS: 3 (read in 0 seconds). Rate List::Util any Plain array lookup sma +rt match ~~ List::Util any 2548237/s -- -38% + -62% Plain array lookup 4105378/s 61% -- + -38% smart match ~~ 6655789/s 161% 62% + -- FILE: HOSTS-zero-tollerance.txt ARRAY ELEMENTS: 10605 (read in 0 seconds). Rate List::Util any Plain array lookup sma +rt match ~~ List::Util any 2599992/s -- -41% + -62% Plain array lookup 4437049/s 71% -- + -36% smart match ~~ 6879527/s 165% 55% + -- FILE: 02packages.details.txt ARRAY ELEMENTS: 165584 (read in 2 seconds). Rate List::Util any Plain array lookup sma +rt match ~~ List::Util any 2481882/s -- -46% + -61% Plain array lookup 4556540/s 84% -- + -28% smart match ~~ 6362283/s 156% 40% + -- FILE: enwiktionary-latest-all-titles-in-ns0 ARRAY ELEMENTS: 4316483 (read in 91 seconds). Rate List::Util any Plain array lookup sma +rt match ~~ List::Util any 2565579/s -- -44% + -63% Plain array lookup 4621251/s 80% -- + -34% smart match ~~ 6972221/s 172% 51% + --

        L*
        There are no rules, there are no thumbs..
        Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      Right track, but still too much work being done in the loop!

      my @choices = qw/Apple Banana Tiger Lion Cat Turtle/; my @Single = qw/Apple Turtle/; my @Double = qw/Lion/; my %flags = ( map { $_ => 1 } @Single ), ( map { $_ => 2 } @Double ); for my $choice (@choices) { my $flag = $flags{$choice} || 0; print "$choice => $flag\n"; }

        Yes, that will work if @Single and @Double aren't allowed to share any elements...

Re: The most efficient way for searching for an element in an array?
by Athanasius (Archbishop) on Jan 07, 2016 at 04:31 UTC

    Hello Ppeoc,

    If efficiency is really important, you should use a short-circuiting function rather than grep, which evaluates all instances in the input list. For example, from List::Util#any:

    Many cases of using grep in a conditional can be written using any instead, as it can short-circuit after the first true result.

    BTW, are you certain that @Single and @Double will never have values in common? If not, you may need to change the logic to incorporate a fourth flag:

    use strict; use warnings; use Data::Dump; use List::Util qw( any ); my @choices = qw/Apple Banana Tiger Lion Cat Turtle/; my @Single = qw/Apple Turtle/; my @Double = qw/Apple Lion/; my %choices; for my $field (@choices) { my $flag = 0; $flag = 1 if any { $_ eq $field } @Single; $flag += 2 if any { $_ eq $field } @Double; $choices{$field} = $flag; } # $flag == 3 means $field is present in both @Single and @Double dd \%choices;

    Output:

    14:27 >perl 1508_SoPW.pl { Apple => 3, Banana => 0, Cat => 0, Lion => 2, Tiger => 0, Turtle => +1 } 14:28 >

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: The most efficient way for searching for an element in an array?
by GrandFather (Saint) on Jan 07, 2016 at 04:01 UTC

    150 entries in an array is trivial (unless you are doing something really dumb with the array). However, for code clarity and maintenance a hash is likely to be a better fit for your task, whatever it really is. I suspect though that a database is what you should really be using!

    Premature optimization is the root of all job security
Re: The most efficient way for searching for an element in an array?
by salva (Canon) on Jan 07, 2016 at 08:31 UTC
    If arrays 2 and 3 are not too big and they fit in the available RAM, just convert them into hashes for faster lookup.

    Otherwise, sort the three arrays using some external sort algorithm and then read them in parallel while looking for duplicates.

Re: The most efficient way for searching for an element in an array?
by hotchiwawa (Scribe) on Jan 07, 2016 at 15:50 UTC
    EDIT: Version updated below in another post to avoid full scan.
    Hi Ppeoc :)

    use strict; use warnings; my @arr = qw(Apple Banana Tiger Lion Cat Turtle); my $flag1 = (grep { /Apple|Turtle/ } @arr) ? 1 : 0; my $flag2 = (grep { /Lion|Tiger/ } @arr) ? 1 : 0; print "flag1:$flag1 flag2:$flag2\n";

    Edit: why do you give me a bad reputation?
    This code is working and better than the original!
      why do you give me a bad reputation?
      I don't know (I haven't downvoted the post). Nevertheless, I see some issues:
      1. /Cat|Turtle/ would match Catalogue as well.
      2. grep examines all the elements of the array, even if it finds a match at the very beginning. That's probably not "the most efficient way".
      ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
        choroba :)

        1) yep it's OR condition. 2) yep full scan not perfect but not returned. Maybe there is an operator to quit the grep... I will check
        A new version :)
        use strict; use warnings; my @arr = qw(Apple Banana Tiger Lion Cat Turtle); my $arr = join("|", @arr); my $flag1 = ($arr =~ m/Apple|Turtle/) ? 1 : 0; my $flag2 = ($arr =~ m/Lion|Tiger/) ? 1 : 0; print "flag1:$flag1 flag2:$flag2\n"; $arr = undef; #you can release the array too undef @arr;

        Now there is no more full scan when a match is encountered.

        Peace
          A reply falls below the community's threshold of quality. You may see it by logging in.