Fastest way to grep multiple time on some array

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Fastest way to grep multiple time on some array by GrandFather (Saint) on Feb 19, 2009 at 07:48 UTC
If you want good answers then you need to ask good questions. "best", "fastest" and similar ...st criteria generally depend on the details of what you want to do. Very often there is no '...st' answer, even in a well defined context. Taking that on board, would you care to rephrase your question providing a little more detail concerning the matches you wish to perform? Some important details are: Are you matching fixed strings? How many different match strings are there? How many elements do you need to match against (is 200M 200e6 elements, 200 MB, something else)? Is this a one time task, or do you need to run the matching process many times? What are the actual time constraints? Is there existing code that this needs to work with? Does the data come from something like a file or do they come from a database? Is it possible to put the data into a database if it isn't already? True laziness is hard work	[reply]
Re: Fastest way to grep multiple time on some array by CountZero (Bishop) on Feb 19, 2009 at 07:21 UTC
It is not really fully clear what you want: `grep` through the list and then further `grep` within the results from the first `grep`: use a chain of `grep`: `@final_result = grep /first condition/, grep /second condition/, @orig +inal_array;` [download] With a little bit of effort you can probably "fold" both conditions into one. That would be more efficient. `grep` through the list and then `grep` through the same list excluding already found elements: I suggest that you work with an array of arrays. Make each element into an anonymous array, with element 0 (zero) the original data and element 1 a "flag" indicating that it is an element that was already selected in a previous search (1 would mean found in the first search; 2 means found in the second search; ...). Only you should use `map` rather than search in this case as you will loop over the same list each time. Finally you have to loop over the whole list once more to put each found element in its own "bin" based upon the value of the flag. CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James	[reply] [d/l] [select]
Re: Fastest way to grep multiple time on some array by grizzley (Chaplain) on Feb 19, 2009 at 07:03 UTC
Use hash to store under key 'A' values matching A, under key 'B' values matching B, etc.	[reply]
Re: Fastest way to grep multiple time on some array by Porculus (Hermit) on Feb 19, 2009 at 09:26 UTC
From the description, it sounds like you're taking a big array and trying to split it into a number of smaller arrays. I'd have expected the most efficient approach to be to do everything in a single pass, rather than looping over the array several times. That is to say, instead of `my @As = grep { /A/ } @bigarray; my @Bs = grep { /B/ } @bigarray; my @Cs = grep { /C/ } @bigarray;` [download] I would write `my (@As, @Bs, @Cs); for my $item (@bigarray) { if ($item =~ /A/) { push @As, $item } elsif ($item =~ /B/) { push @Bs, $item } elsif ($item =~ /C/) { push @Cs, $item } }` [download]	[reply] [d/l] [select]
Re^2: Fastest way to grep multiple time on some array by holli (Abbot) on Feb 19, 2009 at 11:03 UTC
or even `my %H; for my $item (@bigarray) { push @{$H{$1}}, $item if ($item =~ /(A\|B\|C)/); }` [download] holli When you're up to your ass in alligators, it's difficult to remember that your original purpose was to drain the swamp.	[reply] [d/l]
Re^3: Fastest way to grep multiple time on some array by wol (Hermit) on Feb 19, 2009 at 17:45 UTC
I think this would fail if one of the items in bigarray was "AB", as it would only add this item to $H{"A"} and not $H{"B"}. Alas we don't know for sure whether this would be a problem for the OP or not, but re-reading the post suggests maybe not. -- use JAPH; print JAPH::asString();	[reply]
Re: Fastest way to grep multiple time on some array by ikegami (Patriarch) on Feb 19, 2009 at 18:22 UTC
Can you grep and remove simultaneously? `sub grep_in_place(&\@) { my ($cb, $array) = @_; my $src = -1; my $dst = -1; while (++$src < @$array) { my $keep; $keep = $cb->() for $array->[$src]; $array->[++$dst] = $array->[$src] if $keep; } $#$array = $dst; } grep_in_place { ... } @array;` [download] Uses O(1) memory This could be done faster using XS since the the elements could be moved instead of copied. Update: Fixed the proto. Added disappeared `if $keep`.	[reply] [d/l] [select]
Re^2: Fastest way to grep multiple time on some array by repellent (Priest) on Feb 19, 2009 at 23:47 UTC
It doesn't quite work. `grep_and_remove()` should do the trick: use warnings; use strict; use Data::Dumper; sub grep_in_place(&@) { my ($cb, $array) = @_; my $src = -1; my $dst = -1; while (++$src < @$array) { my $keep; $keep = $cb->() for $array->[$src]; $array->[++$dst] = $array->[$src]; } $#$array = $dst; } sub grep_and_remove (&\@) { my ($cb, $array_ref) = @_; my @found; my $i = 0; GREP: while ($i < @{ $array_ref }) { $cb->() and push(@found, splice(@{ $array_ref }, $i, 1)) and next GREP for $array_ref->[$i]; ++$i; } return @found; } my @arr1 = (1,2,3,4,5,6,7,8); my @arr2 = grep_in_place { $_ > 4 } @arr1; my @arr3 = (1,2,3,4,5,6,7,8); my @arr4 = grep_and_remove { $_ > 4 } @arr3; print Dumper(\@arr1, \@arr2, \@arr3, \@arr4); __END__ $VAR1 = [ 1, 2, 3, 4, 5, 6, 7, 8 ]; $VAR2 = [ '-1' ]; $VAR3 = [ 1, 2, 3, 4 ]; $VAR4 = [ 5, 6, 7, 8 ]; [download]	[reply] [d/l] [select]
Re^3: Fastest way to grep multiple time on some array by ikegami (Patriarch) on Feb 19, 2009 at 23:57 UTC
You used it wrong, but I see I used the wrong prototype, and my "if $keep" disappeared. `use strict; use warnings; sub grep_in_place(&\@) { my ($cb, $array) = @_; my $src = -1; my $dst = -1; while (++$src < @$array) { my $keep; $keep = $cb->() for $array->[$src]; $array->[++$dst] = $array->[$src] if $keep; } $#$array = $dst; } my @a = (1,2,3,4,5,6,7,8); print("Before: @a\n"); grep_in_place { $_ > 4 } @a; print("After: @a\n");` [download] `Before: 1 2 3 4 5 6 7 8 After: 5 6 7 8` [download] To get your (and probably the OP's) semantics: `sub grep_and_remove(&\@) { my ($cb, $array) = @_; my $src = -1; my $dst = -1; my @deleted; while (++$src < @$array) { my $keep; $keep = $cb->() for $array->[$src]; ($keep ? $deleted[@deleted] : $array->[++$dst] ) = $array->[$src]; } $#$array = $dst; return @deleted; } my @a = (1,2,3,4,5,6,7,8); print("Before: @a\n"); my @b = grep_and_remove { $_ > 4 } @a; print("After: @a\n"); print("Returned: @b\n");` [download] `Before: 1 2 3 4 5 6 7 8 After: 1 2 3 4 Returned: 5 6 7 8` [download] No crazy splice!	[reply] [d/l] [select]
Re^4: Fastest way to grep multiple time on some array by repellent (Priest) on Feb 20, 2009 at 02:16 UTC
Re: Fastest way to grep multiple time on some array by gone2015 (Deacon) on Feb 19, 2009 at 17:15 UTC
Can you grep and remove simultaneously? I don't think so... I tried: `my @A = grep m/A/ && !defined($_ = undef), @foo ;` [download] and ended up with `@A` containing 'n' `undef` values... The following at least works: `use strict ; use warnings ; my @foo = ('A', 'B', 'C', 'A', 'B', 'A', 'D') ; my $q ; my @A = map defined($_) && m/A/ ? scalar($q = $_, $_ = undef, $q) : +(), @foo ; my @B = map defined($_) && m/B/ ? scalar($q = $_, $_ = undef, $q) : +(), @foo ; my @C = map defined($_) && m/C/ ? scalar($q = $_, $_ = undef, $q) : +(), @foo ; @foo = grep defined($_), @foo ; print "A: @A\n" ; print "B: @B\n" ; print "C: @C\n" ; print " @foo\n" ;` [download] BUT I'm making no claim for it efficiency-wise. (Also, it's ugly as sin.)	[reply] [d/l] [select]
Re: Fastest way to grep multiple time on some array by johngg (Canon) on Feb 20, 2009 at 00:29 UTC
If you work your way forwards from the end of the array, you can splice out the elements that match as you go, thus removing them from the array so they are not examined again. I store the results in a HoA keyed by letter. use strict; use warnings; use Data::Dumper; my @arr; push @arr , ( q{A} .. q{Z}, q{0} .. q{9} )[ int rand 36 ] . sprintf( q{%04d}, int rand 10000 ) for 1 .. 25; print Data::Dumper ->Dumpxs( [ \ @arr ], [ qw{ arr } ] ); my %letters = (); foreach my $letter ( q{A} .. q{Z} ) { last unless @arr; my $rxLetter = qr{$letter}; foreach my $idx ( reverse 0 .. $#arr ) { next unless $arr[ $idx ] =~ $rxLetter; push @{ $letters{ $letter } }, splice @arr, $idx, 1; } } print Data::Dumper ->new( [ \ %letters, \ @arr ], [ qw{ letters *arr } ] ) ->Sortkeys( 1 ) ->Dumpxs(); [download] The results. Read more... (3 kB) I hope this is of interest. I'm not sure what the performance will be like, I gather that `splice` can involve a lot of work behind the scenes. Cheers, JohnGG	[reply] [d/l] [select]