Re: Fastest way to grep multiple time on some array
by GrandFather (Saint) on Feb 19, 2009 at 07:48 UTC
|
If you want good answers then you need to ask good questions. "best", "fastest" and similar ...st criteria generally depend on the details of what you want to do. Very often there is no '...st' answer, even in a well defined context. Taking that on board, would you care to rephrase your question providing a little more detail concerning the matches you wish to perform? Some important details are:
- Are you matching fixed strings?
- How many different match strings are there?
- How many elements do you need to match against (is 200M 200e6 elements, 200 MB, something else)?
- Is this a one time task, or do you need to run the matching process many times?
- What are the actual time constraints?
- Is there existing code that this needs to work with?
- Does the data come from something like a file or do they come from a database?
- Is it possible to put the data into a database if it isn't already?
True laziness is hard work
| [reply] |
Re: Fastest way to grep multiple time on some array
by CountZero (Bishop) on Feb 19, 2009 at 07:21 UTC
|
It is not really fully clear what you want:
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
| [reply] [d/l] [select] |
Re: Fastest way to grep multiple time on some array
by grizzley (Chaplain) on Feb 19, 2009 at 07:03 UTC
|
Use hash to store under key 'A' values matching A, under key 'B' values matching B, etc. | [reply] |
Re: Fastest way to grep multiple time on some array
by Porculus (Hermit) on Feb 19, 2009 at 09:26 UTC
|
From the description, it sounds like you're taking a big array and trying to split it into a number of smaller arrays. I'd have expected the most efficient approach to be to do everything in a single pass, rather than looping over the array several times.
That is to say, instead of
my @As = grep { /A/ } @bigarray;
my @Bs = grep { /B/ } @bigarray;
my @Cs = grep { /C/ } @bigarray;
I would write
my (@As, @Bs, @Cs);
for my $item (@bigarray) {
if ($item =~ /A/) { push @As, $item }
elsif ($item =~ /B/) { push @Bs, $item }
elsif ($item =~ /C/) { push @Cs, $item }
}
| [reply] [d/l] [select] |
|
|
my %H;
for my $item (@bigarray) {
push @{$H{$1}}, $item
if ($item =~ /(A|B|C)/);
}
holli
When you're up to your ass in alligators, it's difficult to remember that your original purpose was to drain the swamp.
| [reply] [d/l] |
|
|
| [reply] |
Re: Fastest way to grep multiple time on some array
by ikegami (Patriarch) on Feb 19, 2009 at 18:22 UTC
|
sub grep_in_place(&\@) {
my ($cb, $array) = @_;
my $src = -1;
my $dst = -1;
while (++$src < @$array) {
my $keep;
$keep = $cb->() for $array->[$src];
$array->[++$dst] = $array->[$src] if $keep;
}
$#$array = $dst;
}
grep_in_place { ... } @array;
Uses O(1) memory
This could be done faster using XS since the the elements could be moved instead of copied.
Update: Fixed the proto. Added disappeared if $keep.
| [reply] [d/l] [select] |
|
|
It doesn't quite work. grep_and_remove() should do the trick:
use warnings;
use strict;
use Data::Dumper;
sub grep_in_place(&@) {
my ($cb, $array) = @_;
my $src = -1;
my $dst = -1;
while (++$src < @$array) {
my $keep;
$keep = $cb->() for $array->[$src];
$array->[++$dst] = $array->[$src];
}
$#$array = $dst;
}
sub grep_and_remove (&\@) {
my ($cb, $array_ref) = @_;
my @found;
my $i = 0;
GREP:
while ($i < @{ $array_ref })
{
$cb->()
and push(@found, splice(@{ $array_ref }, $i, 1))
and next GREP
for $array_ref->[$i];
++$i;
}
return @found;
}
my @arr1 = (1,2,3,4,5,6,7,8);
my @arr2 = grep_in_place { $_ > 4 } @arr1;
my @arr3 = (1,2,3,4,5,6,7,8);
my @arr4 = grep_and_remove { $_ > 4 } @arr3;
print Dumper(\@arr1, \@arr2, \@arr3, \@arr4);
__END__
$VAR1 = [
1,
2,
3,
4,
5,
6,
7,
8
];
$VAR2 = [
'-1'
];
$VAR3 = [
1,
2,
3,
4
];
$VAR4 = [
5,
6,
7,
8
];
| [reply] [d/l] [select] |
|
|
use strict;
use warnings;
sub grep_in_place(&\@) {
my ($cb, $array) = @_;
my $src = -1;
my $dst = -1;
while (++$src < @$array) {
my $keep;
$keep = $cb->() for $array->[$src];
$array->[++$dst] = $array->[$src] if $keep;
}
$#$array = $dst;
}
my @a = (1,2,3,4,5,6,7,8);
print("Before: @a\n");
grep_in_place { $_ > 4 } @a;
print("After: @a\n");
Before: 1 2 3 4 5 6 7 8
After: 5 6 7 8
To get your (and probably the OP's) semantics:
sub grep_and_remove(&\@) {
my ($cb, $array) = @_;
my $src = -1;
my $dst = -1;
my @deleted;
while (++$src < @$array) {
my $keep;
$keep = $cb->() for $array->[$src];
($keep
? $deleted[@deleted]
: $array->[++$dst]
) = $array->[$src];
}
$#$array = $dst;
return @deleted;
}
my @a = (1,2,3,4,5,6,7,8);
print("Before: @a\n");
my @b = grep_and_remove { $_ > 4 } @a;
print("After: @a\n");
print("Returned: @b\n");
Before: 1 2 3 4 5 6 7 8
After: 1 2 3 4
Returned: 5 6 7 8
No crazy splice! | [reply] [d/l] [select] |
|
|
Re: Fastest way to grep multiple time on some array
by gone2015 (Deacon) on Feb 19, 2009 at 17:15 UTC
|
my @A = grep m/A/ && !defined($_ = undef), @foo ;
and ended up with @A containing 'n' undef values...
The following at least works: use strict ;
use warnings ;
my @foo = ('A', 'B', 'C', 'A', 'B', 'A', 'D') ;
my $q ;
my @A = map defined($_) && m/A/ ? scalar($q = $_, $_ = undef, $q) :
+(), @foo ;
my @B = map defined($_) && m/B/ ? scalar($q = $_, $_ = undef, $q) :
+(), @foo ;
my @C = map defined($_) && m/C/ ? scalar($q = $_, $_ = undef, $q) :
+(), @foo ;
@foo = grep defined($_), @foo ;
print "A: @A\n" ;
print "B: @B\n" ;
print "C: @C\n" ;
print " @foo\n" ;
BUT I'm making no claim for it efficiency-wise. (Also, it's ugly as sin.) | [reply] [d/l] [select] |
Re: Fastest way to grep multiple time on some array
by johngg (Canon) on Feb 20, 2009 at 00:29 UTC
|
If you work your way forwards from the end of the array, you can splice out the elements that match as you go, thus removing them from the array so they are not examined again. I store the results in a HoA keyed by letter.
use strict;
use warnings;
use Data::Dumper;
my @arr;
push @arr
, ( q{A} .. q{Z}, q{0} .. q{9} )[ int rand 36 ]
. sprintf( q{%04d}, int rand 10000 )
for 1 .. 25;
print Data::Dumper
->Dumpxs( [ \ @arr ], [ qw{ *arr } ] );
my %letters = ();
foreach my $letter ( q{A} .. q{Z} )
{
last unless @arr;
my $rxLetter = qr{$letter};
foreach my $idx ( reverse 0 .. $#arr )
{
next unless $arr[ $idx ] =~ $rxLetter;
push @{ $letters{ $letter } }, splice @arr, $idx, 1;
}
}
print Data::Dumper
->new( [ \ %letters, \ @arr ], [ qw{ *letters *arr } ] )
->Sortkeys( 1 )
->Dumpxs();
The results.
I hope this is of interest. I'm not sure what the performance will be like, I gather that splice can involve a lot of work behind the scenes.
Cheers, JohnGG | [reply] [d/l] [select] |