Re: Performance optimization question
by Joost (Canon) on Apr 02, 2008 at 22:23 UTC
I'd probably try a regex only, but I'm not sure it will improve matters. This code assumes /reg exp/ does not match | characters.
my @results = $string =~ /(?:^|\|)[^|]*(reg exp)[^|]*(?:\||$)/g;
update: adjusted regex. also: sorry for double-post. please reap the above node if you can. the site is amazingly slow today and I can't seem to reach it.
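One thing to watch with the pattern above. Because the trailing (?:\||$) consumes the delimiter, the next match cannot start on that same |, so a matching field that directly follows another match is skipped. The pattern as posted also captures only the text matched by reg exp rather than the whole field. A minimal sketch, assuming reg exp is the literal fred and using a made-up sample string, that compares a consuming trailing group with a lookahead:

```perl
use strict;
use warnings;

# Hypothetical sample data: three of the four fields contain "fred".
my $string = 'fred1|fred2|bill|fred3';

# The trailing group consumes the "|", so the field right after a
# match has no leading delimiter left to anchor on and is skipped.
my @consuming = $string =~ /(?:^|\|)([^|]*fred[^|]*)(?:\||$)/g;

# A lookahead only checks for the delimiter and leaves it in place
# for the next match to start on.
my @lookahead = $string =~ /(?:^|\|)([^|]*fred[^|]*)(?=\||$)/g;

print "consuming: @consuming\n";
print "lookahead: @lookahead\n";
```

The lookahead form is the one used for the regex entries in the benchmark further down the thread.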
Re: Performance optimization question
by BrowserUk (Patriarch) on Apr 02, 2008 at 22:43 UTC
my @arr1 = grep { /reg exp/ } split /\|/, $string;
       Rate orig    A
orig 232/s   -- -30%
A    331/s  43%   --
And if the lack of my declarations and the need to set the length of the array in your code indicate that you are using globals instead of lexicals, note that lexicals are usually a few percent faster.
A lot will depend upon how long the string is, how many elements it splits into, the complexity of /reg exp/, and the proportion of elements being excluded. More info might yield better responses.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
my @arr1 = grep /reg exp/, split /\|/, $string;
| [reply] [d/l] |
#! perl -slw
use strict;
use Data::Dump qw[ pp ];
use Benchmark qw[ cmpthese ];
our $string = join '|', map{
    join rand() < 0.2
        ? 'fred'
        : 'bill',
        'pqr', 'xyz'
} 1 .. 10000;
our $first = 0;
our %counts;
cmpthese -1, {
    orig => q[
        my @arr = split(/\|/, $string);
        my @arr1 = grep { /fred/ } @arr;
        $counts{ orig } = @arr1;
    ],
    Buk1 => q[
        my @arr1 = grep { /fred/ } split /\|/, $string;
        $counts{ Buk1 } = @arr1;
    ],
    jwkrahn => q[
        my @arr1 = grep /fred/, split /\|/, $string;
        $counts{ jwkrahn } = @arr1;
    ],
    Buk2 => q[
        my @arr1 = $string =~ m[(?:^|\|)(.*?fred.*?)(?=\||$)]g;
        $counts{ Buk2 } = @arr1;
    ],
    JOOST => q[
        my @arr1 = $string =~ /(?:^|\|)([^|]*?fred[^|]*?)(?=\||$)/g;
        $counts{ JOOST } = @arr1;
    ],
};
pp \%counts;
__END__
c:\test>junk6
          Rate  orig  Buk1 JOOST jwkrahn  Buk2
orig    20.2/s    --  -28%  -48%    -58%  -84%
Buk1    28.1/s   39%    --  -28%    -42%  -77%
JOOST   39.2/s   94%   39%    --    -19%  -68%
jwkrahn 48.4/s  140%   72%   24%      --  -61%
Buk2     124/s  515%  342%  217%    157%    --
{ Buk1 => 2010, Buk2 => 2010, JOOST => 2010, jwkrahn => 2010, orig => 2010 }
Re: Performance optimization question
by ikegami (Patriarch) on Apr 02, 2008 at 22:41 UTC
What's the regular expression in question?
What's the code that uses this? A change at the macro level, in the surrounding code, is usually the way to go.
$#arr1 = -1; is totally useless seeing how @arr1 is overwritten on the next line.
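To illustrate the point about $#arr1 = -1, here is a minimal sketch with made-up data (the array contents and $string value are assumptions, not the original poster's). List assignment replaces the whole array, so any earlier contents are gone without an explicit reset:

```perl
use strict;
use warnings;

# Pretend the array still holds results from an earlier pass.
my @arr1 = ('stale', 'values', 'from', 'before');

# Hypothetical input; only the assignment semantics matter here.
my $string = 'fred|bill|fred';

# No "$#arr1 = -1;" beforehand: assigning a list to the array
# discards the old elements and stores only the new ones.
@arr1 = grep { /fred/ } split /\|/, $string;

print scalar(@arr1), " element(s): @arr1\n";
```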
Re: Performance optimization question
by Anonymous Monk on Apr 02, 2008 at 22:32 UTC
Other than the approaches suggested by Joost (probably the best), avoiding intermediate results might help.
@arr1 = grep /reg exp/, split /\|/, $string;
Re: Performance optimization question
by moritz (Cardinal) on Apr 03, 2008 at 06:49 UTC
If reg exp is meant literally, you can use index to search for the literal substring - it's faster than a regular expression.
| [reply] [d/l] |
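A minimal sketch of the index suggestion applied to the split/grep filter, assuming reg exp really is a literal substring (the sample string and variable names are made up for illustration). Note that index returns -1 when nothing is found and 0 for a hit at the very start, so the result has to be compared against 0 rather than tested for truth:

```perl
use strict;
use warnings;

# Hypothetical data: two of the three fields contain the literal text.
my $string = 'a reg exp b|nothing here|reg exp';
my $needle = 'reg exp';

# Literal substring test; ">= 0" keeps fields that start with the needle.
my @by_index = grep { index($_, $needle) >= 0 } split /\|/, $string;

# The regex equivalent; \Q...\E quotes any metacharacters in the needle.
my @by_regex = grep { /\Q$needle\E/ } split /\|/, $string;

print join(', ', @by_index), "\n";
print join(', ', @by_regex), "\n";
```

Whether this is actually faster than the regex is worth measuring on your own data and perl version.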
$s = 'x'x3000 . 'reg exp' . 'x'x3000;
cmpthese -1, {
    REGEX => q[ $x = $s =~ /reg exp/;],
    INDEX => q[ $x = index $s, 'reg exp';]
};;
          Rate INDEX REGEX
INDEX  40470/s    --  -74%
REGEX 156392/s  286%    --
| [reply] [d/l] |
Are you sure the regex engine isn't "cheating" by caching something?
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);
my $num = 3_000_000;
my $str = 'x'x$num . 'reg exp' . 'x'x$num;
my $start = time;
my $res = $str =~ m/reg exp/;
print time - $start, $/;
$start = time;
my $idx = index $str, 'reg exp';
print time - $start, $/;
__END__
0.0114099979400635
0.011760950088501
Admittedly, index is still a bit slower, but the difference isn't that huge.
BTW on my machine (with perl 5.8.8) the difference isn't there at all:
         Rate REGEX INDEX
REGEX 87061/s    --   -2%
INDEX 88494/s    2%    --
The results only differ slightly for 5.10.0. Which perl did you use?
I thought that index and regexes use the same algorithm, but the regex goes through the pain of compiling the regex first.
| [reply] [d/l] [select] |
That's odd. Aren't they supposed to be using the same algorithm internally?
| [reply] |
General request|comment: Could we have a failing test too in benchmarks (at least for regex, index, and such)?
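A rough sketch of what such a benchmark could look like, with a failing (no match) case next to the succeeding one. The case names, the string sizes, and the fixed iteration count are assumptions chosen for illustration; no timings are claimed here, since they will vary by machine and perl version:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $needle = 'reg exp';
my $hit    = 'x' x 3000 . $needle . 'x' x 3000;   # needle in the middle
my $miss   = 'x' x 6007;                          # same length, no needle

my $x;

# A fixed iteration count keeps the run short; a negative count such
# as -1 would instead run each case for at least that many CPU seconds.
cmpthese(50_000, {
    regex_hit  => sub { $x = $hit  =~ /reg exp/ },
    regex_miss => sub { $x = $miss =~ /reg exp/ },
    index_hit  => sub { $x = index($hit,  $needle) },
    index_miss => sub { $x = index($miss, $needle) },
});
```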
Re: Performance optimization question
by Joost (Canon) on Apr 02, 2008 at 22:21 UTC
please ignore this post, and see my post below. sorry for double posting.
I'd probably try a regex only, but I'm not sure it will improve matters. This code assumes /reg exp/ does not match | characters and that $string doesn't start with a |:
my @results = $string =~ /(reg exp)(?:\||$)/g;