Re: Alternative matches
by PodMaster (Abbot) on Oct 06, 2004 at 10:02 UTC
|
Nothing "bad", but there is room for improvement.
use Regex::PreSuf;
die presuf(split /\|/, 'new|old|number|start|simple|cross|heavy|die|ex
+it'),$/;
__END__
(?:cross|die|exit|heavy|n(?:ew|umber)|old|s(?:imple|tart))
A simple benchmark
use Benchmark 'cmpthese';
my @words = qw[ new old number start simple cross heavy die exit ];
my $string = join ' 0\4/f ', map( { rand $_ } 1 .. 60), map { $words[
+rand @words ] } 1 .. 20;
cmpthese( -3, {
straight => sub {
my $matches = () = $string =~ /new|old|number|start|simple|cro
+ss|heavy|die|exit/g;
return;
},
optimial => sub {
my $matches = () = $string =~ /(?:cross|die|exit|heavy|n(?:ew|
+umber)|old|s(?:imple|tart))/g;
return;
},
});
__END__
C:\>perl re.bench.pl
Rate straight optimial
straight 1948/s -- -15%
optimial 2288/s 17% --
C:\>perl re.bench.pl
Rate straight optimial
straight 1994/s -- -14%
optimial 2322/s 16% --
C:\>perl re.bench.pl
Rate straight optimial
straight 1963/s -- -16%
optimial 2324/s 18% --
| MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!" | | I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README). | | ** The third rule of perl club is a statement of fact: pod is sexy. |
| [reply] [d/l] [select] |
Re: Alternative matches
by gaal (Parson) on Oct 06, 2004 at 09:16 UTC
|
Who said it's bad?
(If you're constructing the regexp programmatically, take care to quotemeta the parts.) | [reply] [d/l] |
|
|
Hm...I had the impression it was something to be avoided because the long list of alternatives could slow things down...
| [reply] |
|
|
| [reply] |
|
|
|
|
Last time I checked, the list was a _lot_ faster than searching every item alone.
| [reply] |
Re: Alternative matches
by TedPride (Priest) on Oct 06, 2004 at 09:32 UTC
|
If you have a lot of terms to search for, you can probably speed things up by creating a lowercase copy of the text and searching it case sensitive. | [reply] |
|
|
I tested this just to make sure, and searching a string of letters 25K long 1000 times, there was a savings of approx 2 seconds (21 vs 23) by making a lowercase copy. The same difference could also be seen searching 5K text 5000 times.
| [reply] |
|
|
Thanks, I shall remember that :)
| [reply] |
|
|
I tested this just to make sure . . . there was a savings of approx 2 seconds (21 vs 23)
Usually we like to see Benchmark code to back up this type of assertion. Using a the setup from PodMaster's (i.e. generating a random $string to test against), I get the following results, which seem to validate your results:
$ perl match
Rate orig lc
orig 1492/s -- -7%
lc 1608/s 8% --
$ perl match
Rate orig lc
orig 1506/s -- -7%
lc 1620/s 8% --
$ perl match
Rate orig lc
orig 1492/s -- -8%
lc 1627/s 9% --
And, in case you want to test this for yourself (or modify it to be more accurate to what you're doing), here's the code:
use strict;
use Benchmark qw(cmpthese);
my $pattern = '(new|old|number|start|simple|cross|heavy|die|exit)';
my @words = qw( new Old Number start Simple Cross heavy die Exit );
my $string = join ' 0\4/f ',
map( { rand $_ } 1 .. 60),
map { $words[ rand @words ] }
1 .. 20;
cmpthese (-2, {
orig => sub { $string =~ /$pattern/i },
lc => sub { lc $string =~ /$pattern/ },
}
);
| [reply] [d/l] [select] |
Re: Alternative matches
by Jasper (Chaplain) on Oct 06, 2004 at 10:33 UTC
|
I've used Regexp::Optimizer in the past for long lists to do the character class thing like above, and it works fine. Unless you're using _very_ long lists to match, in which case constructing the regexp can take a long time (with some deep recursion thrown in).
One other thing to note is that for the simple complete word case you give, it's probably a good idea to anchor the words with \b($string_of_words)\b or ^($string_of_words)$. | [reply] [d/l] [select] |
Re: Alternative matches
by foss_city (Novice) on Oct 06, 2004 at 10:41 UTC
|
Isn't index() often faster for this sort of fixed-string search? If so, would it be better to chain a series of calls to index() with the || operator? (Except that since you want case-insensitivity, you would want to lc($string) first.)
Just pondering . . . . | [reply] |
|
|
my @words = qw[ new old number start simple cross heavy die exit ];
my $string = q~whatever the NEw old you're trying to match~;
my $lcstring = lc $string;
my $ret = 0;
if( do {
for my $val( @words ){
my $r = index $lcstring, $val;
if( $r > -1 ){
$ret = substr $string, $r, length $val;
last;
}
}
$ret;
} ){
warn "the \$1 (aka ret) is => $ret\n\n";
}
__END__
the $1 (aka ret) is => NEw
Granted you could turn that do{} into a function, but I don't think its worth the effort
(even if there may be a slight speed boost).
| MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!" | | I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README). | | ** The third rule of perl club is a statement of fact: pod is sexy. |
| [reply] [d/l] [select] |
|
|
Oops. i missed the capturing parentheses . . . . *blushes*
However, outside of that, i'm not clear on why one wouldn't want to do something like:
if ( index($string, 'new') ||
index($string, 'old') ||
index($string, 'number') ||
# etc.
) { # etc. }
except that it does seem ugly to me, now i look at it. :-)
Could you elaborate, please? | [reply] [d/l] |
|
|
|
|
|
|
|
|
|
|
$r = index $lcstring, $val;;
$ret = substr $string, $r, length $val;
Seems like a very obfuscated way of saying $ret = $val if -1 < index $lcstring, $val; No? | [reply] [d/l] [select] |
|
|