in reply to regex is too long
It looks like index is the winner in this simple test, but I trust that wiser monks will let me know if and how I'm producing bad results!
First results when there is no match found:
Then results when there is always a match found:Benchmark: running BigRE, Index, OptRE, m// loop, each for at least 10 + CPU seconds... BigRE: 11 wallclock secs (10.56 usr + 0.00 sys = 10.56 CPU) @ 11 +690.90/s (n=123491) Index: 11 wallclock secs (10.61 usr + 0.00 sys = 10.61 CPU) @ 43 +921.29/s (n=465961) OptRE: 11 wallclock secs (10.59 usr + 0.00 sys = 10.59 CPU) @ 17 +239.59/s (n=182619) m// loop: 11 wallclock secs (10.50 usr + 0.00 sys = 10.50 CPU) @ 68 +25.71/s (n=71670)
Benchmark: running BigRE, Index, OptRE, m// loop, each for at least 10 + CPU seconds... BigRE: 11 wallclock secs (10.39 usr + 0.00 sys = 10.39 CPU) @ 12 +548.26/s (n=130389) Index: 10 wallclock secs (10.53 usr + 0.00 sys = 10.53 CPU) @ 56 +956.89/s (n=599813) OptRE: 11 wallclock secs (10.53 usr + 0.00 sys = 10.53 CPU) @ 17 +720.63/s (n=186616) m// loop: 11 wallclock secs (10.48 usr + 0.00 sys = 10.48 CPU) @ 78 +73.90/s (n=82550)
Update: Damn, I knew I was missing something (probably still am...) - thanks dws! I don't keep spam around so here's the results for a chunk of 12k of text-only e-mail (as spam tends to be long-winded). I skipped the hand-optimized regex as I used nonsense words in the next case to ensure no match would be found and I didn't feel like thinking about it too hard. In both the following cases there are still just 7 words in the match list.
So, for no match:
With matches:Benchmark: running BigRE, Index, OptRE, m// loop, each for at least 10 + CPU seconds... BigRE: 10 wallclock secs (10.59 usr + 0.00 sys = 10.59 CPU) @ 77 +.59/s (n=822) Index: 11 wallclock secs (10.50 usr + 0.00 sys = 10.50 CPU) @ 14 +27.58/s (n=14991) m// loop: 11 wallclock secs (10.47 usr + 0.00 sys = 10.47 CPU) @ 68 +6.50/s (n=7187)
Benchmark: running BigRE, Index, OptRE, m// loop, each for at least 10 + CPU seconds... BigRE: 10 wallclock secs (10.63 usr + 0.00 sys = 10.63 CPU) @ 20 +25.69/s (n=21523) Index: 11 wallclock secs (10.47 usr + 0.00 sys = 10.47 CPU) @ 19 +019.11/s (n=199092) OptRE: 11 wallclock secs (10.51 usr + 0.00 sys = 10.51 CPU) @ 27 +02.33/s (n=28415) m// loop: 11 wallclock secs (10.45 usr + 0.00 sys = 10.45 CPU) @ 95 +93.84/s (n=100294)
And the test code:
use strict; use warnings; use Benchmark; my @matchlist = qw( this that these those some other too); my $string = 'here we have a typical sentance which has a match near t +he othr end'; my $bigregex = join '|', map quotemeta($_), @matchlist; my $byhandre = 'th(is|at|ese|ose)|some|other|too'; timethese(shift, { 'Index', 'SimpleIndex', 'm// loop', 'Match', 'BigRE', 'BigRE', 'OptRE', 'ByHand', } ); sub SimpleIndex { for (@matchlist) { return 1 if(index($string, $_) != $[-1 ); } 0; } sub Match { for (@matchlist) { return 1 if( $string =~ /$_/io ); } 0; } sub BigRE { return 1 if( $string =~ /$bigregex/io ); 0; } sub ByHand { return 1 if( $string =~ /$byhandre/io ); 0; }
--
I'd like to be able to assign to an luser
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Re: regex is too long
by dws (Chancellor) on May 08, 2001 at 22:21 UTC | |
by jhanna (Scribe) on May 09, 2001 at 00:54 UTC |