in reply to Regex failure interpretation

First there's the solution kvale gave, perhaps the simplest and best choice:
/^([01])[01]*$/
but there are some other possibilities using lookahead:
/^(?=(.))[01]+$/
/^(?=[01]+$)(.)/
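A quick sanity check (my own addition, not from the thread): all three patterns should capture the first character of a valid binary string, and fail to match anything containing a non-binary character.

```perl
use strict;
use warnings;

# Each pattern captures the first character of a valid binary string;
# on an invalid string the list assignment leaves the variable undef.
for my $s (qw(0 1 00 11 10 01 000x00)) {
    my ($a) = $s =~ /^([01])[01]*$/;
    my ($b) = $s =~ /^(?=(.))[01]+$/;
    my ($c) = $s =~ /^(?=[01]+$)(.)/;
    printf "%-8s plain=%s  lookahead1=%s  lookahead2=%s\n",
        $s, map { defined $_ ? $_ : 'undef' } $a, $b, $c;
}
```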

I feel a benchmark coming up...

use Benchmark 'cmpthese';
cmpthese(-3, {
    plain => sub {
        my $x;
        ($x) = /^([01])[01]*$/ foreach qw(0 1 00 11 10 01 000x00);
    },
    capture_in_lookahead => sub {
        my $x;
        ($x) = /^(?=(.))[01]+$/ foreach qw(0 1 00 11 10 01 000x00);
    },
    lookahead_then_capture => sub {
        my $x;
        ($x) = /^(?=[01]+$)(.)/ foreach qw(0 1 00 11 10 01 000x00);
    },
});
Result:
Benchmark: running capture_in_lookahead, lookahead_then_capture, plain, each for at least 3 CPU seconds...
capture_in_lookahead:   3 wallclock secs ( 3.41 usr + 0.00 sys = 3.41 CPU) @ 24781.52/s (n=84505)
lookahead_then_capture: 4 wallclock secs ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 24592.33/s (n=76974)
plain:                  3 wallclock secs ( 3.08 usr + 0.00 sys = 3.08 CPU) @ 24988.64/s (n=76965)

                           Rate lookahead_then_capture capture_in_lookahead plain
lookahead_then_capture  24592/s                     --                  -1%   -2%
capture_in_lookahead    24782/s                     1%                   --   -1%
plain                   24989/s                     2%                   1%    --
Well... "plain" appears to be slightly faster, but I wouldn't bother worrying about the difference. What's a few percent, anyway?

Re: Re: Regex failure interpretation
by BrowserUk (Patriarch) on Mar 21, 2004 at 01:03 UTC
    ...I wouldn't bother worrying about the difference....

    Funnily enough, I wasn't worrying about efficiency (this time:).


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
Re^2: Regex failure interpretation (noise)
by tye (Sage) on Mar 21, 2004 at 07:12 UTC

    In my experience, differences of 2% when benchmarking are more likely to be noise than anything else.

    I notice that the speed matches the order that the tests were run (first slowest, last fastest). Rename the cases (so that they sort in a different order and so get run in a different order) and you might find a different 'winner'.

    For such tiny differences, just running it again could easily give you a different winner.

    Certainly, running on different platforms is unlikely to always produce the same winner even when the difference is more like 5% or even 15%.

    In future, you might want to have the benchmark run each test twice so you can get a feel for how much of one type of noise you have. For example, for each test case, foo, you give Benchmark a_foo and b_foo that point to the same code.
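The duplicate-case trick above can be sketched like this (the `a_plain`/`b_plain` names are my own; both entries point at the same coderef, so any reported difference between them is pure measurement noise):

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Register the identical code under two names. The spread between
# a_plain and b_plain in the output estimates the noise floor; treat
# any between-case difference smaller than that spread as meaningless.
my $plain = sub {
    my $x;
    ($x) = /^([01])[01]*$/ foreach qw(0 1 00 11 10 01 000x00);
};

cmpthese(-3, {
    a_plain => $plain,
    b_plain => $plain,
});
```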

    Also, your test strings are even shorter than the expected inputs. That right there is often enough to make a benchmark find nothing. And you have a loop in your test code. Processing the loop could be taking more time than running regexes against your tiny strings.
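To illustrate, here is a variant of the benchmark using longer inputs (the lengths are my own guess at more realistic data, not anything from the thread). Building the strings once, outside the subs, keeps string construction out of the timing, though the `foreach` loop overhead remains.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Hypothetical longer inputs: two valid 1000-char binary strings and
# one invalid string with an 'x' buried in the middle.
my @strings = (
    '01' x 500,
    '10' x 500,
    ('0' x 499) . 'x' . ('0' x 500),
);

cmpthese(-3, {
    plain                  => sub { my $x; ($x) = /^([01])[01]*$/  foreach @strings },
    capture_in_lookahead   => sub { my $x; ($x) = /^(?=(.))[01]+$/ foreach @strings },
    lookahead_then_capture => sub { my $x; ($x) = /^(?=[01]+$)(.)/ foreach @strings },
});
```

With inputs this size, the regex work should dominate the loop overhead, so differences between the patterns (if any) have a better chance of rising above the noise.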

    - tye