in reply to Regex failure interpretation

First there's the solution kvale gave, perhaps the simplest and best choice:
/^([01])[01]*$/
but there are some other possibilities using lookahead:
/^(?=(.))[01]+$/
/^(?=[01]+$)(.)/
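A quick sanity check (my own addition, not from the thread): all three patterns should capture the first character of a valid binary string, and fail to match anything containing a non-binary character.

```perl
use strict;
use warnings;

# Each pattern captures the first character of a valid binary string;
# on an invalid string the list assignment leaves the variable undef.
for my $s (qw(0 1 00 11 10 01 000x00)) {
    my ($a) = $s =~ /^([01])[01]*$/;
    my ($b) = $s =~ /^(?=(.))[01]+$/;
    my ($c) = $s =~ /^(?=[01]+$)(.)/;
    printf "%-8s plain=%s  lookahead1=%s  lookahead2=%s\n",
        $s, map { defined $_ ? $_ : 'undef' } $a, $b, $c;
}
```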

I feel a benchmark coming up...

use Benchmark 'cmpthese';
cmpthese(-3, {
    plain => sub {
        my $x;
        ($x) = /^([01])[01]*$/ foreach qw(0 1 00 11 10 01 000x00);
    },
    capture_in_lookahead => sub {
        my $x;
        ($x) = /^(?=(.))[01]+$/ foreach qw(0 1 00 11 10 01 000x00);
    },
    lookahead_then_capture => sub {
        my $x;
        ($x) = /^(?=[01]+$)(.)/ foreach qw(0 1 00 11 10 01 000x00);
    },
});
Result:
Benchmark: running capture_in_lookahead, lookahead_then_capture, plain, each for at least 3 CPU seconds...
capture_in_lookahead:   3 wallclock secs ( 3.41 usr + 0.00 sys = 3.41 CPU) @ 24781.52/s (n=84505)
lookahead_then_capture: 4 wallclock secs ( 3.13 usr + 0.00 sys = 3.13 CPU) @ 24592.33/s (n=76974)
plain:                  3 wallclock secs ( 3.08 usr + 0.00 sys = 3.08 CPU) @ 24988.64/s (n=76965)

                           Rate lookahead_then_capture capture_in_lookahead plain
lookahead_then_capture  24592/s                     --                  -1%   -2%
capture_in_lookahead    24782/s                     1%                   --   -1%
plain                   24989/s                     2%                   1%    --
Well... "plain" appears to be slightly faster, but I wouldn't bother worrying about the difference. What's a few percent, anyway?

Re: Re: Regex failure interpretation
by BrowserUk (Patriarch) on Mar 21, 2004 at 01:03 UTC
    ...I wouldn't bother worrying about the difference....

    Funnily enough, I wasn't worrying about efficiency (this time:).


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
Re^2: Regex failure interpretation (noise)
by tye (Sage) on Mar 21, 2004 at 07:12 UTC

    In my experience, differences of 2% when benchmarking are more likely to be noise than anything else.

    I notice that the speed matches the order that the tests were run (first slowest, last fastest). Rename the cases (so that they sort in a different order and so get run in a different order) and you might find a different 'winner'.

    For such tiny differences, just running it again could easily give you a different winner.

    Certainly, running on different platforms is unlikely to always produce the same winner even when the difference is more like 5% or even 15%.

    In future, you might want to have the benchmark run each test twice so you can get a feel for how much of one type of noise you have. For example, for each test case, foo, you give Benchmark a_foo and b_foo that point to the same code.
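The duplicate-case trick above can be sketched like this (the `a_plain`/`b_plain` names are my own; both entries point at the same coderef, so any reported difference between them is pure measurement noise):

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Register the identical code under two names. The spread between
# a_plain and b_plain in the output estimates the noise floor; treat
# any between-case difference smaller than that spread as meaningless.
my $plain = sub {
    my $x;
    ($x) = /^([01])[01]*$/ foreach qw(0 1 00 11 10 01 000x00);
};

cmpthese(-3, {
    a_plain => $plain,
    b_plain => $plain,
});
```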

    Also, your test strings are even shorter than the expected inputs. That right there is often enough to make a benchmark find nothing. And you have a loop in your test code. Processing the loop could be taking more time than running regexes against your tiny strings.
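To illustrate, here is a variant of the benchmark using longer inputs (the lengths are my own guess at more realistic data, not anything from the thread). Building the strings once, outside the subs, keeps string construction out of the timing, though the `foreach` loop overhead remains.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Hypothetical longer inputs: two valid 1000-char binary strings and
# one invalid string with an 'x' buried in the middle.
my @strings = (
    '01' x 500,
    '10' x 500,
    ('0' x 499) . 'x' . ('0' x 500),
);

cmpthese(-3, {
    plain                  => sub { my $x; ($x) = /^([01])[01]*$/  foreach @strings },
    capture_in_lookahead   => sub { my $x; ($x) = /^(?=(.))[01]+$/ foreach @strings },
    lookahead_then_capture => sub { my $x; ($x) = /^(?=[01]+$)(.)/ foreach @strings },
});
```

With inputs this size, the regex work should dominate the loop overhead, so differences between the patterns (if any) have a better chance of rising above the noise.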

    - tye