
moritz

I used this script for benchmarking:

Thanks for doing this work already (I had started to think about it too ;-)

As it turns out, the dynamic code assertion *is* rather slow in this context. I suspected as much, but it would be interesting to see whether this disadvantage eventually disappears at some point (e.g. on larger ranges).
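For readers who have not used the construct, here is a minimal sketch of what the `(??{ ... })` dynamic code assertion from the benchmark does in isolation. The code block runs during the match and the string it returns is matched as a sub-pattern at that position, so `''` always succeeds and `(?!)` always fails. The variable `$max` and the sample numbers are only for illustration.

```perl
use strict;
use warnings;

# Illustrative bound, the same 2566 used in the benchmark.
my $max = 2566;

# (??{ CODE }) runs CODE while matching and uses the returned string as a
# sub-pattern at that point: '' matches the empty string, '(?!)' never matches.
my $re = qr{^(\d+)$(??{ $1 <= $max ? '' : '(?!)' })};

for my $n (0, 2566, 2567, 10000) {
    # prints "match" for 0 and 2566, "no match" for 2567 and 10000
    printf "%5d %s\n", $n, ($n =~ $re ? 'match' : 'no match');
}
```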

I 'compacted' your benchmark code a bit and tried to expand the context, e.g. to match all numbers in 0..10,000 that are below 2,567.

This was easy to build into the bruteforce and mwah variants, but I failed to get a working version into the grinder sub (maybe somebody can try). On a larger range, it is clear that the bruteforce approach starts to fall behind. The beauty of the dynamic code assertion (whose slowness seems to matter less and less on larger ranges) is its expressiveness, in contrast to 'reinventing' a number parser as the grinder sub does. Of course, if the problematic range is constant and the developer is able to provide a non-backtracking, grinder-like solution, nothing can beat that.
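For anyone who wants to try the grinder variant, below is one possible grinder-style pattern for 0..2566. It is a hypothetical sketch that has not been benchmarked here: it splits the range by digit count and leading digits instead of calling back into Perl. The alternation still backtracks a little (e.g. `2566` is first tried against the 1..999 branch), so it is not strictly non-backtracking.

```perl
use strict;
use warnings;

# Hypothetical grinder-style pattern for 0 .. 2566, split into sub-ranges:
#   0 | 1..999 | 1000..1999 | 2000..2499 | 2500..2559 | 2560..2566
my $grinder_re = qr/^(?:0|[1-9][0-9]{0,2}|1[0-9]{3}|2[0-4][0-9]{2}|25[0-5][0-9]|256[0-6])$/;

# 0 .. 10000 is 10001 numbers, 2567 of which are <= 2566,
# so 10001 - 2567 = 7434 (= 10000 - 2566) should fail to match.
my $fails = 0;
for (0 .. 10000) {
    ++$fails if !/$grinder_re/;
}
die "grinder $fails:" if $fails != 10000 - 2566;
print "grinder ok: $fails failures\n";
```

If it holds up, the pattern could be dropped into the `$grinder` sub in place of the 0..255 one.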

This is my shortened version of your benchmark code (with the grinder variant defunct):

...
use strict;
use warnings;
use Benchmark qw(:all);

our $bruteforce_re = 0; # built once on the first run (gives bruteforce a preparation bonus)

# NOTE: this pattern still matches only 0..255, so its check below dies;
# that is why grinder is commented out in cmpthese().
my $grinder = sub {
   my $fails = 0;
   for( 0 .. 10000 ) {
      ++$fails if ! /^(?:2(?:[6-9]|5[0-5]?|[0-4]\d?)?|[3-9]\d?|1(?:\d\d?)?|0)$/o
   }
   die "grinder $fails:" if $fails != 10000-2566    # did we get this right?
};

my $mwah = sub {
   my $fails = 0;
   # (??{ ... }) runs at match time: '' always matches, '(?!)' always fails
   my $re = qr{^(\d+)$(??{$1<=2566?'':'(?!)'})}x;
   for( 0 .. 10000 ) {
      ++$fails if  ! /$re/
   }
   die "mwah $fails:" if $fails != 10000-2566       # did we get this right?
};

my $bruteforce = sub {
   $bruteforce_re = '^(?:' . join('|', 0 .. 2566) . ')$' unless $bruteforce_re;
   my $fails = 0;
   for( 0 .. 10000 ) {
      ++$fails if  ! /$bruteforce_re/o
   }
   die "bruteforce $fails:" if $fails != 10000-2566 # did we get this right?
};

cmpthese(-3, {# grinder     => $grinder,
                mwah        => $mwah,
                brute_force => $bruteforce,
             });
...

In this range, the dynamic code assertion is already about 5 times faster than the bruteforce approach (on my machine).
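The same check can also be written with Perl's conditional construct using a code condition, `(?(?{ CODE })yes|no)`, which evaluates a boolean instead of returning a pattern string. The sketch below wraps it in a hypothetical helper, `make_range_re`, to show how little changes when the range does. It assumes Perl 5.18 or later, where regex code blocks close over lexicals such as `$lo` and `$hi` properly. Whether it is any faster than the `(??{ })` version would have to be measured, e.g. by adding it to the `cmpthese` hash in the benchmark.

```perl
use strict;
use warnings;

# Hypothetical helper: matches whole numbers in [$lo, $hi].
# (?(?{ COND })YES|NO): the empty YES branch always matches,
# the NO branch (?!) always fails.
sub make_range_re {
    my ($lo, $hi) = @_;
    return qr{^(\d+)$(?(?{ $lo <= $1 && $1 <= $hi })|(?!))};
}

my $re    = make_range_re(0, 2566);
my $fails = 0;
for (0 .. 10000) {
    ++$fails if !/$re/;
}
die "conditional $fails:" if $fails != 10000 - 2566;
print "conditional ok: $fails failures\n";
```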

Regards

mwa

In reply to Re^3: does code help regex match numeric ranges? by mwah
in thread does code help regex match numeric ranges? by AlwaysLearning
