does code help regex match numeric ranges?

AlwaysLearning has asked for the wisdom of the Perl Monks concerning the following question:

Re: does code help regex match numeric ranges? by grinder (Bishop) on Nov 04, 2007 at 10:02 UTC
You really didn't want to know this, but if you want to match 0-255 in a regexp, you could do worse than use a pattern that involves no backtracking... `^(?:2(?:[6-9]|5[0-5]?|[0-4]\d?)?|[3-9]\d?|1(?:\d\d?)?|0)$` • another intruder with the mooring in the heart of the Perl
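A pattern like this is easy to get subtly wrong, so it may be worth checking it exhaustively. The sketch below compares the regex against a plain numeric test; the helper name `is_octet` and the 0 .. 999 test range are assumptions of this example, not part of the post above.

```perl
use strict;
use warnings;

# The pattern from the post above, unchanged: 0..255 without leading zeros
my $octet_re = qr/^(?:2(?:[6-9]|5[0-5]?|[0-4]\d?)?|[3-9]\d?|1(?:\d\d?)?|0)$/;

# Hypothetical helper name, used only for this check
sub is_octet { return $_[0] =~ $octet_re ? 1 : 0 }

# Collect every input where the regex disagrees with a numeric comparison
my @wrong = grep { is_octet($_) != ($_ <= 255 ? 1 : 0) } 0 .. 999;

print @wrong ? "mismatches: @wrong\n" : "regex agrees with 0..255 on 0..999\n";
```

Strings with leading zeros such as `025` are not covered by the numeric range above and have to be tested separately.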
Re^2: does code help regex match numeric ranges? by moritz (Cardinal) on Nov 04, 2007 at 10:28 UTC
I did some benchmarking of this solution, comparing it to a dumb `join '|', 0 .. 255`. Update: and benchmarked mwah's solution as well. The non-backtracking solution really pays off:

                      Rate brute_force   mwah grinder
    brute_force      125/s          --   -25%    -92%
    mwah             167/s         33%     --    -89%
    grinder         1520/s       1114%   812%      --

However, demerphq++'s trie optimization (I hope I remembered the name correctly) in perl 5.10 reduces that advantage significantly, to the point that the brute-force alternation becomes faster than the code assertions:

                      Rate   mwah brute_force grinder
    mwah             156/s     --        -32%    -87%
    brute_force      229/s    47%          --    -81%
    grinder         1181/s   659%        416%      --

Removing the `/o` flags from the regexes makes the gap a bit wider.
Re^3: does code help regex match numeric ranges? by mwah (Hermit) on Nov 04, 2007 at 13:10 UTC
moritz: "I used this script for benchmarking:"

Thanks for doing this work already (I had started to think about that too ;-). As it looks, the dynamic code assertion is rather slow in this context. I suspected as much, but it would be interesting to see whether this disadvantage eventually disappears somewhere.

I 'compacted' your benchmark code a bit and tried to expand the context, e.g. to match all numbers from 0 .. 10,000 that are below 2,567. This was easily built into the bruteforce and mwah variants, but I failed to get a working version into the 'grinder' sub (maybe somebody can try). On a larger range, it's clear that the bruteforce approach starts to fail.

The beauty of the dynamic code assertion (whose slowness will somehow 'disappear' on larger ranges) is its expressiveness, in contrast to the 'reinvention' of a number parser (the grinder sub). Of course, if the range in question is constant and the developer is able to provide a non-backtracking 'grinder-like' solution, this can't be beaten by anything.

This is my shortened version of your benchmark code (with the grinder sub defunct):

    ...
    use Benchmark qw(:all);

    our $bruteforce_re = 0;   # give it the extra preparation bonus

    # defunct: still the 0..255 pattern, so it is left out of cmpthese below
    my $grinder = sub {
        my $fails = 0;
        for ( 0 .. 10000 ) {
            ++$fails if ! /^(?:2(?:[6-9]|5[0-5]?|[0-4]\d?)?|[3-9]\d?|1(?:\d\d?)?|0)$/o
        }
        die "grinder $fails:" if $fails != 10000-2566   # did we get this right?
    };

    # dynamic code assertion: capture the number, then succeed or bail
    my $mwah = sub {
        my $fails = 0;
        my $re = qr{^(\d+)$(??{$1<=2566?'':'(?!)'})}x;
        for ( 0 .. 10000 ) { ++$fails if ! /$re/ }
        die "mwah $fails:" if $fails != 10000-2566   # did we get this right?
    };

    # one big alternation, built once on the first call
    my $bruteforce = sub {
        $bruteforce_re = '^(?:' . join('|', 0 .. 2566) . ')$' unless $bruteforce_re;
        my $fails = 0;
        for ( 0 .. 10000 ) { ++$fails if ! /$bruteforce_re/o }
        die "bruteforce $fails:" if $fails != 10000-2566   # did we get this right?
    };

    cmpthese(-3, {
        # grinder   => $grinder,
        mwah        => $mwah,
        brute_force => $bruteforce,
    });
    ...

In this range, the dynamic code assertion is already 5 times faster (here) than the bruteforce approach.

Regards
mwa
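For the "maybe somebody can try" part, one possible grinder-style pattern for 0 .. 2566 is sketched below. It is a hand-built construction for illustration, not code from the thread; the name `$upto_2566_re` is made up, `\z` is used instead of `$` as the end anchor, and the pattern is only checked against the same 0 .. 10000 inputs the benchmark uses.

```perl
use strict;
use warnings;

# Hand-built pattern for 0..2566 without leading zeros (an assumption of
# this sketch). Each branch fixes the leading digit(s) and then limits
# what may follow.
my $upto_2566_re = qr/
    ^(?:
        0                       # 0
      | 1 \d{0,3}               # 1, 10-19, 100-199, 1000-1999
      | 2 (?:
            [0-4] \d{0,2}       # 20-24, 200-249, 2000-2499
          | 5 (?:
                [0-5] \d?       # 250-255, 2500-2559
              | 6 [0-6]?        # 256, 2560-2566
              | [7-9]           # 257-259
            )?                  # or just 25
          | [6-9] \d?           # 26-29, 260-299
        )?                      # or just 2
      | [3-9] \d{0,2}           # 3-9, 30-99, 300-999
    )\z
/x;

# Same sanity check as in the benchmark subs: count the non-matching inputs
my $fails = grep { !/$upto_2566_re/ } 0 .. 10000;
print "fails: $fails (expected ", 10000 - 2566, ")\n";
```

The price of this approach is visible in the comments: every change of the upper bound means re-deriving the branches by hand, which is exactly the maintenance cost the code assertion avoids.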
Re^2: does code help regex match numeric ranges? by Krambambuli (Curate) on Nov 04, 2007 at 12:21 UTC
In production code, it might be worth expanding grinder's regexp to

    use warnings;
    use strict;

    my @test_values = qw/ 26 231 232 233 234 254 255 256 025 0000 /;

    my $rx = qr/^
        (?:
            2                # first digit is a 2
            (?:
                [6-9]        # second and _last_ digit is 6-9
              |
                5            # or second digit 5
                [0-5]?       # maybe third and _last_ digit 0-5
              |
                [0-4]        # or second digit 0-4
                \d?          # maybe followed by a final third 0-9
            )?
          |
            [3-9]            # first digit 3-9
            \d?              # maybe followed by final 0-9
          |
            1                # first digit 1
            (?:
                \d\d?        # maybe followed by 1 or 2 digits
            )?
          |
            0                # single-digit 0
        )
    $/x;

    foreach my $number ( @test_values ) {
        print "$number:", ($number =~ /$rx/ ? 'true' : 'false'), "\n";
    }

Which makes it easier to see that leading zeroes, as in 0000 or 025, wouldn't fit. That may or may not be a problem, of course.

Krambambuli --- enjoying Mark Jason Dominus' Higher-Order Perl
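If leading zeroes should be accepted after all, one option is to allow a run of zeros in front of the number. This is only a sketch under that assumption; the names `$lenient_rx` and `is_lenient_octet` are made up for the example, and `\z` is used as the end anchor.

```perl
use strict;
use warnings;

# The 0..255 alternation from above, without the anchors
my $octet = qr/(?:2(?:[6-9]|5[0-5]?|[0-4]\d?)?|[3-9]\d?|1(?:\d\d?)?|0)/;

# Assumption of this sketch: any run of leading zeros is acceptable
my $lenient_rx = qr/^0*$octet\z/;

# Hypothetical helper name
sub is_lenient_octet { return $_[0] =~ $lenient_rx ? 1 : 0 }

for my $number (qw/ 255 256 025 0000 0256 /) {
    print "$number:", (is_lenient_octet($number) ? 'true' : 'false'), "\n";
}
```

The leading `0*` brings some backtracking back in (for `0000` the engine has to give one zero back to the alternation), so this trades a little of the original pattern's speed for leniency.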
Re: does code help regex match numeric ranges? by moritz (Cardinal) on Nov 04, 2007 at 07:55 UTC
In case there's no better way, you can always set some variable that is visible to the outer scope to store capturing groups and to indicate failure. perlre says: This assertion may be used as a "(?(condition)yes-pattern|no-pattern)" switch, which makes me believe it should be possible to let the regex fail (use `(?!)` as the no-pattern). I haven't managed it yet, but I'm trying...
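The switch idea can be sketched with a code block as the condition, an empty yes-pattern and `(?!)` as the no-pattern. This is a minimal sketch, assuming a perl whose perlre documents `(?{ CODE })` as a valid condition; the bounds 233 and 255 are the ones used elsewhere in this thread, and `in_range` is a made-up helper name.

```perl
use strict;
use warnings;

# Conditional with a code block as the condition:
#   true  -> empty yes-pattern, so the match simply goes on
#   false -> (?!) as the no-pattern, which can never match
my $re = qr/
    ^ (\d+) \z
    (?(?{ $1 > 233 && $1 < 255 })
      | (?!)
    )
/x;

# Hypothetical helper name
sub in_range { return $_[0] =~ $re ? 1 : 0 }

print "$_: ", (in_range($_) ? 'true' : 'false'), "\n" for qw(233 234 254 255);
```

The two bounds sit in one place and can be changed without touching the rest of the pattern.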
Re: does code help regex match numeric ranges? by mwah (Hermit) on Nov 04, 2007 at 11:37 UTC
AlwaysLearning: "Is there a way to get the regex to fail in the cases where true is not printed?"

This looks simple at first glance (maybe I didn't correctly understand the task you intended). You could generate a regular expression 'on the fly', depending on the captured value.

    $_ = 254;

    my $r = qr{
        ^                        # regex bound to start of line
        ((?>\d+))                # what to capture, don't backtrack: (?> )
        (??{
            $1<255 && $1>233     # what is looked for
              ? ''               # if yes, let the regex succeed
              : '(?!)'           # if no, let the regex bail
        })
    }x;

    print "true ($1)" if /$r/;

You can't "modify" the regex outcome from within a simple code assertion `(?{ })`; you'll need to use the dynamic regex assertion `(??{ })` for that.

Regards
mwa
Re^2: does code help regex match numeric ranges? by AlwaysLearning (Sexton) on Nov 06, 2007 at 20:44 UTC
Thanks, this is what I was looking for: something in which the numbers (255 and 233) could be changed by a human without reworking the whole pattern and then spending a not insignificant amount of time validating that it works correctly. Too bad it is so long, but in this case that is better than making it difficult to change. Of course, there are other conditions surrounding the numbers, nicely done in regex form, that select which numbers get evaluated.
Re: does code help regex match numeric ranges? by rminner (Chaplain) on Nov 04, 2007 at 11:08 UTC
`Is there a way to get the regex to fail in the cases where true is not printed?`

According to perlre it isn't possible:

    (?{ code })
    ...
    This zero-width assertion evaluates any embedded Perl code. It always succeeds, and its code is not interpolated.

In other words, you can't get the regex to fail from inside a plain `(?{ code })` block. You can of course store the result of the code in a variable in the outer scope and evaluate it afterwards (just as moritz said). But by design, the code block itself can't make the regex fail. If you don't want to use the regex solution directly, you might consider "hiding" it by using Regexp::Common, in your case probably Regexp::Common::net. (You are still using a regex, but it's less obvious and less error-prone.)
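The "store the result in an outer variable" route could look roughly like this. It is a minimal sketch: the variable `$in_range` and the wrapper `matches_in_range` are made-up names, and the bounds 233 and 255 are the ones used elsewhere in this thread.

```perl
use strict;
use warnings;

# Package variable, so the code block and the caller see the same thing
our $in_range;

# (?{ }) cannot veto the match; it only records whether the number qualified
my $re = qr/^(\d+)\z(?{ $in_range = ($1 > 233 && $1 < 255) ? 1 : 0 })/;

# Hypothetical wrapper: the pattern must match AND the flag must be set
sub matches_in_range {
    my ($text) = @_;
    local $in_range = 0;    # reset before every attempt
    return ($text =~ $re && $in_range) ? 1 : 0;
}

print "$_: ", (matches_in_range($_) ? 'true' : 'false'), "\n" for qw(233 234 254 255);
```

The regex itself still matches any run of digits here; the range decision is made in ordinary Perl after the match, which is the limitation described above.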