Re^2: Is it safe to use external strings for regexes?
by LanX (Saint) on Oct 07, 2021 at 20:58 UTC
|
D:\tmp\pm>perl -Mre=debug -E"'aaaa' =~/a*a*b/"
Compiling REx "a*a*b"
synthetic stclass "ANYOF[ab]".
Final program:
1: STAR (4)
2: EXACT <a> (0)
4: STAR (7)
5: EXACT <a> (0)
7: EXACT <b> (9)
9: END (0)
floating "b" at 0..9223372036854775807 (checking floating) stclass ANY
+OF[ab] min
len 1
Matching REx "a*a*b" against "aaaa"
Intuit: trying to determine minimum start position...
doing 'check' fbm scan, [0..4] gave -1
Did not find floating substr "b"...
Match rejected by optimizer
Freeing REx: "a*a*b"
please note Did not find floating substr "b"... Match rejected by optimizer
| [reply] [Watch: Dir/Any] [d/l] |
Re^2: Is it safe to use external strings for regexes?
by LanX (Saint) on Oct 07, 2021 at 21:53 UTC
|
use strict;
use warnings;
$\="\n";
$|=1;
redos($_) for 5..8;
sub redos {
my ($length)=@_;
my $futile = 'a' x $length;
print "=== length=$length string=$futile";
print 'start ', my $start = scalar time;
die 'huh?' if $futile =~
/a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a
+*a*a*[bc]/
;
print 'post rx 1 ', time -$start," sec";
die 'huh?' if $futile =~
/(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:a*)*)*)*)*)*)*)*)*)*)*)*
+)*)*[bc]/
;
print 'post rx 2 ', time -$start," sec";
print 'done ', time -$start," sec";
print "\n" x2;
}
C:/Strawberry/perl/bin\perl.exe -w d:/tmp/pm/redos.pl
=== length=5 string=aaaaa
start 1633643412
post rx 1 0 sec
post rx 2 0 sec
done 0 sec
=== length=6 string=aaaaaa
start 1633643412
post rx 1 1 sec
post rx 2 1 sec
done 1 sec
=== length=7 string=aaaaaaa
start 1633643413
post rx 1 4 sec
post rx 2 4 sec
done 4 sec
=== length=8 string=aaaaaaaa
start 1633643417
post rx 1 20 sec
post rx 2 20 sec
done 20 sec
it's the first regex which is obviously growing in an exponential manner...
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re^2: Is it safe to use external strings for regexes? (infinite loops)
by LanX (Saint) on Oct 11, 2021 at 12:31 UTC
|
Hi
I just stumbled over an example for something even worse:
infinite loops
perlre#Repeated Patterns Matching a Zero-length Substring
> A common abuse of this power stems from the ability to make infinite loops using regular expressions, with something as innocuous as:
> "foo" =~ m{ ( o? )* }x;
> The o? matches at the beginning of "foo", and since the position in the string is not moved by the match, o? would match again and again because of the "*" quantifier. Another common way to create a similar cycle is with the looping modifier /g:
| [reply] [Watch: Dir/Any] [d/l] |
|
$ perl -wE '$f = "foo"; say pos $f while $f =~ m{ ( o? )* }gx;'
0
3
3
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
| [reply] [Watch: Dir/Any] |
|
|
|
|
| [reply] [Watch: Dir/Any] [d/l] |
|
Update: Oops... I meant this post as a reply to this post, not to myself, but that's ok, no need to re-parent. :)
As you have, I think, suggested elsewhere and as the documentation itself cheerfully admits, a rewrite of this section would be welcome.
That the section begins with a discussion of the evils of zero-width match infinite loops accompanied by a bunch of Perl code examples of such matches that don't actually "work" (in the sense that they don't produce infinite loops) is not helpful. The discussion finally gets around to saying that the Perl RE does not, in fact, allow such loops, but by then one may have been led far down the garden path and abandoned in the dark forest.
Give a man a fish: <%-{-{-{-<
| [reply] [Watch: Dir/Any] [d/l] |
|
|
|
> Thus Perl allows such constructs, by forcefully breaking the infinite loop.
Thanks, that wasn't clear to me.
BUT I should have taken more care about the
> WARNING: Difficult material (and prose) ahead. This section needs a rewrite.
FWIW, the forced break can be seen with re 'debug'
D:\tmp>perl -Mre=debug -e"'foo' =~ m{ ( o? )* }x;"
Compiling REx " ( o? )* "
Final program:
1: CURLYX[0]{0,INFTY} (12)
3: OPEN1 (5)
5: CURLY{0,1} (9)
7: EXACT <o> (0)
9: CLOSE1 (11)
11: WHILEM[1/1] (0)
12: NOTHING (13)
13: END (0)
minlen 0
Matching REx " ( o? )* " against "foo"
0 <> <foo> | 0| 1:CURLYX[0]{0,INFTY}(12)
0 <> <foo> | 1| 11:WHILEM[1/1](0)
| 1| WHILEM: matched 0 out of 0..65535
0 <> <foo> | 2| 3:OPEN1(5)
0 <> <foo> | 2| 5:CURLY{0,1}(9)
| 2| EXACT <o> can match 0 times out
+of 1...
0 <> <foo> | 3| 9:CLOSE1(11)
0 <> <foo> | 3| 11:WHILEM[1/1](0)
| 3| WHILEM: matched 1 out of 0..655
+35
| 3| WHILEM: empty match detected, t
+rying continuation... <---- HERE
+
0 <> <foo> | 4| 12:NOTHING(13)
0 <> <foo> | 4| 13:END(0)
Match successful!
Freeing REx: " ( o? )* "
D:\tmp>
| [reply] [Watch: Dir/Any] [d/l] [select] |