Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re: Is it safe to use external strings for regexes?

by AnomalousMonk (Archbishop)
on Oct 07, 2021 at 20:38 UTC ( [id://11137315]=note: print w/replies, xml ) Need Help??


in reply to Is it safe to use external strings for regexes?

As a matter of curiosity, I tried some of the classic regexes mentioned in this thread that threaten exponential explosion. They seem to have been tamed long ago. (Same results for version 5.30.3.1.) What are some examples that can still go exponential?

Win8 Strawberry 5.8.9.5 (32) Thu 10/07/2021 16:15:36 C:\@Work\Perl\monks >perl -Mstrict -Mwarnings -l my $futile = 'a' x 10_000; print 'start ', scalar time; die 'huh?' if $futile =~ /a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a +*a*b/ ; print 'post rx 1 ', scalar time; die 'huh?' if $futile =~ /(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:a*)*)*)*)*)*)*)*)*)*)*)*)* +)*b/ ; print 'post rx 2 ', scalar time; print 'done ', scalar time; ^Z start 1633637901 post rx 1 1633637901 post rx 2 1633637901 done 1633637901


Give a man a fish:  <%-{-{-{-<

Replies are listed 'Best First'.
Re^2: Is it safe to use external strings for regexes?
by LanX (Saint) on Oct 07, 2021 at 20:58 UTC
    My guess: Perl is looking for exact strings from both ends, your regexes include a trailing "b" but your string $futile doesn't

    This simplified demo seems to support my theory

    D:\tmp\pm>perl -Mre=debug -E"'aaaa' =~/a*a*b/" Compiling REx "a*a*b" synthetic stclass "ANYOF[ab]". Final program: 1: STAR (4) 2: EXACT <a> (0) 4: STAR (7) 5: EXACT <a> (0) 7: EXACT <b> (9) 9: END (0) floating "b" at 0..9223372036854775807 (checking floating) stclass ANY +OF[ab] min len 1 Matching REx "a*a*b" against "aaaa" Intuit: trying to determine minimum start position... doing 'check' fbm scan, [0..4] gave -1 Did not find floating substr "b"... Match rejected by optimizer Freeing REx: "a*a*b"

    please note Did not find floating substr "b"... Match rejected by optimizer

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Re^2: Is it safe to use external strings for regexes?
by LanX (Saint) on Oct 07, 2021 at 21:53 UTC
    Replace the "exact" b with a character class [bc]

    Then Perl can't rule out the string because of the missing end and you'll see exponential growth.

    use strict; use warnings; $\="\n"; $|=1; redos($_) for 5..8; sub redos { my ($length)=@_; my $futile = 'a' x $length; print "=== length=$length string=$futile"; print 'start ', my $start = scalar time; die 'huh?' if $futile =~ /a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a +*a*a*[bc]/ ; print 'post rx 1 ', time -$start," sec"; die 'huh?' if $futile =~ /(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:(?:a*)*)*)*)*)*)*)*)*)*)*)* +)*)*[bc]/ ; print 'post rx 2 ', time -$start," sec"; print 'done ', time -$start," sec"; print "\n" x2; }

    C:/Strawberry/perl/bin\perl.exe -w d:/tmp/pm/redos.pl === length=5 string=aaaaa start 1633643412 post rx 1 0 sec post rx 2 0 sec done 0 sec === length=6 string=aaaaaa start 1633643412 post rx 1 1 sec post rx 2 1 sec done 1 sec === length=7 string=aaaaaaa start 1633643413 post rx 1 4 sec post rx 2 4 sec done 4 sec === length=8 string=aaaaaaaa start 1633643417 post rx 1 20 sec post rx 2 20 sec done 20 sec

    it's the first regex which is obviously growing in an exponential manner...

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Re^2: Is it safe to use external strings for regexes? (infinite loops)
by LanX (Saint) on Oct 11, 2021 at 12:31 UTC
    Hi

    I just stumbled over an example for something even worse:

    infinite loops

    perlre#Repeated Patterns Matching a Zero-length Substring

    > A common abuse of this power stems from the ability to make infinite loops using regular expressions, with something as innocuous as:

    > "foo" =~ m{ ( o? )* }x;

    > The o? matches at the beginning of "foo", and since the position in the string is not moved by the match, o? would match again and again because of the "*" quantifier. Another common way to create a similar cycle is with the looping modifier /g:

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Huh?
      $ perl -wE '$f = "foo"; say pos $f while $f =~ m{ ( o? )* }gx;' 0 3 3

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        dunno, I just RTFM and copied it here! :)

        maybe some delta fixed it?

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

      The section to which you linked goes on to say

      Thus Perl allows such constructs, by forcefully breaking the infinite loop.


      Give a man a fish:  <%-{-{-{-<

        Update: Oops... I meant this post as a reply to this post, not to myself, but that's ok, no need to re-parent. :)


        As you have, I think, suggested elsewhere and as the documentation itself cheerfully admits, a rewrite of this section would be welcome.

        That the section begins with a discussion of the evils of zero-width match infinite loops accompanied by a bunch of Perl code examples of such matches that don't actually "work" (in the sense that they don't produce infinite loops) is not helpful. The discussion finally gets around to saying that the Perl RE does not, in fact, allow such loops, but by then one may have been led far down the garden path and abandoned in the dark forest.


        Give a man a fish:  <%-{-{-{-<

        > Thus Perl allows such constructs, by forcefully breaking the infinite loop.

        Thanks, that wasn't clear to me.

        BUT I should have taken more care about the

        > WARNING: Difficult material (and prose) ahead. This section needs a rewrite.

        FWIW, the forced break can be seen with re 'debug'

        D:\tmp>perl -Mre=debug -e"'foo' =~ m{ ( o? )* }x;" Compiling REx " ( o? )* " Final program: 1: CURLYX[0]{0,INFTY} (12) 3: OPEN1 (5) 5: CURLY{0,1} (9) 7: EXACT <o> (0) 9: CLOSE1 (11) 11: WHILEM[1/1] (0) 12: NOTHING (13) 13: END (0) minlen 0 Matching REx " ( o? )* " against "foo" 0 <> <foo> | 0| 1:CURLYX[0]{0,INFTY}(12) 0 <> <foo> | 1| 11:WHILEM[1/1](0) | 1| WHILEM: matched 0 out of 0..65535 0 <> <foo> | 2| 3:OPEN1(5) 0 <> <foo> | 2| 5:CURLY{0,1}(9) | 2| EXACT <o> can match 0 times out +of 1... 0 <> <foo> | 3| 9:CLOSE1(11) 0 <> <foo> | 3| 11:WHILEM[1/1](0) | 3| WHILEM: matched 1 out of 0..655 +35 | 3| WHILEM: empty match detected, t +rying continuation... <---- HERE + 0 <> <foo> | 4| 12:NOTHING(13) 0 <> <foo> | 4| 13:END(0) Match successful! Freeing REx: " ( o? )* " D:\tmp>

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11137315]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (7)
As of 2024-03-28 12:37 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found