http://qs1969.pair.com?node_id=1149052

rsFalse has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.
  • Comment on How to match more than 32766 times in regex?

Match twice
by choroba (Cardinal) on Dec 01, 2015 at 18:23 UTC
    Update: The whole question is in the title. So is the answer.
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: How to match more than 32766 times in regex?
by BrowserUk (Patriarch) on Dec 01, 2015 at 18:24 UTC

    Before I've even read your question; my suggestion is that you take up Python.

    Going through a bunch of known limitations, and raising questions about them as if you've newly discovered them, is a sad strategy.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I've read about the limitation, but is there a way to compose a regex without a big time penalty? I tried something like /$regex*$regex*$regex*/ when I wanted to match up to 96000 times, but the regex takes a long time to finish.

        To make a regexp faster, anchor it to the start or end of the string with ^ or $. But can you explain what you want to do? I am sure there is a better way.


        As for code, look at the repetition operator x 3, which concatenates the pattern string three times. We then use qr to compile it into a regular expression, match against it, capture the results in @R, and print them. Hope this gives you some ideas. (The expression is repeated so that it captures three times.)

        $ perl -e '$s="(\\d\\w)" x 3; $X="a1b2c3d4e5"; $m=qr/$s/; @R=$X=~$m; print join(";",@R)."\n"'
        1b;2c;3d

        Another way could be divide and conquer, paying a penalty by using $' (the rest of the string that has not matched yet) for the next iteration. Another idea is to use index.
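        The iteration idea above can be sketched without $' by letting a /g match in scalar context resume from pos(). This is only an illustration under stated assumptions: the helper name count_units and the sample data are hypothetical, and \G is swapped in for $' to avoid copying the remainder of the string on every pass.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many consecutive times $unit matches
# from the start of $text, one match per loop iteration.
sub count_units {
    my ($text, $unit) = @_;
    my $count = 0;
    pos($text) = 0;
    # \G anchors each attempt where the previous match ended, so the
    # loop stops at the first gap; /c keeps pos() when a match fails.
    $count++ while $text =~ /\G$unit/gc;
    return $count;
}

my $text = 'a1' x 100_000;    # sample data, well past 32766 repetitions
print count_units($text, qr/\w\d/), "\n";
```

        Because each iteration is a separate match, no single quantifier ever has to count past the limit.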

        if I wanted match up to 96000 times,

        What's your application?


        "... if I wanted match up to 96000 times ..."

        You're fired.

Re: How to match more than 32766 times in regex?
by Anonymous Monk on Dec 01, 2015 at 18:54 UTC
    (shrug) Admittedly at this point BrowserUK's suggestion makes sense to me. But, anyway... use a non-backtracking engine. Or change REG_INFTY value in regcomp.h and recompile perl (I have no idea whether it will work or not).
      use strict;
      use warnings;

      my $X = "a1b2c3d4e5";   # or use File::Slurp
      my $s = "(\\w\\d)";     # my pattern match $s
      my $m = qr/$s/;         # compiled to a regular expression $m
      my $counter = 0;
      while ($X =~ s/$m//) {
          ++$counter;
          next unless $counter > 32766;   # wait for it...
          print "this is the $counter iteration, got $1 \n";
      }

        No need to go to those lengths:

        $s = '0123456789' x 100000;;
        ( $m ) = $s =~ m[((?:(?:0123456789){32000}){3})];;
        print length $m;;
        960000

        But for any given application there's almost certainly a better way of tackling the problem.
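        The nesting trick above generalizes to other counts by splitting the repeat count into a block of full chunks plus a remainder. A minimal sketch, assuming a hypothetical helper nested_repeat, a default chunk size of 32000 (taken from the example above; any value under the platform limit should do), and a count of at least 1:

```perl
use strict;
use warnings;

# Hypothetical helper: build a pattern matching $unit exactly $n times
# ($n >= 1) with no single quantifier larger than $chunk.
sub nested_repeat {
    my ($unit, $n, $chunk) = @_;
    $chunk //= 32_000;
    my $full = int($n / $chunk);   # number of complete chunks
    my $rest = $n % $chunk;        # leftover repetitions
    my $re   = '';
    $re .= "(?:(?:$unit){$chunk}){$full}" if $full;
    $re .= "(?:$unit){$rest}"             if $rest;
    return qr/$re/;
}

my $s  = '0123456789' x 100_000;
my $re = nested_repeat('0123456789', 96_000);
my ($m) = $s =~ /($re)/;
print length $m, "\n";
```

        Two levels of nesting only reach counts up to roughly the chunk size times the quantifier limit; a larger count would need a third level, which this sketch does not attempt.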


        Hmmm, I thought the OP had problems with 'complex regex recursion limit exceeded'. If he just wanted to match something like (\w\d){32767}, sure.
Re: How to match more than 32766 times in regex?
by rsFalse (Chaplain) on Nov 01, 2018 at 11:32 UTC
    perlre: "This is usually 32766 on the most common platforms"

    What do you think about expanding the perlre section on quantifiers with a suggestion for how to conveniently work around the '+' and '*' limitation and write an equivalent regex that matches {0,infty}? The regex should be readable and should not be slow.
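    Since perlre only says "usually", the actual cutoff on a given perl can be probed rather than assumed. A small sketch, with a hypothetical helper quantifier_ok and arbitrarily chosen probe values; the result depends on the perl build, so no particular output is assumed here:

```perl
use strict;
use warnings;

# Hypothetical helper: true if this perl accepts {$n} as a repeat count.
sub quantifier_ok {
    my ($n) = @_;
    # The pattern is compiled at run time, so a quantifier that is too
    # large raises an exception that eval can trap.
    return eval { my $re = qr/(?:ab){$n}/; 1 } ? 1 : 0;
}

for my $n (32_766, 32_767, 65_534, 65_535) {
    printf "%6d: %s\n", $n, quantifier_ok($n) ? 'accepted' : 'rejected';
}
```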
      Feel free to send a perlbug with the patch to the documentation.
