in reply to How to match more than 32766 times in regex?

Before I've even read your question, my suggestion is that you take up Python.

Going through a bunch of known limitations, and raising questions about them as if you've newly discovered them, is a sad strategy.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: How to match more than 32766 times in regex?
by rsFalse (Chaplain) on Dec 01, 2015 at 18:45 UTC
    I've read about the limitation, but is there a way to compose a regex without a big time penalty? I tried something like /$regex*$regex*$regex*/ if I wanted to match up to 96000 times, but the regex takes a long time to finish.

      To make a regexp faster, anchor it to the start or end of the string with ^ or $, so the search is bound to that point. But can you explain what you want to do? I am sure there is a better way.


      As for code, look at the repetition operator x 3, which concatenates the string 3 times. We then use qr to compile the result as a regular expression, match against it, capture the results in @R, and print them. Hope this gives you ideas. (The expression is repeated so that it captures 3 times.)

      $ perl -e '$s="(\\d\\w)" x 3; $X="a1b2c3d4e5"; $m=qr/$s/; @R=$X=~$m; print join(";",@R)."\n"'
      1b;2c;3d

      Another way could be divide and conquer, paying a penalty by using $' (the part of the string after the match) as the input for the next iteration. Another idea is to use index.
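The index idea could look roughly like the sketch below. The thread does not say what is being searched for at this point, so the helper name count_literal and the sample haystack and needle are made up for illustration; it only applies when the thing to find is a literal string rather than a pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of a literal
# substring by walking the string with index(), no regex involved.
sub count_literal {
    my ($haystack, $needle) = @_;
    return 0 if $needle eq '';      # avoid an endless loop on an empty needle
    my ($count, $pos) = (0, 0);
    while (($pos = index($haystack, $needle, $pos)) != -1) {
        $count++;
        $pos += length $needle;     # resume right after this occurrence
    }
    return $count;
}

print count_literal("ab" x 100_000, "ab"), "\n";   # prints 100000
```

Because index takes a start position, nothing is copied between iterations, which avoids the cost of repeatedly using $'.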

        Caveat about the multiplier: it assumes you can match that many copies at once. If there are 10 occurrences to find and you match 3 at a time, you cannot match the last one.
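A small sketch of that caveat, reusing the sample string from the one-liner. The variable names are illustrative, and the /g alternative is just one possible way around the problem, not something proposed in the thread.

```perl
use strict;
use warnings;

my $X = "a1b2c3d4e5";

# Three copies at a time: the fourth pair "4e" is left over and is lost,
# because another match would need three more pairs.
my $triple  = "(\\d\\w)" x 3;
my @grouped = $X =~ /$triple/g;

# One copy with /g in list context returns every pair, however many there are.
my @all = $X =~ /(\d\w)/g;

print join(";", @grouped), "\n";   # prints 1b;2c;3d
print join(";", @all), "\n";       # prints 1b;2c;3d;4e
```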

        I had an input line of 100k characters, each '0' or '1'. I was trying to solve a problem that needs the length of an alternating subsequence. My approach was
        () = $line =~ /(.)\1*/g
        For the test case '0' x 100k, I got an answer of 4 instead of 1, because (I think) it found three matches capped at the 32766-repetition limit, plus one shorter match for the rest.
        When I used
        () = $line =~ /(.)\1*\1*\1*\1*/g
        it ran slower on the test case '01' x 50k. I can't say how much slower, because it was only one part of the program (and maybe not the hot spot).
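For this particular task, one way to sidestep the limit is to count runs without any quantifier on the backreference. The two helpers below are a hedged sketch: their names are invented here, and the second assumes the line contains only '0' and '1', as described above.

```perl
use strict;
use warnings;

# Hypothetical helper: match each character that is NOT followed by a copy
# of itself, i.e. the last character of every run. No quantifier is used,
# so no repeat limit applies.
sub count_runs_lookahead {
    my ($line) = @_;
    my $count = () = $line =~ /(.)(?!\1)/gs;
    return $count;
}

# Hypothetical helper: squeeze every run of '0' or '1' down to a single
# character with tr///s on a copy, then take the length of what is left.
sub count_runs_squeeze {
    my ($line) = @_;
    (my $squeezed = $line) =~ tr/01//s;
    return length $squeezed;
}

print count_runs_lookahead("0" x 100_000), "\n";   # prints 1
print count_runs_squeeze("01" x 50_000), "\n";     # prints 100000
```

The tr///s version never enters the regex engine, so it is likely the cheaper of the two on long inputs, though that is not measured here.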
      "... if I wanted match up to 96000 times ..."

      What's your application?


      "... if I wanted match up to 96000 times ..."

      You're fired.