I have a specialized JavaScript parser, written in Perl, that often legitimately needs to match consecutive zero-length strings. As described here, Perl intentionally disallows this, to prevent a class of potential infinite-loop bugs.

The normal solution is to set pos() on the input string, which resets its zero-length flag. However, in Perl versions 5.18 through 5.24, this is very slow when looping through large strings: the loop runs in O(n^2) time. It may be this bug (my program is a CGI script, so tainting is involved). It's fixed in Perl 5.26, but my program is used in many contexts where the users are not able to upgrade their version of Perl.
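For reference, here is a minimal sketch of the zero-length restriction and the usual pos() reset. The string `$s` and the variables `$first`, `$second`, and `$third` are illustrative names only, not taken from my parser.

```perl
use strict;
use warnings;

my $s = 'abc';

# The first zero-length match at position 0 succeeds.
my $first = ($s =~ /\G(?=a)/gc) ? 1 : 0;

# A second zero-length match at the same position is refused,
# because the string remembers that its last match was zero-length.
my $second = ($s =~ /\G(?=a)/gc) ? 1 : 0;

# Assigning to pos(), even with the same value, clears that flag,
# so the same zero-length match is allowed again.
pos($s) = pos($s);
my $third = ($s =~ /\G(?=a)/gc) ? 1 : 0;

print "first=$first second=$second third=$third\n";
```

The middle match is the one that fails without the reset; the /c modifier keeps pos() where it was after that failure, so the reset has a defined position to work with.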

My question is: Is there a way other than setting pos() to allow a string to have consecutive zero-length matches? I've tried pos($$in)= pos($$in) and pos($$in)+= 0, but either statement ends up doubling the run time of the whole script. The script is already CPU-intensive, so performance is important here.

(I picture a potential /z regex modifier that allows consecutive zero-length matches.)

Thanks a lot for any suggestions!

UPDATE: It appears that the slowdown has nothing to do with tainting, but rather with the UTF-8 flag on the string. Here's a code sample that demonstrates the problem:

    #!/usr/bin/perl
    use strict ;
    use warnings ;

    my $st= 'a' x 100000 ;      # this program runs as O(n^2)
    utf8::upgrade($st) ;        # ... when operating on a string with the utf8 flag set

    while (1) {
        $st=~ /\G(?=a)/gc ;
        pos($st)= pos($st) ;    # without this, there's an early exit next line
        last unless $st=~ /\G(?=a)/gc ;
        $st=~ /\Ga/gc ;         # increments pos()
    }

    print "done\n" ;
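To check whether a particular Perl is affected, one rough approach is to time the same loop at two string sizes, with and without the UTF-8 flag. The sketch below assumes Time::HiRes is available (it ships with core Perl); the helper name `time_loop` and the sizes 10,000 and 20,000 are arbitrary choices for illustration, not part of the original script.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: run the loop from the sample above on a string
# of $n characters and return the elapsed wall-clock seconds.
sub time_loop {
    my ($n, $upgrade) = @_;
    my $st = 'a' x $n;
    utf8::upgrade($st) if $upgrade;     # sets the UTF-8 flag on the string

    my $start = Time::HiRes::time();
    while (1) {
        $st =~ /\G(?=a)/gc;
        pos($st) = pos($st);            # reset the zero-length flag
        last unless $st =~ /\G(?=a)/gc;
        $st =~ /\Ga/gc;                 # advances pos() by one character
    }
    return Time::HiRes::time() - $start;
}

# Doubling n should roughly double the time if the loop is linear,
# and roughly quadruple it if the loop is quadratic.
for my $n (10_000, 20_000) {
    printf "n=%6d  plain: %.3fs  utf8: %.3fs\n",
        $n, time_loop($n, 0), time_loop($n, 1);
}
```

If the utf8 column grows about fourfold when n doubles while the plain column only doubles, that suggests the Perl in question has the slowdown. Timings are noisy at these sizes, so treat the result as a hint rather than a measurement.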


In reply to Is there a way to allow consecutive zero-length matches without using pos()? by jsm
