I have constructed a regex which is far more readable, and does the job (on some simple test cases from your post).

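It breaks the regex into three parts: single-quoted strings, double-quoted strings, and all others. The single- and double-quoted string parts are very similar. The logic used is:

  • Match the opening quote, then grab (without backtracking) as many characters as possible that are not the quote, a backslash, or a question mark.
  • Then, zero or more times, match one special sequence: an escape (a backslash or the trigraph ??/) plus the character after it, the trigraph ??', or a lone ? that does not begin ??' or ??/. Follow each one with another no-backtracking run of ordinary characters.
  • Match the closing quote.

If that's not possible, then we use the other part. This is a lengthy post, so...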

$REx = qr{
    ' (?> [^'\\?]* )
    (?:
      (?: (?: \\ | \?\?/ ) . | \?\?' | \? (?! \? ['/] ) )
      (?> [^'\\?]* )
    )*
    '
  |
    " (?> [^"\\?]* )
    (?:
      (?: (?: \\ | \?\?/ ) . | \?\?' | \? (?! \? ['/] ) )
      (?> [^"\?]* )
    )*
    "
  |
    (?:
      (?! / [/*] )
      (?: \?\?['/] | \? (?! \? ['/] ) | (?> [^?'"\s]+ ) )
    )+
}x;
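If you want to poke at it, here is a minimal sketch of one way to run the pattern over a line of C-like text and collect what it matches. The helper name `chunks` and the sample input are my own assumptions for illustration, not part of the original test cases; the pattern itself is copied as posted.

```perl
use strict;
use warnings;

# The pattern as posted, one alternative per paragraph.
my $REx = qr{
    ' (?> [^'\\?]* )
      (?: (?: (?: \\ | \?\?/ ) . | \?\?' | \? (?! \? ['/] ) ) (?> [^'\\?]* ) )*
    '
  |
    " (?> [^"\\?]* )
      (?: (?: (?: \\ | \?\?/ ) . | \?\?' | \? (?! \? ['/] ) ) (?> [^"\?]* ) )*
    "
  |
    (?: (?! / [/*] ) (?: \?\?['/] | \? (?! \? ['/] ) | (?> [^?'"\s]+ ) ) )+
}x;

# Hypothetical helper: return every chunk the pattern matches, in order.
sub chunks {
    my ($text) = @_;
    my @found;
    push @found, $1 while $text =~ /($REx)/g;
    return @found;
}

# Sample input (an assumption): one escaped quote in each kind of literal.
my @c = chunks(q{x = "a \" b"; c = '\'';});
print "[$_]\n" for @c;
```

Each quoted literal should come back as a single chunk with its escaped quote intact, and the unquoted pieces are split on whitespace.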
And here's the explain output:
(?x-ims:                 # group, but do not capture (disregarding
                         # whitespace and comments) (case-sensitive)
                         # (with ^ and $ matching normally) (with . not
                         # matching \n):
  '                      # '\''
  (?>                    # match (and do not backtrack afterwards):
    [^'\\?]*             # any character except: ''', '\\', '?' (0 or
                         # more times (matching the most amount possible))
  )                      # end of look-ahead
  (?x:                   # group, but do not capture (0 or more times
                         # (matching the most amount possible)):
    (?x:                 # group, but do not capture:
      (?x:               # group, but do not capture:
        \\               # '\'
       |                 # OR
        \?               # '?'
        \?               # '?'
        /                # '/'
      )                  # end of grouping
      .                  # any character except \n
     |                   # OR
      \?                 # '?'
      \?                 # '?'
      '                  # '\''
     |                   # OR
      \?                 # '?'
      (?!                # look ahead to see if there is not:
        \?               # '?'
        ['/]             # any character of: ''', '/'
      )                  # end of look-ahead
    )                    # end of grouping
    (?>                  # match (and do not backtrack afterwards):
      [^'\\?]*           # any character except: ''', '\\', '?' (0 or
                         # more times (matching the most amount possible))
    )                    # end of look-ahead
  )*                     # end of grouping
  '                      # '\''
 |                       # OR
  "                      # '"'
  (?>                    # match (and do not backtrack afterwards):
    [^"\\?]*             # any character except: '"', '\\', '?' (0 or
                         # more times (matching the most amount possible))
  )                      # end of look-ahead
  (?x:                   # group, but do not capture (0 or more times
                         # (matching the most amount possible)):
    (?x:                 # group, but do not capture:
      (?x:               # group, but do not capture:
        \\               # '\'
       |                 # OR
        \?               # '?'
        \?               # '?'
        /                # '/'
      )                  # end of grouping
      .                  # any character except \n
     |                   # OR
      \?                 # '?'
      \?                 # '?'
      '                  # '\''
     |                   # OR
      \?                 # '?'
      (?!                # look ahead to see if there is not:
        \?               # '?'
        ['/]             # any character of: ''', '/'
      )                  # end of look-ahead
    )                    # end of grouping
    (?>                  # match (and do not backtrack afterwards):
      [^"\?]*            # any character except: '"', '\?' (0 or more
                         # times (matching the most amount possible))
    )                    # end of look-ahead
  )*                     # end of grouping
  "                      # '"'
 |                       # OR
  (?x:                   # group, but do not capture (1 or more times
                         # (matching the most amount possible)):
    (?!                  # look ahead to see if there is not:
      /                  # '/'
      [/*]               # any character of: '/', '*'
    )                    # end of look-ahead
    (?x:                 # group, but do not capture:
      \?                 # '?'
      \?                 # '?'
      ['/]               # any character of: ''', '/'
     |                   # OR
      \?                 # '?'
      (?!                # look ahead to see if there is not:
        \?               # '?'
        ['/]             # any character of ''', '/'
      )                  # end of look-ahead
     |                   # OR
      (?>                # match (and do not backtrack afterwards):
        [^?'"\s]+        # any character except: '?', ''', '"',
                         # whitespace (\n, \r, \t, \f, and " ") (1 or
                         # more times (matching the most amount possible))
      )                  # end of look-ahead
    )                    # end of grouping
  )+                     # end of grouping
)                        # end of grouping
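One thing the explain output makes easy to spot: the last atomic group of the double-quoted part reads `[^"\?]*`, while the single-quoted part has `[^'\\?]*`. I can't tell from the post whether that difference is deliberate. The sketch below assumes the two parts were meant to be symmetric and compares the double-quoted part as posted against a variant that also excludes the backslash. The names `$as_posted`, `$symmetric` and `$literal` are mine, and the test string is an assumption chosen to have two escapes.

```perl
use strict;
use warnings;

# Double-quoted part exactly as posted: the trailing atomic group
# excludes only '"' and '?', so it can also consume a backslash.
my $as_posted = qr{
    " (?> [^"\\?]* )
    (?: (?: (?: \\ | \?\?/ ) . | \?\?' | \? (?! \? ['/] ) ) (?> [^"\?]* ) )*
    "
}x;

# Assumed variant: the trailing group also excludes the backslash,
# mirroring the single-quoted part.
my $symmetric = qr{
    " (?> [^"\\?]* )
    (?: (?: (?: \\ | \?\?/ ) . | \?\?' | \? (?! \? ['/] ) ) (?> [^"\\?]* ) )*
    "
}x;

# A C string literal with two escapes: "\n \" x"
my $literal = q{"\n \" x"};

my ($got_posted)    = $literal =~ /^($as_posted)/;
my ($got_symmetric) = $literal =~ /^($symmetric)/;

print "as posted: $got_posted\n";
print "symmetric: $got_symmetric\n";
```

With the pattern as posted, the run after the first escape swallows the second backslash and the match ends at the escaped quote. The variant matches the whole literal. A string with only one escape does not show the difference.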


japhy -- Perl and Regex Hacker

In reply to Re: really large regex misbehaving - WTF by japhy
in thread really large regex misbehaving by Anonymous Monk
