I have what seems like a simple parsing problem, but I am stumped. I have single lines of input with repeated patterns of arbitrary length that match this general format (shown with 3 patterns):

$str = q!AND (random text) AND (more random text) AND (yet more)!;
$str = q!OR (random text) OR (more random text) OR (yet more)!;

I want to build a parser that loops through the input line and repeatedly extracts either the token "AND" or "OR", and then eats up everything in the line until the next "AND" or "OR", or EOL, whichever comes first. I've tried this:
while ($str =~ /\G(AND|OR)\s+(.+?)/g) { printf("%s %s\n", $1, $2); }
but that doesn't work -- the 2nd pattern eats everything to the EOL. Since the 2nd pattern is bounded by parens, I've tried this:
while ($str =~ /\G(AND|OR)\s+(\(.+?\))/g) { printf("%s %s\n", $1, $2); }
This works, but if the "random text" itself contains any parens, the match goes wrong, because the lazy quantifier stops at the first closing parenthesis in the text. E.g. this string breaks the pattern match:

$str = q!AND (random text) AND (yet (more))!;

What's a good way to eat up the line, capturing the "AND" or "OR" token in $1 and the "random text" in $2, on and on until we hit the EOL? Your regex expertise is appreciated.
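
For reference, here is a minimal sketch of two ways this might be approached; it is not a definitive answer. The helper names split_clauses and split_balanced are hypothetical, made up for illustration. The first uses a lookahead so the lazy match stops just before the next AND/OR or the end of the line. The second assumes the "random text" is always wrapped in balanced parens and uses a recursive subpattern, which needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [token, text] pairs, where text runs up to
# the next standalone AND/OR or the end of the line.
# Caveat: a literal word AND or OR inside the text also ends the match.
sub split_clauses {
    my ($str) = @_;
    my @pairs;
    while ($str =~ /\G\s*(AND|OR)\s+(.+?)\s*(?=\b(?:AND|OR)\b|$)/g) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

# Hypothetical helper: assumes each text is one balanced (...) group.
# (?2) recurses into capture group 2, so nested parens are matched as a unit
# and an AND/OR inside the parens does not end the match.
sub split_balanced {
    my ($str) = @_;
    my @pairs;
    while ($str =~ /\G\s*(AND|OR)\s+(\((?:[^()]++|(?2))*\))/g) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

# Usage: the string that broke the earlier attempt.
for my $pair (split_clauses(q!AND (random text) AND (yet (more))!)) {
    printf("%s %s\n", @$pair);
}
```

The lookahead version is the closer fit to "everything until the next AND or OR", while the recursive version is safer if the text inside the parens can itself contain those words.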


In reply to simple perl regex question (or is it?) by cadphile
