It looks like you are using a  s/// substitution to repeatedly search from the very start of the string and then snip out already-processed substrings so that you don't encounter them again. It would be so much easier (and faster, if the string/file is huge) to use the  /g modifier on a  m// match and deal with each sub-string as it is found. See Modifiers in perlre, also see perlretut, perlrequick.

A little whitespace and formatting never hurts, either. See the  /x modifier in the references above.

Another suggestion is to factor out sub-patterns with a collection of  qr// regex object definitions (see references above). As with code in general, such factoring allows you to better understand and control the final regex. An example of such factoring is in the code of my original reply.

OTOH, since it looks like you may be trying to parse XML, the best advice might be to not use regexes at all; use one of the many fine XML parser modules from CPAN: see XML::Parse (others will be better able than I to advise you on this).


In reply to Re^3: pattern matching (greedy, non-greedy,...) by AnomalousMonk
in thread pattern matching (greedy, non-greedy,...) by cacophony777

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.