You might be able to get some useful ideas by looking at how RE (tilly) 4: SAS log scanner works and applying that to your matching problem.

The basic reference for understanding performance issues with REs is Mastering Regular Expressions by Friedl. The key is that the way that Perl's RE engine works is that it does a recursive search through ways of partially matching your pattern to the string, every time it fails backing up to the last choice it had and trying again. The problem is that the tree that it has to search can easily be exponential in the size of the string you are trying to match. The pattern /(foo)(.*)*\1\2/ on the string "foo and lots of garbage here..." is a canonical example. Of course Perl works hard to recognize the simple examples of this issue and special cases them so that won't be slow.

The general answer is therefore to write your RE in such a way that the RE makes as few choices as possible, and so that where it does make choices, it makes them as late as possible so that you spend less time backtracking then redoing the same reasoning as possible. In this specific case the best you can do is watch the RE match, and as it goes, trace on pen and paper exactly what choices it is making where. You already know that the RE engine is backtracking, but trying to reason along with it should show you why it is backtracking. (The hard part of the exercise is that you will tend to leap to general conclusions that you know are right, and miss that the computer is going to grind away missing the obvious at great length...)


In reply to Re (tilly) 2: Regex runs for too long by tilly
in thread Regex runs for too long by examine

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.