I'm assuming the two regexes really are:
slow:
    /(\S)[\.-]*(\S)[\.-]*(\S)[\.-]*(\S)[\.-]*$/s
and fast:
    /.*(\w)[\.-]*(\w)[\.-]*(\w)[\.-]*(\w)[\.-]*/s

After staring for a while at the string you're trying to match against, I'm also assuming that it has no spaces in it, so  \S (a non-whitespace character) matches any single character in the string.

Here's my interpretation of the slow regex:
    /(\S)[\.-]*(\S)[\.-]*(\S)[\.-]*(\S)[\.-]*$/s
The regex is anchored at the end of the string, but the regex engine (RE) looks for the leftmost match, so it has to start looking at the beginning of the string. It looks for anything, zero or more dots or dashes, anything, zero or more dots or dashes, anything, zero or more dots or dashes, anything, zero or more dots or dashes, and, having found a substring matching this, it looks for the end of the string. If the RE fails to find the end of the string at that point, it starts to backtrack: it gives back characters from each  [\.-]* one at a time, re-matches the rest of the pattern, and checks again whether it has arrived at the end of the string. Because  \S also matches dots and dashes, a run of dots and dashes can be divided between the  \S and  [\.-]* pieces in many different ways, and the RE has to try every one of them because none of these, I'm guessing, will also satisfy the  $ end-of-string assertion. Then the RE advances the 'match point' by one character position to the second character in the string and tries the whole thing over again. You have entered Backtrack Hell; you are lucky it only takes you a few hours to get out of it.
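
To get a feel for how much work that is, here is a rough sketch that counts how often the RE gets as far as the  $ check. The test string (a run of dashes followed by abcde) is made up, since I don't have the real one, and the  (?{ ... }) counter and the sub name slow_tries are my own additions for illustration, not part of the original regex.

```perl
use strict;
use warnings;

our $tries;    # package variable, incremented from inside the regex

# Count how many times the engine reaches the '$' check of the slow regex.
# The input is a hypothetical stand-in: $n dashes followed by 'abcde'.
sub slow_tries {
    my ($n) = @_;
    my $str = ('-' x $n) . 'abcde';
    $tries = 0;
    # (?{ ... }) runs every time the engine arrives just before '$'.
    my $ok = $str =~ /(\S)[\.-]*(\S)[\.-]*(\S)[\.-]*(\S)[\.-]*(?{ $tries++ })$/s;
    return ($ok ? 1 : 0, $tries);
}

for my $n (5, 10, 20, 40) {
    my ($ok, $count) = slow_tries($n);
    printf "%2d dashes: matched=%s, reached end-of-string check %d times\n",
        $n, ($ok ? 'yes' : 'no'), $count;
}
```

The match does succeed in the end (on the last four characters), but only after every earlier starting position has been exhausted, and the count climbs much faster than the length of the string does.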

Likewise, the fast regex:
    /.*(\w)[\.-]*(\w)[\.-]*(\w)[\.-]*(\w)[\.-]*/s
The first thing to note is that you are matching  \w (a 'word' character) instead of  \S (effectively, anything), and that the characters matched by  \w and  [\.-] are mutually exclusive. This is an enormous help to the poor RE. The other thing to note is that the regex begins with a  .* that immediately consumes every character to the end of the string before the RE begins backtracking from the end to try to find a pattern not anchored at the end of the string: another huge help. After a reasonable amount of backtracking, a match is found (if my visual inspection of the string is correct).
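
Both points can be checked with a small sketch. The sub names, the  (?{ ... }) counter and the test string (dashes followed by abcde) are assumptions of mine for illustration; the real string will behave differently in detail.

```perl
use strict;
use warnings;

our $tries;    # package variable, incremented from inside the regex

# Report which of the classes used in the two regexes accept a character.
sub classes_for {
    my ($ch) = @_;
    return +{
        S       => ($ch =~ /^\S$/      ? 1 : 0),
        w       => ($ch =~ /^\w$/      ? 1 : 0),
        dotdash => ($ch =~ /^[\.-]$/   ? 1 : 0),
    };
}

# Count how often the fast regex gets all the way to the end of the pattern
# on a hypothetical string of $n dashes followed by 'abcde'.
sub fast_tries {
    my ($n) = @_;
    my $str = ('-' x $n) . 'abcde';
    $tries = 0;
    my $ok = $str =~ /.*(\w)[\.-]*(\w)[\.-]*(\w)[\.-]*(\w)[\.-]*(?{ $tries++ })/s;
    return ($ok ? 1 : 0, $tries, $1);
}

for my $ch ('.', '-', 'a') {
    my $c = classes_for($ch);
    print "'$ch': \\S=$c->{S}  \\w=$c->{w}  [\\.-]=$c->{dotdash}\n";
}

for my $n (20, 40) {
    my ($ok, $count, $first) = fast_tries($n);
    print "$n dashes: matched=$ok, reached end of pattern $count time(s), \$1=$first\n";
}
```

A dot or a dash is accepted by both  \S and  [\.-], but never by  \w, so with  \w there is only one way to divide up a run of dots and dashes. I'd expect the fast count to stay tiny no matter how long the run of dashes is.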

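A small point: The  //s regex modifier only changes what the  . metacharacter ('any character') matches. The slow regex has no  . in it, so  //s has no effect there, although it does no harm; in the fast regex it lets the leading  .* run across any newlines in the string.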
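
To make the role of  //s concrete, here is a minimal sketch on a made-up string containing a newline: the modifier matters only for a pattern that actually contains a  . somewhere.

```perl
use strict;
use warnings;

my $str = "ab\ncd";    # made-up string with an embedded newline

# '.' refuses to match "\n" unless //s is in force.
my $dot_plain = $str =~ /b.c/  ? 1 : 0;
my $dot_s     = $str =~ /b.c/s ? 1 : 0;

# No '.' in the pattern, so //s makes no difference.
my $nodot_plain = $str =~ /b\sc/  ? 1 : 0;
my $nodot_s     = $str =~ /b\sc/s ? 1 : 0;

print "with dot:    plain=$dot_plain  s=$dot_s\n";
print "without dot: plain=$nodot_plain  s=$nodot_s\n";
```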


In reply to Re: why this regular expression is so slow? by AnomalousMonk
in thread why this regular expression is so slow? by fnever
