in reply to Regex, capturing variables vs. speed

The three separate matches use simpler regexen, so they run faster. A particular drain on your combined regex is the ".*" (dot-star) you use to join the interesting parts. Each one makes the regex match to the end of the line, then backtrack to find the fixed text that follows. ".*?" would be a faster joining regex.
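
A rough way to check this is to time the two joiners against each other. The sketch below is only illustrative: the input line is made up (the real data is not shown in this thread), and the padding at the end is an assumption chosen to exaggerate the backtracking cost. Timings will vary by machine and Perl version, and the regex optimizer may narrow the gap.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input line; the trailing filler gives the greedy ".*"
# a long stretch to consume and then back out of.
my $line = 'chr7 100 200 urn:lsid:example.org:prot:1 more text panel:demo:'
         . (' filler' x 200);

# Same captures, different joiners: greedy ".*" versus lazy ".*?".
my $greedy = qr/(chr.*?)\s.*urn:lsid:(.*?)\s.*panel:(.*?):/i;
my $lazy   = qr/(chr.*?)\s.*?urn:lsid:(.*?)\s.*?panel:(.*?):/i;

# Return the three captured fields joined with "|".
sub captures {
    my ($re) = @_;
    return join '|', $line =~ $re;
}

# On this line both patterns should extract the same fields.
print captures($greedy), "\n";
print captures($lazy),   "\n";

# Run each match for at least one CPU second and compare rates.
cmpthese(-1, {
    greedy => sub { my @m = $line =~ $greedy },
    lazy   => sub { my @m = $line =~ $lazy },
});
```

Note that the two joiners only agree when the fixed text ("urn:lsid:", "panel:") appears once per line; with repeats, the greedy version latches onto the last occurrence and the lazy one onto the first.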

See Ovid's Death to Dot Star!

After Compline,
Zaxo

Re^2: Regex, capturing variables vs. speed
by sauoq (Abbot) on Oct 30, 2005 at 21:46 UTC
    A particular drain on your combined regex is the ".*" (dot-star) you use to join the interesting ones.

    I wouldn't automatically point fingers at the .* even though, as you say, .*? would probably be better for him. The issue is how his use of .* is combined with his use of .*? when what he really means is \S*. He'd probably see a significant speedup even if switching to \S* was his only change, but incorporating your suggestion makes sense as well.

    So, I'd rewrite

    my ($chr,$prot,$panel) = /(chr.*?)\s.*urn:lsid:(.*?)\s.*panel:(.*?):/i;
    as
    my ($chr, $prot, $panel) = /(chr\S*)\s.*?urn:lsid:(\S*)\s.*?panel:([^:]*):/i;
    (Note that changing the last .*? to [^:]* is a good idea too, even if the efficiency gain isn't much.)
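
As a quick sanity check, here is a minimal sketch that runs the original pattern and a tightened one (\S* for the first two captures, [^:]* for the last, lazy joiners, and the colon kept after "lsid") against the same line. The sample line is hypothetical, since the actual input is not shown in this thread, so this only demonstrates that the two agree on data shaped like this.

```perl
use strict;
use warnings;

# Hypothetical sample line, assumed to resemble the poster's data.
my $line = 'chr7 100 200 urn:lsid:example.org:prot:1 more text panel:demo:rest';

# Original pattern: lazy captures joined by greedy ".*".
my @old = $line =~ /(chr.*?)\s.*urn:lsid:(.*?)\s.*panel:(.*?):/i;

# Tightened pattern: captures say what they mean, joiners are lazy.
my @new = $line =~ /(chr\S*)\s.*?urn:lsid:(\S*)\s.*?panel:([^:]*):/i;

print "old: @old\n";
print "new: @new\n";
print "same captures\n" if "@old" eq "@new";
```

If the real lines can contain "urn:lsid:" or "panel:" more than once, the two patterns may pick different occurrences, so it is worth testing on real input before swapping one for the other.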

    Update: Well, reading further down the thread, it seems robin beat me to it.

    -sauoq
    "My two cents aren't worth a dime.";