in reply to Removing backtracking from a .*? regexp

How about the following?

use strict;

my $target = shift || 'word';
my $re     = qr/a=<(.*?$target.*?)>/;

while ( <DATA> ) {
    # short-circuit: only run the regex when index() finds the target
    if ( index($_, $target) >= 0 && /$re/ ) {
        print "[$1]\n";
    }
}

The poor, oft-ignored index function will filter out lines that don't contain 'word' at all, probably with fewer cycles than running a regexp. Your mileage may vary, of course.

Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"

Re: Re: Removing backtracking from a .*? regexp
by diotalevi (Canon) on Nov 17, 2003 at 20:44 UTC
    The index() isn't really all that much faster than a regexp. BrowserUK was testing this once; I played with the benchmark and noticed that the index test was a percentage point faster. At this point I'd prefer people just recommend the regexp, since it is as fast and is notationally less complex.
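
For anyone who wants to measure this on their own data, here is a minimal sketch using the core Benchmark module. It is not the benchmark mentioned above; the sample lines, the repeat count, and the sub names count_regex and count_index are made up for illustration, and results will differ by Perl version and input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: half the lines contain the target, half do not.
my $target = 'word';
my @lines  = (
    'a=<some word here>',
    'a=<nothing to see here>',
) x 500;

# Count matching lines with a plain regex match (\Q..\E keeps it literal).
sub count_regex {
    my $n = 0;
    for my $line (@lines) { $n++ if $line =~ /\Q$target\E/ }
    return $n;
}

# Count matching lines with index(); it returns -1 when the substring is absent.
sub count_index {
    my $n = 0;
    for my $line (@lines) { $n++ if index($line, $target) >= 0 }
    return $n;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, { regex => \&count_regex, index => \&count_index });
```

The table printed by cmpthese shows the rate of each sub and the relative difference between them, which is the number to look at before deciding the extra notation is worth it.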

      Cool beans if so! Sounds like a good candidate for a Meditation.


        No meditation necessary.

        print 'Found it' if $s =~ /$thing/;
        print 'Found it' if 1+index( $s, $thing );

        Both work. The latter is marginally quicker under most circumstances, but as you can see it is notationally more complex.

        If you find the additional complexity intimidating, or don't need the (very) marginal additional performance, stick with the former.
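
One caveat worth sketching: the two forms only agree when $thing has no regex metacharacters, because index() searches literally while /$thing/ interpolates a pattern. The strings below are hypothetical, chosen so the difference shows up; \Q...\E is the standard way to make the regex literal as well.

```perl
use strict;
use warnings;

my $s     = 'price: 3.50 (approx)';
my $thing = '(approx';   # hypothetical search string with an unbalanced paren

# index() does a literal substring search; -1 means "not found".
my $by_index = index($s, $thing) >= 0 ? 1 : 0;

# \Q...\E escapes regex metacharacters so the pattern matches literally too.
my $by_regex = $s =~ /\Q$thing\E/ ? 1 : 0;

print "index: $by_index, quoted regex: $by_regex\n";

# Without \Q...\E the unbalanced "(" is an invalid pattern, so the match
# dies at run time; the eval block traps that error.
my $unquoted_ok = eval { my $matched = ( $s =~ /$thing/ ); 1 } ? 1 : 0;
print "unquoted regex compiled: $unquoted_ok\n";
```

With these inputs both literal searches should find the substring, while the unquoted pattern fails to compile, so the choice between the two forms is not purely about speed when the search string comes from user input.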


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail