Re^4: Performance optimization question

I *think* that a regex will apply Boyer-Moore automatically, whereas index will only apply it if you study the string. That's how I interpret these results:

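use Benchmark qw( cmpthese );

# $s and $x are package variables on purpose: Benchmark evals the code
# strings itself, so lexicals declared here would not be visible to them.
for( 30, 300, 3000, 30000 ) { 
    $s = 'x'x$_ . 'reg exp' . 'x'x$_;    # needle buried mid-string
    cmpthese -1, {                       # run each for at least 1 CPU second
        REGEX => q[ $x = $s =~ /reg exp/;], 
        INDEX => q[ $x = index $s, 'reg exp';] 
    };
}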

           Rate INDEX REGEX
INDEX 1956298/s    --  -19%
REGEX 2425347/s   24%    --

          Rate INDEX REGEX
INDEX 340657/s    --  -65%
REGEX 973582/s  186%    --

          Rate INDEX REGEX
INDEX  40348/s    --  -74%
REGEX 155495/s  285%    --

         Rate INDEX REGEX
INDEX  3530/s    --  -77%
REGEX 15077/s  327%    --

With study

use Benchmark qw( cmpthese );

for( 30, 300, 3000, 30000 ) { 
    $s = 'x'x$_ . 'reg exp' . 'x'x$_; 
    study $s;                            # the only change from the first run
    cmpthese -1, { 
        REGEX => q[ $x = $s =~ /reg exp/;], 
        INDEX => q[ $x = index $s, 'reg exp';] 
    };
}

           Rate REGEX INDEX
REGEX 2101154/s    --  -28%
INDEX 2912801/s   39%    --

           Rate REGEX INDEX
REGEX  829929/s    --  -32%
INDEX 1224656/s   48%    --

          Rate REGEX INDEX
REGEX 100781/s    --  -38%
INDEX 161881/s   61%    --

         Rate REGEX INDEX
REGEX 10983/s    --  -26%
INDEX 14792/s   35%    --

Once you study the string, index wins out (though only marginally as the string gets longer), which I attribute to the saving of not having to invoke the regex engine. That saving shrinks in relative terms as the string grows, because the regex startup cost is amortised over the longer search.

Without the study, index does badly because it does a dumb search rather than an intelligent one. That is why the regex's lead grows with length: from 24% faster with 30 filler characters per side to 327% faster with 30000.
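To make "dumb" versus "intelligent" concrete, here is a rough sketch that counts character comparisons for a naive scan and for Boyer-Moore-Horspool, a simplified relative of Boyer-Moore. This is my own illustration and not what perl does internally; naive_search and horspool_search are made-up names. Note that the filler here is 'a' rather than 'x': the needle 'reg exp' contains an 'x', so in this simplified sketch an 'x' filler would only permit shifts of one position.

```perl
use strict;
use warnings;

# Naive search: try every alignment, comparing left to right.
# Returns (position or -1, number of character comparisons).
sub naive_search {
    my ( $hay, $needle ) = @_;
    my ( $n, $m ) = ( length $hay, length $needle );
    my $cmps = 0;
    for my $i ( 0 .. $n - $m ) {
        my $j = 0;
        while ( $j < $m ) {
            $cmps++;
            last if substr( $hay, $i + $j, 1 ) ne substr( $needle, $j, 1 );
            $j++;
        }
        return ( $i, $cmps ) if $j == $m;
    }
    return ( -1, $cmps );
}

# Boyer-Moore-Horspool: compare right to left, and on a mismatch shift by
# a distance looked up from the haystack character under the needle's end.
sub horspool_search {
    my ( $hay, $needle ) = @_;
    my ( $n, $m ) = ( length $hay, length $needle );
    return ( 0, 0 ) if $m == 0;

    # Shift table: for every needle character except the last one, the
    # distance from its rightmost occurrence to the end of the needle.
    my %shift;
    $shift{ substr( $needle, $_, 1 ) } = $m - 1 - $_ for 0 .. $m - 2;

    my ( $i, $cmps ) = ( 0, 0 );
    while ( $i <= $n - $m ) {
        my $j = $m - 1;
        while ( $j >= 0 ) {
            $cmps++;
            last if substr( $hay, $i + $j, 1 ) ne substr( $needle, $j, 1 );
            $j--;
        }
        return ( $i, $cmps ) if $j < 0;

        # Characters that never occur in the needle allow a full-length skip.
        $i += $shift{ substr( $hay, $i + $m - 1, 1 ) } // $m;
    }
    return ( -1, $cmps );
}

for my $len ( 30, 300, 3000 ) {
    my $hay = 'a' x $len . 'reg exp' . 'a' x $len;
    my ( $np, $nc ) = naive_search( $hay, 'reg exp' );
    my ( $hp, $hc ) = horspool_search( $hay, 'reg exp' );
    printf "%5d: naive pos=%d cmps=%d, horspool pos=%d cmps=%d\n",
        $len, $np, $nc, $hp, $hc;
}
```

With a filler that never appears in the needle, the naive scan inspects roughly one character per haystack position, while the skipping search inspects roughly one per needle-length, so the gap widens as the string grows. How much of that applies to the numbers above depends on what perl actually does for each operation, which I have not verified.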

But don't be surprised if dave_the_m or demerphq pop by and point out that I'm misinterpreting.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
