in reply to Did the inefficiency of /i get fixed?

All these posts have used regexes that are too simple. Here's my benchmark (on 5.8.3):
use Benchmark 'cmpthese';

my $y = "HELLO world" x 100;

cmpthese(-5, {
    cc_simple    => sub { $y =~ /[hH][eE][lL][lL][oO]/; },
    cc_complex   => sub { $y =~ /\w+ [hH][eE][lL][lL][oO] \w+/; },
    i_simple     => sub { $y =~ /hello/i; },
    i_complex    => sub { $y =~ /\w+ hello \w+/i; },
    simple       => sub { $y =~ /HELLO/; },
    complex      => sub { $y =~ /\w+ HELLO \w+/; },
    simple_fail  => sub { $y =~ /hELLO/; },
    complex_fail => sub { $y =~ /\w+ hELLO \w+/; },
});

__END__
                  Rate cc_complex i_complex complex complex_fail simple_fail cc_simple i_simple simple
cc_complex     13973/s         --       -2%    -82%         -83%        -87%      -99%     -99%   -99%
i_complex      14271/s         2%        --    -81%         -83%        -87%      -99%     -99%   -99%
complex        77103/s       452%      440%      --          -7%        -31%      -93%     -93%   -94%
complex_fail   82560/s       491%      479%      7%           --        -26%      -92%     -93%   -94%
simple_fail   111335/s       697%      680%     44%          35%          --      -90%     -90%   -92%
cc_simple    1065985/s      7529%     7370%   1283%        1191%        857%        --      -5%   -23%
i_simple     1120558/s      7919%     7752%   1353%        1257%        906%        5%       --   -19%
simple       1377591/s      9759%     9553%   1687%        1569%       1137%       29%      23%     --
You can see that, for the very simple regex, the /i regex is slightly faster than the charclass regex, but this is because, even with case-insensitivity on, the regex /hello/i is still a simple Boyer-Moore search (compare "simple" and "i_simple"). Once you get into a complex regex, one that requires actual pattern matching, /i and charclasses perform about the same (compare "cc_complex" and "i_complex").
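
If you want to try the charclass variant on other literals without typing the classes out by hand, something like the following might do. It is only a sketch: charclass_ci is a made-up helper name, not part of the benchmark above, and it assumes plain ASCII input (it makes no attempt at Unicode case folding).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the benchmark above): turn an ASCII literal
# into the equivalent hand-written character-class pattern.
sub charclass_ci {
    my ($literal) = @_;
    my $pattern = '';
    for my $ch (split //, $literal) {
        if ($ch =~ /[A-Za-z]/) {
            # ASCII letter: accept both cases explicitly
            $pattern .= '[' . lc($ch) . uc($ch) . ']';
        }
        else {
            # anything else is matched literally
            $pattern .= quotemeta($ch);
        }
    }
    return $pattern;
}

my $cc = charclass_ci('hello');
print "$cc\n";    # [hH][eE][lL][lL][oO]

my $re = qr/$cc/;
print "matched\n" if "HELLO world" =~ $re;
```

The compiled $re can then be dropped into a cmpthese() entry next to the /i version of the same literal.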
_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Re: Did the inefficiency of /i get fixed?
by itub (Priest) on May 19, 2004 at 14:22 UTC

    The inefficiency is real, and it matters a great deal if you are using Unicode strings, because lowercasing Unicode characters is slow. Consider this modified benchmark:

    use utf8;
    use Benchmark 'cmpthese';

    my $y = "HELLO ΑΙΝΣΪ" x 100;
    # make sure your script is encoded in UTF-8 when you save it!

    # ... the rest of the code is the same as in the parent node

    Results:

                     Rate i_complex cc_complex complex complex_fail i_simple simple_fail simple cc_simple
    i_complex       672/s        --       -68%    -70%         -71%     -98%       -100%  -100%     -100%
    cc_complex     2078/s      209%         --     -9%         -10%     -94%        -99%  -100%     -100%
    complex        2278/s      239%        10%      --          -2%     -94%        -99%  -100%     -100%
    complex_fail   2317/s      245%        12%      2%           --     -94%        -99%  -100%     -100%
    i_simple      35860/s     5234%      1626%   1474%        1448%       --        -79%   -95%      -95%
    simple_fail  168298/s    24935%      8001%   7289%        7164%     369%          --   -76%      -78%
    simple       703114/s   104489%     33744%  30771%       30248%    1861%        318%     --      -10%
    cc_simple    780570/s   116011%     37472%  34172%       33591%    2077%        364%    11%        --
    

    Character classes are about 3 times faster than /i for the complex case (2078/s vs. 672/s) and about 22 times faster for the simple case (780570/s vs. 35860/s)!