Slowness of /i

sulfericacid has asked for the wisdom of the Perl Monks concerning the following question:

A while back I was told that matching using /i is actually rather slow and now I'm wondering why this is.

Is it literally generating every possible CaPs CoMBiNatIoN of the pattern and trying each one against the data? To me this sounds rather costly, and it would explain why case insensitivity is slower. But this also doesn't sound like typical behavior. I'd imagine it would instead automagically lowercase the data (and the match string) and compare those.

Now if this is what Perl really does, why would it be slow? Just kind of curious how case insensitivity really works and why it's costly.

Thanks wise monks.

"Age is nothing more than an inaccurate number bestowed upon us at birth as just another means for others to judge and classify us"

sulfericacid

Re: Slowness of /i
by Zaxo (Archbishop) on May 01, 2005 at 18:22 UTC

It's not as bad as generating all combinations. Roughly speaking, the regex engine only needs to compare upper- and lower-case versions of the next character in the pattern, and it gets to move on as soon as it finds a match.

As you suggest, if speed matters it's best to apply lc to the searched text and use a lower-cased regex pattern.
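As a minimal sketch of the two approaches being compared (the helper names match_with_i and match_with_lc are made up for illustration, and plain ASCII text is assumed), the following shows that both give the same yes/no answer. Whether the lc version is actually faster depends on how often the lowered copy is reused, since lc itself has to copy the string:

```perl
use strict;
use warnings;

# Hypothetical helper: case-insensitive search using the /i modifier.
sub match_with_i {
    my ($text, $word) = @_;
    return $text =~ /\Q$word\E/i ? 1 : 0;
}

# Hypothetical helper: lower-case both sides, then match case-sensitively.
# Note that lc returns a new string, so this costs a copy of $text.
sub match_with_lc {
    my ($text, $word) = @_;
    my $lowered_text = lc $text;
    my $lowered_word = lc $word;
    return $lowered_text =~ /\Q$lowered_word\E/ ? 1 : 0;
}

my $text = 'TwAs BrIlLiG aNd ThE sLiThY tOvEs';

for my $word (qw(slithy SLITHY foobar)) {
    printf "%-6s  /i: %d  lc: %d\n",
        $word, match_with_i($text, $word), match_with_lc($text, $word);
}
```

If the same text is searched with many patterns, the lowered copy can be made once and reused, which is where this approach has the best chance of paying off.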

After Compline,
Zaxo

Re^2: Slowness of /i
by tlm (Prior) on May 01, 2005 at 19:13 UTC

Actually, pre-applying lc did not fare so well:

use strict;
use warnings;
use Benchmark 'cmpthese';

my $string = 'TwAs BrIlLiG aNd ThE sLiThY tOvEs DiD gYrE aNd GiMbLe';

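# Key: i  = match with /i, I = case-sensitive match, lc = lc() the string first;
#      _s = the pattern is found, _f = the pattern is not found.
cmpthese( -1,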
          {
             i_s => sub { local $_ =    $string; /gimble/i },
             I_s => sub { local $_ =    $string; /GiMbLe/  },
            lc_s => sub { local $_ = lc $string; /gimble/  },
             i_f => sub { local $_ =    $string; /foobar/i },
             I_f => sub { local $_ =    $string; /foobar/  },
            lc_f => sub { local $_ = lc $string; /foobar/  },
          }
        );
__END__
         Rate lc_f lc_s  i_s  I_s  i_f  I_f
lc_f 459364/s   --  -4% -13% -27% -33% -33%
lc_s 477204/s   4%   -- -10% -24% -30% -30%
i_s  530962/s  16%  11%   -- -15% -22% -22%
I_s  628278/s  37%  32%  18%   --  -8%  -8%
i_f  681314/s  48%  43%  28%   8%   --  -0%
I_f  681315/s  48%  43%  28%   8%   0%   --

the lowliest monk

Re: Slowness of /i
by PodMaster (Abbot) on May 01, 2005 at 22:26 UTC

From the HTML::Template FAQ:

Q: Why do you use /[Tt]/ instead of /t/i?  It's so ugly!

A: Simple - the case-insensitive match switch is very inefficient.
According to _Mastering_Regular_Expressions_ from O'Reilly Press,
/[Tt]/ is faster and more space efficient than /t/i - by as much as
double against long strings.  //i essentially does a lc() on the
string and keeps a temporary copy in memory.

When this changes, and it is in the 5.6 development series, I will
gladly use //i.  Believe me, I realize [Tt] is hideously ugly.
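To make the /[Tt]/ idiom from the quote concrete, here is a rough sketch that builds such a character-class pattern from a literal word and checks that it agrees with the /i version on a few sample strings. The helper class_pattern and the sample strings are invented for this illustration and are not taken from HTML::Template; only ASCII letters are handled:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a literal word into a pattern in which every
# ASCII letter becomes a two-character class, e.g. "t" becomes "[Tt]".
# Non-letters are passed through quotemeta so they match literally.
sub class_pattern {
    my ($word) = @_;
    my $pattern = '';
    for my $ch (split //, $word) {
        $pattern .= $ch =~ /[A-Za-z]/
            ? '[' . uc($ch) . lc($ch) . ']'
            : quotemeta($ch);
    }
    return $pattern;
}

my $pattern  = class_pattern('tmpl_var');
my $re_class = qr/$pattern/;      # hand-rolled case insensitivity
my $re_i     = qr/tmpl_var/i;     # the /i equivalent

print "$pattern\n";               # [Tt][Mm][Pp][Ll]_[Vv][Aa][Rr]

for my $sample ('<TMPL_VAR NAME=x>', '<tmpl_var name=x>', '<Tmpl_Var>', '<tmpl-var>') {
    my $class_hit = $sample =~ $re_class ? 1 : 0;
    my $i_hit     = $sample =~ $re_i     ? 1 : 0;
    print "$sample  class: $class_hit  /i: $i_hit\n";
}
```

The two patterns accept the same strings; any speed difference between them depends on the perl version, so it is worth benchmarking on the perl you actually run before uglifying a pattern this way.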

MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.

Re: Slowness of /i
by itub (Priest) on May 01, 2005 at 22:19 UTC

See perl592delta.
