sulfericacid has asked for the wisdom of the Perl Monks concerning the following question:

A while back I was told that matching using /i is actually rather slow and now I'm wondering why this is.

Is it literally making every possible CaPs CoMBiNatIoN for all retrievable data? To me this sounds rather costly and would explain why case insensitivity would be slower. But this also doesn't sound like typical behavior. I'd imagine it would instead automagically produce all lowercase data (and a lowercase match string).

Now if this is what Perl really does, why would it be slow? Just kind of curious how case insensitivity really works and why it's costly.

Thanks wise monks.



"Age is nothing more than an inaccurate number bestowed upon us at birth as just another means for others to judge and classify us"

sulfericacid

Re: Slowness of /i
by Zaxo (Archbishop) on May 01, 2005 at 18:22 UTC

    It's not as bad as generating all combinations. Roughly speaking, the regex engine only needs to compare upper- and lower-case versions of the next character in the pattern, and it gets to move on as soon as it finds a match.

    As you suggest, if speed matters it's best to apply lc to the searched text and use a lower-cased regex pattern.

    After Compline,
    Zaxo

      Actually, pre-applying lc did not fare so well:

      use strict;
      use warnings;
      use Benchmark 'cmpthese';

      my $string = 'TwAs BrIlLiG aNd ThE sLiThY tOvEs DiD gYrE aNd GiMbLe';

      cmpthese( -1, {
          i_s  => sub { local $_ = $string;    /gimble/i },
          I_s  => sub { local $_ = $string;    /GiMbLe/  },
          lc_s => sub { local $_ = lc $string; /gimble/  },
          i_f  => sub { local $_ = $string;    /foobar/i },
          I_f  => sub { local $_ = $string;    /foobar/  },
          lc_f => sub { local $_ = lc $string; /foobar/  },
      } );

      __END__
               Rate lc_f lc_s  i_s  I_s  i_f  I_f
      lc_f 459364/s   --  -4% -13% -27% -33% -33%
      lc_s 477204/s   4%   -- -10% -24% -30% -30%
      i_s  530962/s  16%  11%   -- -15% -22% -22%
      I_s  628278/s  37%  32%  18%   --  -8%  -8%
      i_f  681314/s  48%  43%  28%   8%   --  -0%
      I_f  681315/s  48%  43%  28%   8%   0%   --

      the lowliest monk
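
Zaxo's rough description above (compare the upper- and lower-case versions of the next pattern character, and move on as soon as one matches) can be sketched in plain Perl. This is only an illustration of the idea, not how the regex engine is actually implemented; `ci_index` is a hypothetical helper name, and the sketch assumes ASCII-style text where each character has a single upper- and lower-case form.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): return the offset of the first
# case-insensitive occurrence of $needle in $haystack, or -1 if none.
sub ci_index {
    my ($haystack, $needle) = @_;

    # Precompute both case variants of every pattern character once,
    # instead of generating every possible capitalisation of the pattern.
    my @pat = map { [ lc($_), uc($_) ] } split //, $needle;

    my $last_start = length($haystack) - @pat;
    START: for my $start (0 .. $last_start) {
        for my $i (0 .. $#pat) {
            my $c = substr($haystack, $start + $i, 1);
            # Two comparisons per character at most; give up on this
            # starting position as soon as neither variant matches.
            next START unless $c eq $pat[$i][0] || $c eq $pat[$i][1];
        }
        return $start;
    }
    return -1;
}
```

The cost is at most two comparisons per character instead of one, which fits the modest gap between the `i_s` and `I_s` rows in the benchmark above.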

Re: Slowness of /i
by PodMaster (Abbot) on May 01, 2005 at 22:26 UTC
    From the HTML::Template documentation
    Q: Why do you use /[Tt]/ instead of /t/i? It's so ugly!
    A: Simple - the case-insensitive match switch is very inefficient. According to _Mastering_Regular_Expressions_ from O'Reilly Press, /[Tt]/ is faster and more space efficient than /t/i - by as much as double against long strings. //i essentially does a lc() on the string and keeps a temporary copy in memory. When this changes, and it is in the 5.6 development series, I will gladly use //i. Believe me, I realize [Tt] is hideously ugly.
    I'm sure this is probably mentioned in a few other places, but I remember because I wrote a patch once for HTML::Template.

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.
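
The /[Tt]/ style quoted from the HTML::Template FAQ does not have to be typed by hand. A minimal sketch, assuming ASCII-style input: `case_class_pattern` is a hypothetical helper (it is not taken from HTML::Template) that expands a literal word into one character class per letter.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a literal word into a pattern such as
# [Tt][Mm][Pp][Ll], so it can be matched without the /i switch.
sub case_class_pattern {
    my ($word) = @_;
    return join '', map {
        my ($lo, $up) = (lc($_), uc($_));
        # Cased characters become a two-member class; anything else
        # (digits, punctuation) is kept as an escaped literal.
        $lo ne $up ? "[$up$lo]" : quotemeta($_);
    } split //, $word;
}

my $pattern = case_class_pattern('tmpl');
my $re      = qr/$pattern/;    # compiled once, used without /i
```

Whether this is still faster than /i depends on the perl version; the FAQ answer itself expects the difference to go away, so it is worth benchmarking before making a pattern this ugly.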

Re: Slowness of /i
by itub (Priest) on May 01, 2005 at 22:19 UTC
    In ASCII, lc is a trivial operation. In Unicode, it's much more complicated, which makes it comparatively very slow (it is the bottleneck in some of my applications). According to perl592delta, the newest development version of perl is faster, but I haven't tried it yet.
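
One illustration of why Unicode case handling is more involved than ASCII: lower-casing both sides is not always enough to compare strings case-insensitively. The sketch below uses the Greek sigma, which has two lower-case forms, and assumes Perl 5.16 or later for `fc` (which postdates this thread). The variable names are made up for the example.

```perl
use strict;
use warnings;
use feature 'fc';    # fc() (Unicode case folding) needs Perl 5.16+

# The same Greek word, written with escapes so no 'use utf8' is needed.
my $upper = "\x{39F}\x{394}\x{39F}\x{3A3}";    # capitals, ends in capital sigma
my $lower = "\x{3BF}\x{3B4}\x{3BF}\x{3C2}";    # lower case, ends in final sigma

# lc maps capital sigma to the ordinary small sigma (U+03C3) and leaves
# the final sigma (U+03C2) alone, so the two results differ.
my $same_by_lc = lc($upper) eq lc($lower) ? 1 : 0;

# fc folds all three sigma forms to U+03C3, so the results agree.
my $same_by_fc = fc($upper) eq fc($lower) ? 1 : 0;

# /i matching also uses case folding rather than a plain lc.
my $same_by_i = $lower =~ /\A\Q$upper\E\z/i ? 1 : 0;
```

Rules like this are part of why the engine cannot treat /i on Unicode text as a simple table lookup per byte.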