hazylife has asked for the wisdom of the Perl Monks concerning the following question:
So I'm working on a Perl version of grep for FB2 (XML-based) ebooks, and, just like grep, it has a command-line option that turns on case-insensitive matching:
```perl
$re = $opts{i} ? qr/$pattern/i : qr/$pattern/;
```

Everything seems to be in place and working fine, except for one major setback: matching a chunk of multibyte text against a pattern compiled with qr//i turns out to be 5-15 times slower than doing the same thing with plain qr// and m//i, whereas one would expect the two approaches to be equally fast.
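For context, here is a minimal sketch of how that one-liner might sit in the tool's option handling. It assumes Getopt::Std is used to parse -i; the helper build_regex and the program name fb2grep are hypothetical and not taken from the actual tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: turn a grep-style argument list into a compiled regex
# plus the remaining file arguments.
sub build_regex {
    local @ARGV = @_;    # getopts() works on @ARGV, so localize it
    my %opts;
    getopts('i', \%opts) or die "usage: fb2grep [-i] pattern [file ...]\n";
    my $pattern = shift @ARGV;
    die "usage: fb2grep [-i] pattern [file ...]\n" unless defined $pattern;

    # The /i flag is baked into the qr// object, as in the original one-liner.
    my $re = $opts{i} ? qr/$pattern/i : qr/$pattern/;
    return ($re, @ARGV);    # whatever is left are the files to search
}

my ($re, @files) = build_regex('-i', 'ebook', 'a.fb2');
print "matches\n" if 'My EBOOK collection' =~ $re;
```

The point of compiling once up front is that the per-line loop only ever sees $re, so the -i decision is made in exactly one place.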
[Hmm, I'm a bit worried about whether the UTF-8 (Cyrillic) test string will survive intact in this post.]
```perl
#!/usr/bin/perl
use strict;
use utf8;
use Benchmark ':all';

# the pattern doesn't even need to contain anything fancy
# for the problem to manifest itself
my $pattern = 'dumbest pattern ever';

# it's all about whether or not the /i flag is embedded into the regex
my $re   = qr/$pattern/;
my $re_i = qr/$pattern/i;

# qr//i causes a noticeable slowdown even when dealing with 7-bit (US-ASCII)
# strings, but this being multibyte seems to make things _a lot_ worse
my $str = 'очень длинная строка ' x 10;

my $count = 300_000;
cmpthese($count, {
    'qr//i+m//' => sub { $str =~ /$re_i/ },
    'qr//+m//i' => sub { $str =~ /$re/i },
    'qr//+m//'  => sub { $str =~ /$re/ },
});
```
```
$ ./qr-utf8.pl
              Rate qr//i+m// qr//+m//i  qr//+m//
qr//i+m//  10881/s        --      -98%      -98%
qr//+m//i 505263/s     4543%        --       -7%
qr//+m//  540845/s     4870%        7%        --
```
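Before reading too much into the timings, it may be worth checking that the three benchmarked variants really do the same job. A qr// object carries its own flags (on Perl 5.14 and later it stringifies as (?^:...) or (?^i:...)), and the (?^...) group means the interpolated pattern does not inherit flags from the surrounding match. So an /i on the outer m// would not be expected to make a flagless qr// case-insensitive. The sketch below tests that directly against an upper-cased copy of the benchmark pattern; the plain-ASCII test string is chosen for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $pattern = 'dumbest pattern ever';
my $re      = qr/$pattern/;     # no flags stored in the regex object
my $re_i    = qr/$pattern/i;    # /i stored in the regex object

# Upper-cased copy of the pattern: only a case-insensitive match can succeed.
my $str = uc $pattern;

# Show the flag group each qr// object carries when stringified.
print "re   stringifies as $re\n";
print "re_i stringifies as $re_i\n";

# Run the same three match forms as the benchmark and report the outcome.
for my $case (
    [ 'qr//i+m//', sub { $str =~ /$re_i/ } ],
    [ 'qr//+m//i', sub { $str =~ /$re/i } ],
    [ 'qr//+m//',  sub { $str =~ /$re/ } ],
) {
    my ($name, $test) = @$case;
    printf "%-10s %s\n", $name, $test->() ? 'match' : 'no match';
}
```

If the qr//+m//i line reports no match, that variant is effectively doing case-sensitive matching, which would make it an unfair baseline for qr//i in the benchmark above.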
One possible way around this would be to abandon qr//i altogether and instead eval() my matching subroutine with all the necessary /i flags inlined textually (there are two m//'s and one s///), but that's still quite ugly. Any suggestions?
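A minimal sketch of the eval() workaround described above, cut down to one m// and one s/// for brevity. The helper make_highlighter and its [...] highlighting are hypothetical. Only the flag letters are pasted into the code string; the pattern itself stays in a lexical variable that the generated closure captures (string eval compiles in the current lexical scope), so user input is never evaluated as Perl code. Whether this ends up any faster than qr//i has not been measured here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a sub that brackets matches of $pattern,
# with the /i flag (if requested) written literally into each m// and s///.
sub make_highlighter {
    my ($pattern, $ci) = @_;
    my $f = $ci ? 'i' : '';    # the only text ever spliced into the code

    # $pattern is not pasted in; the generated closure captures the variable.
    my $code = 'sub {
        my ($text) = @_;
        return unless $text =~ /$pattern/' . $f . ';
        $text =~ s/($pattern)/[$1]/g' . $f . ';
        return $text;
    }';

    my $sub = eval $code;
    die "could not compile matcher: $@" unless $sub;
    return $sub;
}

my $hl_i = make_highlighter('dumbest', 1);
my $hl   = make_highlighter('dumbest', 0);

my $line = 'The DUMBEST pattern, the dumbest';
print $hl_i->($line), "\n";    # both occurrences bracketed
print $hl->($line), "\n";      # only the lower-case one bracketed
```

The flags are decided once, when the sub is built, so the per-line calls pay no extra cost for the eval itself.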
Re: qr//i versus m//i
by dave_the_m (Monsignor) on Feb 21, 2014 at 21:33 UTC
by hazylife (Monk) on Feb 22, 2014 at 12:26 UTC

Re: qr//i versus m//i
by Anonymous Monk on Feb 21, 2014 at 19:10 UTC
by Anonymous Monk on Feb 21, 2014 at 19:38 UTC
by Anonymous Monk on Feb 21, 2014 at 19:46 UTC