So I'm working on a Perl version of grep for FB2 (XML-based) ebooks, and just like grep it has a command-line option that turns on case-insensitive matching:
    $re = $opts{i} ? qr/$pattern/i : qr/$pattern/;

Everything seems to be in place and working fine except for one major setback: matching a chunk of multibyte text against a pattern compiled with qr//i turns out to be 5-15 times slower than doing the same thing with m//i and just plain qr//, whereas one would expect the two approaches to be equally fast.
[ hmm, I'm a bit worried about the integrity of the UTF-8 (cyrillic) test string in this ]
    #!/usr/bin/perl
    use strict;
    use utf8;
    use Benchmark ':all';

    # the pattern doesn't even need to contain anything fancy
    # for the problem to manifest itself
    my $pattern = 'dumbest pattern ever';

    # it's all about whether or not the /i flag is embedded into the regex
    my $re   = qr/$pattern/;
    my $re_i = qr/$pattern/i;

    # qr//i causes a noticeable slowdown even when dealing with 7-bit (US-ASCII)
    # strings, but this being multibyte seems to make things _a lot_ worse
    my $str = 'очень длинная строка ' x 10;

    my $count = 300_000;

    cmpthese($count, {
        'qr//i+m//' => sub { $str =~ /$re_i/ },
        'qr//+m//i' => sub { $str =~ /$re/i },
        'qr//+m//'  => sub { $str =~ /$re/ },
    });
    $ ./qr-utf8.pl
                   Rate qr//i+m// qr//+m//i  qr//+m//
    qr//i+m//   10881/s        --      -98%      -98%
    qr//+m//i  505263/s     4543%        --       -7%
    qr//+m//   540845/s     4870%        7%        --
One possible way around this would be to altogether abandon qr//i and instead eval() my matching subroutine with all the necessary /i flags textually inlined (there are two m//'s and one s///), but that's still quite ugly. Any suggestions?
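For reference, here is a minimal sketch of the eval() workaround described above. The helper name make_matcher and the sample pattern are made up for illustration and are not taken from the real script. It only shows the mechanics of inlining the flag textually; whether it actually sidesteps the slowdown would still have to be benchmarked.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (not part of the real script): builds a matching
# subroutine whose m// operator has the /i flag written into its source
# text, so the pattern is never compiled with qr//i.
sub make_matcher {
    my ($pattern, $ignore_case) = @_;
    my $flags = $ignore_case ? 'i' : '';

    # Only the flags are inlined textually; $pattern itself stays a
    # closed-over lexical and is interpolated at match time.
    my $code    = 'sub { $_[0] =~ /$pattern/' . $flags . ' }';
    my $matcher = eval $code;
    die "could not build matcher: $@" unless $matcher;
    return $matcher;
}

my $str     = 'очень длинная строка ' x 10;
my $match_i = make_matcher('ДЛИННАЯ', 1);   # case-insensitive
my $match   = make_matcher('ДЛИННАЯ', 0);   # case-sensitive

print $match_i->($str) ? "match\n" : "no match\n";
print $match->($str)   ? "match\n" : "no match\n";
```

The real code would need the same treatment for both m//'s and the s///, which is exactly the part that feels ugly.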
In reply to qr//i versus m//i by hazylife