What about m/(?:\A|\n)(.*$pat.*\n)/omg?
Being right, does not endow the right to be rude; politeness costs nothing. Being unknowing, is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.
| [reply] [d/l] |
Good stuff.. It's faster :-)
#!/usr/local/bin/perl -w
use strict;
use Benchmark;
my $pat = 'foo';
$/=undef;
open F,"<","/usr/share/dict/words";
my $mmap = <F>;
close F;
timethese( 100, {
# 'orig' => sub {
# while ($mmap =~ m/^([^\n]*$pat[^\n]\n)/omg) {
# warn $1;
# }
# },
'fast' => sub {
while ($mmap =~ m/\n?(.*$pat.*\n)/omg) {
warn $1;
}
},
'better' => sub {
while ($mmap =~ m/(?:\A|\n)(.*$pat.*\n)/omg) {
warn $1;
}
},
}
)
__END__
Benchmark: timing 100 iterations of better, fast...
better: 26 wallclock secs (23.13 usr + 0.06 sys = 23.19 CPU) @ 4
+.31/s (n=100)
fast: 41 wallclock secs (39.73 usr + 0.09 sys = 39.82 CPU) @ 2
+.51/s (n=100)
(original takes about 25 seconds per iteration on my machine)
| [reply] [d/l] |
Ah, thanks, that makes a big difference (though it's still slower than using index/rindex). Do you have any explanation for why? My understanding was that ^ meant pretty much the same thing as \n?.
dragonchild's code is even faster, in some cases by a factor of 3. | [reply] [d/l] [select] |
Well, ^ doesn't mean the same as \n? at all. Like not even close. According to Programming Perl 3rd edition pages 150 and 159, ^, when used with the /m modifier, means to match after embedded newlines or the beginning of the string.
\n? means something akin to possibly match a newline, but maybe not. That doesn't provide the engine with any understanding of where to match. Had you said something like /\n(.*$pat.*\n)/, you would have been better off (except for not matching the first line). Of course, with /m, you should be able to do /^(.*$pat.*)$/omsg and it should be about as efficient as my regex.
With regexes, it's always better to be as explicit as possible. This will allow the engine to make a number of optimizations. Some of those optimizations, as you have found out, can mean the difference between 2500 seconds and 23 seconds.
Being right, does not endow the right to be rude; politeness costs nothing. Being unknowing, is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.
| [reply] [d/l] [select] |
| [reply] [d/l] [select] |
| [reply] |