in reply to Re: Fast searching of multiple substrings in a string
in thread Fast searching of multiple substrings in a string

I was inspired to benchmark your advice. In my testing environment, I was astonished to find that the speed keeps increasing as the look-ahead covers more letter positions (first letters, then second letters, and so on).
Benchmark: timing 200 iterations of lookahead1, lookahead3, lookahead4, lookahead5, nolookahead...
 lookahead1: 3 wallclock secs ( 2.93 usr + 0.00 sys = 2.93 CPU) @ 68.26/s (n=200)
 lookahead3: 2 wallclock secs ( 2.23 usr + 0.00 sys = 2.23 CPU) @ 89.69/s (n=200)
 lookahead4: 2 wallclock secs ( 2.03 usr + 0.00 sys = 2.03 CPU) @ 98.52/s (n=200)
 lookahead5: 2 wallclock secs ( 2.00 usr + 0.00 sys = 2.00 CPU) @ 100.00/s (n=200)
nolookahead: 7 wallclock secs ( 6.30 usr + 0.00 sys = 6.30 CPU) @ 31.75/s (n=200)
I used the following code, with this document as my input.
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;

undef $/;
my $text = <>;

# "progressive", "great", "interlacing", "really" and "wonder" are in the document
my $words = q{
    featuring | blossom | great | interlacing | really |
    linux | thought | wonder | progressive
};

timethese(200, {
    lookahead5 => sub {
        1 while $text =~ m{(?= [fbgrltwpi][elihorn][aoent][tsaludge][ustrlxger] )( $words )}gix;
    },
    lookahead4 => sub {
        1 while $text =~ m{(?= [fbgrltwpi][elihorn][aoent][tsaludge] )( $words )}gix;
    },
    lookahead3 => sub {
        1 while $text =~ m{(?= [fbgrltwpi][elihorn][aoent] )( $words )}gix;
    },
    lookahead1 => sub {
        1 while $text =~ m{(?= [fbgrltwpi] )( $words )}gix;
    },
    nolookahead => sub {
        1 while $text =~ m{( $words )}gix;
    },
});
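The character classes in the benchmark were typed by hand, one class per letter position. As a rough sketch of how they could be derived from the word list instead, something like the following might work. The helper name `lookahead_prefix` and the sample string are my own inventions for illustration, not part of the benchmark, and the sketch assumes the depth should never exceed the length of the shortest word (otherwise the look-ahead would wrongly reject short words).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the benchmark above): build a look-ahead
# prefix of up to $depth character classes, one per letter position, from
# the letters the given words have at that position.
sub lookahead_prefix {
    my ($depth, @words) = @_;

    # Assumption: clamp the depth to the shortest word, so that no word
    # is excluded merely for being shorter than the look-ahead.
    my ($shortest) = sort { $a <=> $b } map { length } @words;
    $depth = $shortest if $depth > $shortest;

    my @classes;
    for my $pos (0 .. $depth - 1) {
        my %seen;
        # Distinct lowercase letters at this position, in first-seen order.
        my @letters = grep { !$seen{$_}++ }
                      map  { lc substr($_, $pos, 1) } @words;
        push @classes, '[' . join('', map { quotemeta } @letters) . ']';
    }
    return join '', @classes;
}

my @words = qw(featuring blossom great interlacing really
               linux thought wonder progressive);

my $prefix      = lookahead_prefix(3, @words);
my $alternation = join '|', map { quotemeta } @words;
my $re          = qr/(?=$prefix)($alternation)/i;

# Made-up sample text, just to show the combined pattern in use.
my $sample = 'A really great wonder';
my @found  = $sample =~ /$re/g;

print "$prefix\n";
print "@found\n";
```

The generated classes list the same letters as the hand-written ones, only in a different order, which makes no difference inside a character class. Changing the first argument of `lookahead_prefix` gives the equivalent of the lookahead1 through lookahead5 variants.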