in reply to Re: Re: Efficiency in regex
in thread Efficiency in regex
No. Paladin's solution does 2 operations for each piece of data: (1) a match and (2) a hash lookup. The original solution does anywhere from 1 to n operations (in this case, n=15) for each piece of data, depending on how soon one of the alternatives matches. Paladin's cost is a constant 2 operations, i.e. O(1), while the original averages around n/2, i.e. O(n). A test:
    use Benchmark;

    my %names;
    my (@list) = qw(Jones Rogers Edwards Smith Jackson Ryan Jones tilly dws
                    paladin footpad jeffa Elian ybiC TheDamian);
    @names{@list} = (1) x @list;
    my $names = join '|', @list;
    my $data = do { local $/; <DATA> };

    timethese(100_000, {
        # Generic match, then one hash lookup per captured name.
        "paladin" => sub {
            my $text = $data;
            foreach my $name ($text =~ /(\b(?:[A-Z](?:\.|[a-z]+)\s+)+(\w+))/go) {
                "$name\n" if exists $names{$name};
            }
        },
        # All names interpolated into the regex as one alternation.
        "original" => sub {
            my $text = $data;
            foreach my $name ($text =~ /(\b(?:[A-Z](?:\.|[a-z]+)\s+)+(?:$names))/sgo) {
                "$name\n";
            }
        },
    });

    __DATA__
    Dr. Happy
    Sr. Rogers
    Senoir. Chacho
    Senoira. Chachese
    Mr. Ryan
    Mrs. Smith (I'm sorry)
    Ms. Jackson (oooh, I am for reaaal)
    Dr. Tilly
    Mr. Elian
    Asdokfj. adfsdf
    Ms. asdfasdf
    Mr. Burns
    Qsdokfj. adfsdf
    q. TheDamian
    Hello. There
    This. Should
    Not. Fail
And the results:
    Benchmark: timing 100000 iterations of optimized, original, paladin...
      original: 25 wallclock secs (23.14 usr +  0.00 sys = 23.14 CPU) @ 4320.77/s (n=100000)
       paladin: 18 wallclock secs (16.93 usr +  0.00 sys = 16.93 CPU) @ 5905.63/s (n=100000)
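If you want to check that the two approaches pick out the same names before comparing their speed, something like the following minimal sketch may help. The shortened name list, the sample text, and the sub names `names_by_hash` and `names_by_regex` are my own assumptions for illustration; they are not part of the benchmark above. The sample entries are separated by `;` because the greedy first-name group would otherwise run across whitespace into the next entry.

```perl
use strict;
use warnings;

# Hypothetical name list and sample text (not the benchmark data).
my @list        = qw(Rogers Smith Jackson Ryan);
my %names       = map { $_ => 1 } @list;
my $alternation = join '|', @list;
my $sample      = 'J. Smith; Mary Rogers; Bob Burns; A. B. Ryan.';

# Approach 1: match any "First Last" shape, then do one hash lookup
# on the captured last name.
sub names_by_hash {
    my ($input) = @_;
    my @found;
    while ($input =~ /\b(?:[A-Z](?:\.|[a-z]+)\s+)+(\w+)/g) {
        push @found, $1 if exists $names{$1};
    }
    return @found;
}

# Approach 2: put every name into the regex as an alternation, so the
# engine tries the alternatives one by one at each candidate position.
sub names_by_regex {
    my ($input) = @_;
    my @found;
    while ($input =~ /\b(?:[A-Z](?:\.|[a-z]+)\s+)+($alternation)/g) {
        push @found, $1;
    }
    return @found;
}

my @by_hash  = names_by_hash($sample);
my @by_regex = names_by_regex($sample);

print "hash:  @by_hash\n";
print "regex: @by_regex\n";
```

On this sample both subs should return the same three names (Burns is skipped because it is not in the list); the difference is only in how much work is done per candidate.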
In response to your update, I think you are mistaken; "end," doesn't match anywhere at all.