in reply to Matching Many Strings against a Large List of Hash Keys (case insensitively, longest key first)
Use index instead of a regex. Here's a benchmark (the print statements are commented out so that output doesn't skew the timings):
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my @strings = (
        'Some string about this long or so, maybe this long',
        'I like pizza this long or so, maybe this long',
        'this long or so, maybe this French Fries long',
        'This Sugar Rush Rocks. maybe this do not stop the clock.',
    );

    my %hash = (
        'Sugar Rush Rocks' => 'whatever',
        'long'             => 'itsgood',
        'this long'        => 'ilikeit',
        'maybe this'       => 'itsokay',
        'Some String'      => 'loooveit'
    );

    # Longest keys first, so the first hit is the longest matching key.
    my @keys_sorted_by_length_desc = sort { length $b <=> length $a } keys %hash;

    cmpthese(
        -5,    # run each sub for at least 5 CPU seconds
        {
            'Regex' => \&use_regex,
            'Index' => \&use_index,
        }
    );

    sub use_regex {
        foreach my $string (@strings) {
            foreach my $key (@keys_sorted_by_length_desc) {
                my $key_re = quotemeta($key);
                # Key must sit at the start of the string or after whitespace.
                if ( $string =~ /(?:\A|\s)($key_re)\s*/i ) {
                    #print "Found '$key' in '$string'\n";
                    last;
                }
            }
        }
    }

    sub use_index {
        foreach my $string (@strings) {
            foreach my $key (@keys_sorted_by_length_desc) {
                # Lowercase both sides for a case-insensitive substring search.
                my $lcstring = lc $string;
                my $lckey    = lc $key;
                if ( ( index $lcstring, $lckey ) > -1 ) {
                    #print "Found '$key' in '$string'\n";
                    last;
                }
            }
        }
    }

    __END__
              Rate Regex Index
    Regex   7539/s    --  -86%
    Index  52227/s  593%    --
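One difference to keep in mind: the regex only accepts a key at the start of the string or right after whitespace, while a bare index matches anywhere, so 'long' would also be found inside 'along'. If that matters, something like the sketch below could restore the check while still using index. The name index_at_word_start is made up for illustration, it assumes both arguments are already lowercased, and I haven't timed it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the benchmark above): return the offset of
# the first occurrence of $key in $string that is at the start of the string
# or preceded by whitespace, or -1 if there is none.
# Assumes both arguments are already lowercased.
sub index_at_word_start {
    my ( $string, $key ) = @_;
    return -1 if $key eq '';    # treat an empty key as "no match"
    my $pos = -1;
    # Walk through every occurrence of $key, left to right.
    while ( ( $pos = index $string, $key, $pos + 1 ) > -1 ) {
        return $pos if $pos == 0;
        return $pos if substr( $string, $pos - 1, 1 ) =~ /\s/;
    }
    return -1;
}

print index_at_word_start( 'a long way', 'long' ), "\n";    # prints 2
print index_at_word_start( 'walk along', 'long' ), "\n";    # prints -1
```

The extra substr test only runs when index has already found a candidate, so the common "no match" case stays a plain index call.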
Update: I moved generation of the sorted key list out of the subroutines, so the sort is no longer part of the timed code. index comes out even further ahead:
               Rate Regex Index
    Regex    8311/s    --  -92%
    Index  101399/s 1120%    --
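The same idea can probably be pushed further: use_index still calls lc on the string and on the key for every string/key pair, even though neither changes inside the inner loop. A minimal sketch of hoisting that work follows. The name longest_key_in and the trimmed-down %hash are only for illustration, and I haven't benchmarked this version, so treat any speedup as an assumption.

```perl
use strict;
use warnings;

my %hash = (
    'Sugar Rush Rocks' => 'whatever',
    'long'             => 'itsgood',
    'this long'        => 'ilikeit',
);

# Lowercase each key once, up front, keeping the original key alongside it.
# Each element is [ lowercased key, original key ], longest first.
my @lc_keys_desc =
    sort { length $b->[0] <=> length $a->[0] }
    map  { [ lc $_, $_ ] } keys %hash;

# Hypothetical helper: return the longest original key found in $string
# (case-insensitively), or undef if no key matches.
sub longest_key_in {
    my ($string) = @_;
    my $lcstring = lc $string;    # once per string, not once per key
    for my $pair (@lc_keys_desc) {
        my ( $lckey, $key ) = @$pair;
        return $key if index( $lcstring, $lckey ) > -1;
    }
    return;
}

print longest_key_in('I like pizza THIS LONG or so'), "\n";    # prints "this long"
```

Returning the original key rather than the lowercased one means the caller can still look up $hash{$key} directly.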
Re^2: Matching Many Strings against a Large List of Hash Keys (case insensitively, longest key first)
by ikegami (Patriarch) on May 14, 2010 at 18:52 UTC
by thundergnat (Deacon) on May 14, 2010 at 19:36 UTC
by ikegami (Patriarch) on May 14, 2010 at 22:53 UTC