in reply to Matching Many Strings against a Large List of Hash Keys (case insensitively, longest key first)

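You are forcing Perl to recompile each regex over and over: a pattern that interpolates a variable is recompiled whenever that variable's value changes, which in your inner loop is nearly every iteration. A simple improvement is to compile each pattern once, up front, with qr//: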

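```perl
# ...
# Compile one pattern per key, once. The patterns are built from the
# keys, since the keys are what we are searching for.
my %regex;
@regex{ keys %hash }= map { qr/(?<!\S)\Q$_\E(?!\S)/i } keys %hash;
foreach my $string ( @strings ) {
    foreach my $key ( @keys_sorted_by_length_desc ) {
        if( $string =~ $regex{$key} ) {
            print "Found '$key' in '$string'\n";
            last;
        }
    }
}
```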
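The fragment above relies on %hash, @strings and @keys_sorted_by_length_desc from the original question. Here is a self-contained sketch of the same precompiled-qr// idea; the sample data is made up, and building the sorted key list with sort/length is an assumption about how that list was produced:

```perl
use strict;
use warnings;

# Hypothetical sample data: the keys are the phrases to look for.
my %hash = (
    'sugar rush rocks' => 'whatever',
    'this long'        => 'ilikeit',
    'long'             => 'itsgood',
);
my @strings = (
    'I like pizza this long or so',
    'nothing to see here',
);

# Compile each key's pattern exactly once, up front.
my %regex = map { $_ => qr/(?<!\S)\Q$_\E(?!\S)/i } keys %hash;

# Try longer keys first so that "this long" beats "long".
my @keys_sorted_by_length_desc =
    sort { length($b) <=> length($a) } keys %hash;

my @found;    # [ key, string ] pairs, kept for checking
foreach my $string ( @strings ) {
    foreach my $key ( @keys_sorted_by_length_desc ) {
        if ( $string =~ $regex{$key} ) {
            print "Found '$key' in '$string'\n";
            push @found, [ $key, $string ];
            last;
        }
    }
}
```

Since a qr// object is compiled when it is created, the inner loop only ever runs already-compiled patterns.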

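But you might consider matching sequences of words rather than strings. Index each phrase by its lowercased first word; then each word of a string only has to be checked against the few phrases that start with it, longest first: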

```perl
#!/usr/bin/perl -w
use strict;

my @strings = (
    'Some string about this long or so, maybe this long',
    'I like pizza this long or so, maybe this long',
    'this long or so, maybe this French Fries long',
    'This Sugar Rush Rocks. maybe this do not stop the clock.',
);

# Pairs of: phrase (one word, or an array ref of words) => replacement.
my @repl = (
    [qw< Sugar Rush Rocks >], 'whatever',
    'long',                   'itsgood',
    [qw< this long >],        'ilikeit',
    [qw< maybe this >],       'itsokay',
    [qw< Some String >],      'loooveit',
);

# Index each phrase by its lowercased first word. Each entry is
# [ replacement, remaining lowercased words..., phrase length ].
my %repl;
while( @repl ) {
    my $word= shift @repl;
    my $repl= shift @repl;
    if( ! ref $word ) {
        $repl= [ $repl, length($word) ];
    } else {
        my $len= length join ' ', @$word;
        my $first= shift @$word;
        $repl= [ $repl, map( lc $_, @$word ), $len ];
        $word= $first;
    }
    push @{ $repl{ lc $word } }, $repl;
}

# Longest phrase first, then drop the length we sorted on.
for my $list ( values %repl ) {
    @$list= sort { $b->[-1] <=> $a->[-1] } @$list;
    pop @$_ for @$list;
}

STRING:
foreach my $string ( @strings ) {
    my @words= $string =~ /(\S+)/g;
    my $i= 0;
    while( $i < @words ) {
        my $word= lc $words[$i];
        next if ! $repl{$word};     # no phrase starts with this word
        for my $list ( @{ $repl{$word} } ) {
            my( $repl, @next )= @$list;
            # Skip this phrase unless the following words match too.
            next if grep $next[$_] ne lc( $words[$i+1+$_] || '' ), 0..$#next;
            print "Found '", join( ' ', $word, @next ), "' in '$string'\n";
            next STRING;
        }
    } continue {
        $i++;
    }
}
```
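To see what the index looks like, here is a sketch that factors the build step into a sub and dumps the result. The sub name build_index and the sample pairs (including 'This' => 'justthis') are hypothetical; two of the phrases share the first word "this", so the longer one should be listed first:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: builds the same first-word index as the loop above.
# Result: lowercased first word => list of
#   [ replacement, remaining lowercased words... ], longest phrase first.
sub build_index {
    my @pairs = @_;
    my %index;
    while ( @pairs ) {
        my ( $phrase, $replacement ) = splice @pairs, 0, 2;
        my @words = ref $phrase ? @$phrase : ( $phrase );
        my $len   = length join ' ', @words;    # sort key, removed below
        my $first = lc shift @words;
        push @{ $index{$first} },
            [ $replacement, ( map { lc($_) } @words ), $len ];
    }
    for my $list ( values %index ) {
        @$list = sort { $b->[-1] <=> $a->[-1] } @$list;
        pop @$_ for @$list;
    }
    return \%index;
}

# Hypothetical sample: "This" and "this long" share the first word.
my $index = build_index(
    'This',            'justthis',
    [qw< this long >], 'ilikeit',
    'long',            'itsgood',
);

$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;
print Dumper($index);
```

Each input word then costs one hash lookup plus a check of only the phrases that begin with it, so the work no longer grows with the total number of keys the way the one-regex-per-key loop does.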

- tye        
