in reply to Matching Many Strings against a Large List of Hash Keys (case insensitively, longest key first)

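use Regexp::Assemble;
use List::Util qw(reduce);

# XXX why have a hash if its values are not used?
my @keys = map { quotemeta } keys %hash;

# The 'i' flag is set on the assembled pattern itself: a compiled regex keeps
# its own flags when interpolated, so /i on the outer match would not reach it.
my $key_re = Regexp::Assemble->new(flags => 'i')->add(@keys)->re;

for my $string (@strings) {
    # Collect every (non-overlapping) key match, then keep the longest one.
    my $match = reduce { length($a) > length($b) ? $a : $b } ($string =~ /\b($key_re)\b/gi);
    print "Found '$match' in '$string'\n" if defined $match;
}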

Update1: Now handles multiple keys in same string.

Update2: Overlapping keys are not handled well, because the regex scans left to right and resumes after each match, so a key that starts inside an earlier match is never seen. If you can assume that keys never overlap within a string, this approach may still improve performance. YMMV.

Replies are listed 'Best First'.
Re^2: Matching Many Strings against a Large List of Hash Keys (case insensitively, longest key first)
by JavaFan (Canon) on May 14, 2010 at 17:35 UTC
    This does not do what the title suggests the problem requires: finding the longest key first. If you have the keys "Short" and "Longer Key", and the string is "This is a Short Key and a Longer Key", your solution will report "Short" (the leftmost match), while the OP's original solution reports "Longer Key".

    Now, it may be that the OP is satisfied with that, but the title suggests he won't be.

      Right. Code has been updated.
        Now you are making the assumption that keys cannot overlap. The OP never mentions that. Consider the keys "Two Three" and "Three Four Five" against the string "One Two Three Four Five". Your code will fail to return "Three Four Five": after "Two Three" has been matched, the search resumes at " Four Five", where no key matches.
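One possible way around the overlap problem, offered only as a sketch and not as anything posted in this thread: wrap the key pattern in a zero-width lookahead, so each match consumes nothing and the engine retries at every position. The sample keys and string are the ones from the comment above. A plain length-sorted alternation stands in for Regexp::Assemble here to keep the example core-only, and that substitution is an assumption; it will likely be slower on a large key list.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Sample data from the comment above.
my @keys   = ('Two Three', 'Three Four Five');
my $string = 'One Two Three Four Five';

# Longest alternatives first, so the longest key starting at a given
# position is the one captured there.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @keys;
my $key_re = qr/\b($alternation)\b/i;

# The lookahead matches without consuming text, so a key that begins
# inside an earlier match is still found.
my @found = $string =~ /(?=$key_re)/g;

# Keep the longest of all keys found (the earliest one wins on a tie).
my $longest = reduce { length($a) >= length($b) ? $a : $b } @found;
print "Found '$longest' in '$string'\n" if defined $longest;
```

With this data the match list contains both keys and the longest, "Three Four Five", is reported. The trade-off is that the pattern is attempted at every position in the string, which may cost some of the speed the single-pass regex was meant to buy.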
Re^2: Matching Many Strings against a Large List of Hash Keys (case insensitively, longest key first)
by Anonymous Monk on May 14, 2010 at 18:03 UTC
    I didn't use the hash in the example, but I do want to use it!
    if (/..../) { $hash_value = $hash{$1???} }
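A minimal sketch of one way to do that lookup, assuming the difficulty is that a case-insensitive match leaves $1 in the string's case rather than the key's. The hash contents, the sample string, and the name %value_for are hypothetical, since the original post does not show them. The idea is to build a second hash keyed on the lowercased keys and look up lc($1) in it.

```perl
use strict;
use warnings;

# Hypothetical data; the real %hash is not shown in the thread.
my %hash = ('Longer Key' => 'value A', 'Short' => 'value B');

# Lookup table keyed on the lowercased keys, so a match in any case
# can find its value. Keys that differ only in case would collide here.
my %value_for = map { ( lc($_) => $hash{$_} ) } keys %hash;

# Longest keys first in the alternation.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %hash;

my $string = 'this has a LONGER key in it';
my $hash_value;
if ($string =~ /\b($alternation)\b/i) {
    # $1 holds the text as it appears in the string, e.g. 'LONGER key'.
    $hash_value = $value_for{ lc $1 };
    print "Matched '$1', value is '$hash_value'\n";
}
```

If the original key spelling is needed as well, the lookup table could store the key itself instead of the value, at the cost of a second hash access.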