in reply to Re^2: Matching Many Strings against a Large List of Hash Keys (case insensitively, longest key first)

Right. Code has been updated.

Re^4: Matching Many Strings against a Large List of Hash Keys (case insensitively, longest key first)
by JavaFan (Canon) on May 14, 2010 at 18:01 UTC
    Now you are assuming that keys cannot overlap, which the OP never states. Consider the keys "Two Three" and "Three Four Five" against the string "One Two Three Four Five". Your code will fail to return "Three Four Five": once "Two Three" has been matched, the search resumes in the remaining " Four Five", where no key matches.
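The failure described above can be reproduced with a minimal sketch. The plain longest-first alternation here is a stand-in for the assembled regex, not the code from the node, and the variable names are illustrative only.

```perl
use strict;
use warnings;

my $string = "One Two Three Four Five";
my @keys   = ("Two Three", "Three Four Five");

# Longest key first, so the alternation prefers longer keys at a given position.
my $alt = join '|', map { quotemeta } sort { length($b) <=> length($a) } @keys;

# With /g, each search resumes after the end of the previous match,
# so text consumed by "Two Three" cannot start another match.
my @found = $string =~ /\b($alt)\b/gi;

print "Found: '$_'\n" for @found;
```

Only "Two Three" is reported: it starts earlier in the string, and by the time the engine moves on, the "Three" that the longer key needs has already been consumed.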
      Yes, that assumption comes from using an assembled regex match. Node has been updated again to make the assumption explicit.

      For kicks, let's try to handle overlapping keys using the assembled regex approach:
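      use Regexp::Assemble;
      use Regexp::Exhaustive qw(exhaustive);
      use List::Util qw(reduce);

      # %hash and @strings are assumed to be defined as earlier in the thread.
      my @keys = map { quotemeta } keys %hash;

      # Set the case-insensitive flag on the assembled pattern itself: a qr//
      # object keeps its own flags when interpolated, so an outer /i would not reach it.
      my $key_re = Regexp::Assemble->new(flags => 'i')->add(@keys)->re;

      for my $string (@strings) {
          # Collect every possible match, then keep the longest one.
          my $match = reduce { length($a) > length($b) ? $a : $b }
              exhaustive($string => qr/\b($key_re)\b/);
          print "Found '$match' in '$string'\n" if defined $match;
      }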

      But then, the performance hit of using Regexp::Exhaustive removes any justification for using the assembled regex in the first place. Ho hum.
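One way to collect overlapping candidates with core regex features only is to put the capture inside a zero-width lookahead. This is a sketch under assumptions, not something proposed in the thread: it uses a plain longest-first alternation rather than an assembled regex, the variable names are hypothetical, and it makes no claim about relative performance.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my $string = "One Two Three Four Five";
my @keys   = ("Two Three", "Three Four Five");

# Longest key first, so each start position yields its longest matching key.
my $alt = join '|', map { quotemeta } sort { length($b) <=> length($a) } @keys;

# The lookahead consumes no text, so a key may begin inside a span
# that an earlier candidate already covered.
my @candidates = $string =~ /\b(?=($alt)\b)/gi;

# Keep the longest candidate; on a tie, the earlier one wins.
my $longest = reduce { length($a) >= length($b) ? $a : $b } @candidates;

print "Found '$longest' in '$string'\n" if defined $longest;
```

This yields at most one candidate per start position (the longest key that fits there), which is enough to pick the longest key overall, though it does not enumerate every possible match the way Regexp::Exhaustive does.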