in reply to Re: Matching Many Strings against a Large List of Hash Keys (case insensitively, longest key first)
in thread Matching Many Strings against a Large List of Hash Keys (case insensitively, longest key first)

This does not do what the title suggests the problem requires: finding the longest key first. If you have keys "Short" and "Longer Key", and the string is "This is a Short Key and a Longer Key", your solution will report "Short" (the leftmost match), while the OP's original solution reports "Longer Key".

Now, it may be that the OP is satisfied with that, but the title suggests he won't be.
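
To make the difference concrete, here is a minimal sketch using only core Perl. It assumes a plain `|` alternation stands in for the assembled regex, and the variable names are illustrative, not taken from either node.

```perl
use strict;
use warnings;

# Keys and string from the example above.
my @keys   = ('Short', 'Longer Key');
my $string = 'This is a Short Key and a Longer Key';

# A single alternation reports the leftmost match, whatever the key length.
my $alt = join '|', map { quotemeta } @keys;
my ($leftmost) = $string =~ /\b($alt)\b/i;

# Trying the keys from longest to shortest reports the longest key present.
my ($longest) = grep { $string =~ /\b\Q$_\E\b/i }
                sort { length($b) <=> length($a) } @keys;

print "leftmost: $leftmost\n";   # leftmost: Short
print "longest:  $longest\n";    # longest:  Longer Key
```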

Re^3: Matching Many Strings against a Large List of Hash Keys (case insensitively, longest key first)
by repellent (Priest) on May 14, 2010 at 17:58 UTC
    Right. Code has been updated.
      Now you are assuming that keys cannot overlap. The OP never mentions that. Consider the keys "Two Three" and "Three Four Five" against the string "One Two Three Four Five". Your code will fail to return "Three Four Five": once "Two Three" has been matched, no key can be found in the remaining " Four Five".
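
A small sketch of that failure, again assuming a plain alternation behaves like the assembled regex for this input (the variable names are made up for the illustration):

```perl
use strict;
use warnings;

# Keys and string from the example above.
my @keys   = ('Two Three', 'Three Four Five');
my $string = 'One Two Three Four Five';
my $alt    = join '|', map { quotemeta } @keys;

# Each /g match resumes where the previous one ended, so the text consumed
# by "Two Three" is no longer available to "Three Four Five".
my @found = $string =~ /\b($alt)\b/gi;

print "Found: @found\n";   # Found: Two Three
```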
        Yes, that assumption comes from using an assembled regex match. Node has been updated again to make the assumption explicit.

        For kicks, let's try to handle overlapping keys using the assembled regex approach:
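        use Regexp::Assemble;
        use Regexp::Exhaustive qw(exhaustive);
        use List::Util qw(reduce);

        my @keys = map { quotemeta } keys %hash;

        # as_string (rather than re) so that the /i below also applies to the keys;
        # an interpolated qr// object would keep its own case-sensitive flags
        my $key_re = Regexp::Assemble->new->add(@keys)->as_string;

        for my $string (@strings) {
            # exhaustive() returns every possible match; keep the longest one
            my $match = reduce { length($a) > length($b) ? $a : $b }
                        exhaustive($string => qr/\b($key_re)\b/i);
            print "Found '$match' in '$string'\n" if defined $match;
        }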

        But then, the performance hit of using Regexp::Exhaustive removes any justification for using the assembled regex in the first place. Ho hum.
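
For comparison, one possible way to pick up overlapping keys without Regexp::Exhaustive is to put the capture inside a zero-width lookahead. This is only a sketch with core Perl: it uses a plain alternation sorted longest first instead of an assembled regex, the data is made up from the example in this thread, and it has not been benchmarked against either approach above.

```perl
use strict;
use warnings;

# Illustrative data based on the overlapping-keys example in this thread.
my %hash    = ('Two Three' => 1, 'Three Four Five' => 2);
my @strings = ('One Two Three Four Five', 'one TWO three', 'nothing here');

# Longest keys first, so the longest key starting at a position wins there.
my $alt = join '|', map { quotemeta }
          sort { length($b) <=> length($a) } keys %hash;

# The capture sits in a lookahead, so matches consume nothing and may overlap.
my $overlap_re = qr/\b(?=($alt)\b)/i;

my %longest_for;
for my $string (@strings) {
    # Collect one candidate per starting position, then keep the longest.
    my ($match) = sort { length($b) <=> length($a) } ($string =~ /$overlap_re/g);
    $longest_for{$string} = $match;
    print "Found '$match' in '$string'\n" if defined $match;
}
```

Because the lookahead consumes no text, the scan retries at every word boundary, so "Three Four Five" is still seen after "Two Three" has matched. The reported text is whatever appears in the string, so its case may differ from the hash key.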