in reply to Re^3: Efficient regex matching with qr//; Can I do better?
in thread Efficient regex matching with qr//; Can I do better?

At first glance I didn't really understand the benefit of using an array and specifying indexes.
The idea is to eliminate as much code as possible from the parts that run for each pattern and each text string. That means this bit:
    while ( my ($text_id, $text) = each(%hash2) ) {
        if ( $text =~ $match )
For each iteration, it calls each(), builds a two-element list, and copies that list into $text_id and $text. Neither the copy nor the overhead of the list assignment (which is not insignificant when repeated a billion times) is necessary.

If the text strings are unique, such that you can determine the $text_id given $text, you can leave out the array stuff and just do this:

    my @text = values %hash2;
    # Reverse lookup from text to id; only valid if every text is unique
    my %reverse_hash2 = reverse %hash2;
    while ( my ($pattern, $high_lvl_id) = each(%hash1) ) {
        my $match = qr/\b$pattern\b/;
        for my $text (@text) {
            if ( $text =~ $match ) {
                my $text_id = $reverse_hash2{$text};
                ...
            }
        }
    }
This should be faster, because for (@array) doesn't copy the array but iterates over it in place, and $text is aliased to each element in turn rather than receiving a copy of it.
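The difference between aliasing and copying can be seen in a small sketch. The array and hash contents below are made up for illustration; the point is only that writing through the foreach variable changes the array, while writing to the variables filled by each() does not touch the hash.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the thread.
my @text = ('alpha', 'beta', 'gamma');

# foreach aliases $t to each element in turn, so assigning to $t
# changes the array itself; no per-iteration copy is made.
for my $t (@text) {
    $t = uc $t;
}
print "@text\n";    # prints "ALPHA BETA GAMMA"

# The list assignment from each() copies key and value into fresh
# variables, so assigning to $v leaves the hash value untouched.
my %hash2 = ( id1 => 'alpha' );
while ( my ($k, $v) = each %hash2 ) {
    $v = uc $v;
}
print "$hash2{id1}\n";    # prints "alpha"
```

The same aliasing is why the loop over @text can match each string without first copying it.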

Re^5: Efficient regex matching with qr//; Can I do better?
by kruppy (Initiate) on Jul 14, 2008 at 05:46 UTC
    That is true. Does that also mean that it would be more efficient to write
        foreach my $text_id (keys %hash2) {
            if ($hash2{$text_id} =~ $match) { ... }
        }
    because then you copy nothing? If this is the case I was fooled by someone else (not in this thread) who said that using each was more efficient than using for/foreach...
    Your second suggestion would, however, not work because there is no guarantee the strings are unique (in fact, I am certain they are not).
    Thanks.
      Try it and see?

      each() is more memory-efficient, but you aren't talking about millions of hash entries, so that's not likely to be a concern.
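One way to "try it and see" is the core Benchmark module. The sketch below is only an illustration: the sample data, the pattern, and the sub names are assumptions, and real timings depend on your data and Perl version. It compares the each() loop, the keys loop, and an index-based loop over two parallel arrays, which keeps the id available even when the text strings are not unique.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: 1000 ids, texts repeat on purpose.
my %hash2 = map { ( "id$_" => "some text about topic " . ( $_ % 50 ) ) } 1 .. 1000;
my $match = qr/\btopic 7\b/;

# Parallel arrays: $ids[$i] belongs to $text[$i], so duplicates are fine.
my @ids  = keys %hash2;
my @text = @hash2{@ids};

sub with_each {
    my $n = 0;
    while ( my ( $text_id, $text ) = each %hash2 ) {
        $n++ if $text =~ $match;
    }
    return $n;
}

sub with_keys {
    my $n = 0;
    for my $text_id ( keys %hash2 ) {
        $n++ if $hash2{$text_id} =~ $match;
    }
    return $n;
}

sub with_index {
    my $n = 0;
    for my $i ( 0 .. $#text ) {
        $n++ if $text[$i] =~ $match;    # the id is $ids[$i] when needed
    }
    return $n;
}

# A negative count asks Benchmark to run each sub for at least
# that many CPU seconds, then prints a comparison table.
cmpthese( -1, {
    each_loop  => \&with_each,
    keys_loop  => \&with_keys,
    index_loop => \&with_index,
} );
```

All three subs count the same matches, so the table compares only the looping overhead. Building @ids and @text once, outside the pattern loop, is what keeps the per-match cost down in the index variant.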