Re^4: Efficient regex matching with qr//; Can I do better?

> The idea of using an array and specifying indexes: I didn't really understand the benefit of that at first glance.

The idea is to eliminate as much code as possible from the innermost loop, the part that runs once for every combination of pattern and text string. That means this bit:

    while ( my ($text_id, $text) = each(%hash2) ) {
        if ( $text =~ $match ) {
            ...
        }
    }

Which, on each iteration, does this:

1. each returns the key and the value;
2. those are copied by a list assignment;
3. the match is attempted.

Neither the copy nor the overhead of the list assignment (which is not insignificant when repeated a billion times) is actually needed.
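For comparison, the index-array approach mentioned above (keeping the ids and the texts in two parallel arrays built once, then walking them by index) can be sketched as follows. This is a minimal sketch, assuming %hash1 maps pattern => high-level id and %hash2 maps text_id => text, as in the thread; the sample data and the names @text_ids, @texts and @hits are hypothetical. Unlike a reverse-hash lookup, it does not require the texts to be unique.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real hashes:
# %hash1 maps pattern => high-level id, %hash2 maps text_id => text.
# Texts need not be unique (t2 and t3 are identical here).
my %hash1 = ( foo => 'H1', bar => 'H2' );
my %hash2 = ( t1 => 'a foo here', t2 => 'bar none', t3 => 'bar none' );

# Build two parallel arrays once, outside all loops.
my @text_ids = keys %hash2;
my @texts    = @hash2{@text_ids};    # hash slice: same order as @text_ids

my @hits;    # hypothetical result list of [pattern, high_lvl_id, text_id]
while ( my ($pattern, $high_lvl_id) = each %hash1 ) {
    my $match = qr/\b$pattern\b/;
    # Inner loop: an index step and a match per text;
    # no list assignment and no copy of the text string.
    for my $i ( 0 .. $#texts ) {
        if ( $texts[$i] =~ $match ) {
            push @hits, [ $pattern, $high_lvl_id, $text_ids[$i] ];
        }
    }
}

printf "%s (%s) matched %s\n", @$_ for sort { $a->[2] cmp $b->[2] } @hits;
```

The id is only looked up (via the shared index) when a match actually happens, so the common no-match case touches nothing but the text array.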

If the text strings are unique, such that you can determine the $text_id given $text, you can leave out the array stuff and just do this:

    my @text = values %hash2;
    my %reverse_hash2 = reverse %hash2;    # text => text_id; only safe if texts are unique

    while ( my ($pattern, $high_lvl_id) = each(%hash1) ) {
        my $match = qr/\b$pattern\b/;
        for my $text (@text) {
            if ( $text =~ $match ) {
                my $text_id = $reverse_hash2{$text};
                ...
            }
        }
    }

Because for (@array) doesn't copy the array but iterates over it in place, with $text aliased to each element in turn rather than receiving a copy of it, this should be faster.
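The aliasing claim is easy to check: assigning to the loop variable of a foreach changes the array element itself, while assigning to a variable filled by the list assignment from each() leaves the hash alone, because that variable holds a copy. A small sketch with made-up data:

```perl
use strict;
use warnings;

my @text = ('alpha', 'beta');

# $t is an alias for each element, not a copy:
# modifying $t modifies the array element itself.
for my $t (@text) {
    $t = uc $t;
}
print "@text\n";    # prints "ALPHA BETA"

# By contrast, the list assignment from each() copies the value,
# so changing $v leaves the hash untouched.
my %h = ( k => 'gamma' );
while ( my ($k, $v) = each %h ) {
    $v = uc $v;
}
print "$h{k}\n";    # prints "gamma"
```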

--
Online Fortune Cookie Search


Re^5: Efficient regex matching with qr//; Can I do better? by kruppy (Initiate) on Jul 14, 2008 at 05:46 UTC
That is true. Does that also mean it would be more efficient to write

    foreach my $text_id (keys %hash2) {
        if ($hash2{$text_id} =~ $match) { ... }
    }

because then you copy nothing? If so, I was misled by someone else (not in this thread) who said that using each was more efficient than using for/foreach... Your second suggestion would not work, however, because there is no guarantee that the strings are unique (in fact, I am certain they are not). Thanks.
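The problem with non-unique texts is concrete: reverse %hash2 silently drops all but one of the ids that share a text. One way around it (not suggested in the thread) is to map each distinct text to the list of ids that carry it, then loop over the distinct texts only. A sketch with hypothetical data and a hypothetical %ids_for hash:

```perl
use strict;
use warnings;

# Hypothetical data with a duplicated text.
my %hash2 = ( t1 => 'same text', t2 => 'same text', t3 => 'other text' );

# Plain reverse: the two 'same text' entries collide, so one id is lost.
my %reverse_hash2 = reverse %hash2;
print scalar(keys %reverse_hash2), " keys after reverse\n";    # prints "2 keys after reverse"

# Alternative: map each distinct text to all ids that carry it.
my %ids_for;
while ( my ($text_id, $text) = each %hash2 ) {
    push @{ $ids_for{$text} }, $text_id;
}
my @text = keys %ids_for;    # each distinct text appears once

my $match = qr/\bsame\b/;
for my $text (@text) {
    if ( $text =~ $match ) {
        # Every id sharing this text is reported, not just one.
        for my $text_id ( sort @{ $ids_for{$text} } ) {
            print "matched $text_id\n";
        }
    }
}
```

A side effect is that each duplicated text is matched only once per pattern, which can only reduce the number of regex attempts.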
Re^6: Efficient regex matching with qr//; Can I do better? by ysth (Canon) on Jul 14, 2008 at 05:49 UTC
Try it and see? each() is more memory-efficient (keys has to build the whole list of keys before the loop starts), but you aren't talking about millions of hash entries, so that's not likely to be a concern.
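"Try it and see" could look something like the following sketch, which uses the core Benchmark module to compare the each() loop, the keys loop, and the index-array loop on the same data. The data set and the sub names are hypothetical, and no timings are claimed here: the relative results depend on your perl version, your data, and your machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: 5,000 texts, about two thirds containing "foo".
my %hash2;
$hash2{"id$_"} = "text $_ " . ( $_ % 3 ? 'foo' : 'baz' ) for 1 .. 5_000;
my $match = qr/\bfoo\b/;

# Parallel arrays for the index approach, built once outside the timing.
my @ids   = keys %hash2;
my @texts = @hash2{@ids};

sub with_each {
    my $n = 0;
    while ( my ($text_id, $text) = each %hash2 ) {
        $n++ if $text =~ $match;
    }
    return $n;
}

sub with_keys {
    my $n = 0;
    for my $text_id ( keys %hash2 ) {
        $n++ if $hash2{$text_id} =~ $match;
    }
    return $n;
}

sub with_index {
    my $n = 0;
    for my $i ( 0 .. $#texts ) {
        $n++ if $texts[$i] =~ $match;
    }
    return $n;
}

# Sanity check: all three loops must find the same number of matches.
my @counts = ( with_each(), with_keys(), with_index() );
print "matches: @counts\n";

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    each  => \&with_each,
    keys  => \&with_keys,
    index => \&with_index,
} );
```

Checking that the three counts agree before timing guards against benchmarking loops that do different amounts of work.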