The idea of using an array and specify indexes I didn't really understand the benefit of at first glance.
The idea is to eliminate as much code as possible specifically from the parts that run for each pattern and each text string. This means this bit:
while (my($text_id,$text) = each(%hash2) ) { if ($text =~ $match)
Which, for each iteration, does this: Both the copy and the overhead of the list assignment (which is not insignificant, when repeated a billion times) aren't necessary.

If the text strings are unique, such that you can determine the $text_id given $text, you can leave out the array stuff and just do this:

my @text = values %hash2; my %reverse_hash2 = reverse %hash2; while ( my($pattern,$high_lvl_id) = each(%hash1) ) { my $match = qr/\b$pattern\b/; for my $text (@text) { if ( $text =~ $match ) { my $text_id = $reverese_hash2{$text}; ... } } }
Because for (@array) doesn't copy array but just iterates over it, and $text is aliased to each element in turn, not copied over it, this should be faster.

In reply to Re^4: Efficient regex matching with qr//; Can I do better? by ysth
in thread Efficient regex matching with qr//; Can I do better? by kruppy

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.