Per ikegami's suggestion that it is advantageous to do as much work outside the loop(s) as possible, this code may be helpful (although I'm not really sure I understand the problem completely):

use warnings; use strict; use Data::Dumper; MAIN: { # begin main loop # define hashes for text and patterns. my %text = ( text_id_1 => 'four score and foo bar baz seven years', text_id_2 => 'to fee fie foe or not', text_id_3 => 'to wibble or not to wobble', text_id_4 => 'text with no patterns present', text_id_5 => 'text with fee fie foe and wobble present', # ...etc. ); my %patterns = ( 'fee fie foe' => 'hi_lvl_id_1', 'foo bar baz' => 'hi_lvl_id_2', 'wibble' => 'hi_lvl_id_3', 'wobble' => 'hi_lvl_id_4', 'quux' => 'hi_lvl_id_5', # pattern present in no text # ...etc. ); # preprocess patterns hash for data extraction. while (my ($string, $id) = each %patterns) { $patterns{$string} = { hi_lvl_id => $id, regex => qr{ \b \Q$string\E \b }xms, parts => [ split /\s/, $string ], }; } # initialize output hashes. my %pattern_ids = map { $_ => '' } keys %text; my %part_ids = map { $_ => {} } keys %text; match(\%patterns, \%text, \%pattern_ids, \%part_ids); print "%pattern_ids: \n", Dumper \%pattern_ids; print "%part_ids: \n", Dumper \%part_ids; } # end main loop # subroutines ################################## sub match { my ($hr_patterns, # ref. to input hash: patterns $hr_text, # ref. to input hash: text strings $hr_pattern_ids, # ref. to output hash: identified pattern ids $hr_part_ids, # ref. to output hash: identified part ids ) = @_; PATTERN: for my $hr_pattern (values %$hr_patterns) { TEXT: while (my($text_id, $text) = each %$hr_text) { next TEXT unless $text =~ $hr_pattern->{regex}; $hr_pattern_ids->{$text_id} .= ":$hr_pattern->{hi_lvl_id}"; my @parts = @{ $hr_pattern->{parts} }; # for convenience @{ $hr_part_ids->{$text_id} }{@parts} = (0) x @parts; } # end while TEXT loop } # end for PATTERN loop }

Output:

%pattern_ids: $VAR1 = { 'text_id_3' => ':hi_lvl_id_3:hi_lvl_id_4', 'text_id_5' => ':hi_lvl_id_1:hi_lvl_id_4', 'text_id_2' => ':hi_lvl_id_1', 'text_id_4' => '', 'text_id_1' => ':hi_lvl_id_2' }; %part_ids: $VAR1 = { 'text_id_3' => { 'wibble' => 0, 'wobble' => 0 }, 'text_id_5' => { 'foe' => 0, 'wobble' => 0, 'fie' => 0, 'fee' => 0 }, 'text_id_2' => { 'foe' => 0, 'fie' => 0, 'fee' => 0 }, 'text_id_4' => {}, 'text_id_1' => { 'bar' => 0, 'baz' => 0, 'foo' => 0 } };

Perhaps more advantageous would be moritz's suggestion to compile all the patterns into one massive alternation, but I would quail at the thought of a regex that might easily exceed a million characters. But if it works...


In reply to Re: Efficient regex matching with qr//; Can I do better? by Anonymous Monk
in thread Efficient regex matching with qr//; Can I do better? by kruppy

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.