perlprincess has asked for the wisdom of the Perl Monks concerning the following question:

I am attempting to query a database using more than about 25 regexes, and I need to count the occurrences of each regex's matches and then count the number of entries left unmatched. Is there a simpler way to code this than individually declaring a counter variable for each regex? Thank you for your help.
  • Comment on Counting frequency of the regex matches

Replies are listed 'Best First'.
Re: Counting frequency of the regex matches
by pc88mxer (Vicar) on Jun 06, 2008 at 01:19 UTC
    You can store the regexs in an array or hash and use another array or hash to store the counts. E.g.:
    my @res = ( qr/this/, qr/that/, ..., qr// );
    my @matches;
    for each row $row:
        for my $i (0..$#res) {
            if ($row =~ m/$res[$i]/) {
                $matches[$i]++;
                last;
            }
        }
    This example assumes you want to stop once you have matched something. By making the last re match anything you can keep track of the number of unmatched rows without special casing it. That is, $matches[-1] is the number of rows that didn't match any of the other regular expressions.
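    A minimal runnable sketch of the approach described above, assuming the rows are already available as plain strings. The @rows data and the two sample patterns are made up for illustration; in practice the rows would come from the database query.

```perl
use strict;
use warnings;

# Hypothetical patterns; the final qr// matches any string and acts as the catch-all.
my @res = ( qr/this/, qr/that/, qr// );

# Hypothetical sample rows standing in for the database results.
my @rows = ( 'this one', 'that one', 'neither', 'this and that', 'nothing' );

# One counter per regex, initialised to zero so every slot is defined.
my @matches = (0) x @res;

for my $row (@rows) {
    for my $i ( 0 .. $#res ) {
        if ( $row =~ $res[$i] ) {
            $matches[$i]++;
            last;    # stop at the first regex that matches
        }
    }
}

# Report each real regex by index, then the catch-all as the unmatched count.
print "regex $_: $matches[$_]\n" for 0 .. $#res - 1;
print "unmatched: $matches[-1]\n";
```

    Because of the last, a row such as 'this and that' is counted only for the first pattern it matches; drop the last (and count unmatched rows separately) if a row should be counted for every regex it matches.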
      Brother pc88mxer,

      Huh?

      I'm confused by this snippet of code:

      for each row $row:
      That usage of each looks nothing like the perldoc on each. And I don't understand what row is doing as a bareword. Or the colon after $row.

      Could you point me to a place in the docs that would tell me what you're doing here? My thrashing around in the online documentation hasn't gotten me anywhere.

      throop

        "for each row $row:" seems like a comment, where either the "#" prefix was missed, or got included in the code section due to laziness of the poster.
        Sorry, it's pseudo code. I didn't know how the OP was generating the rows of the database, so I just wrote it out in English. I guess I've been influenced too much by Knuth's Literate Programming.
Re: Counting frequency of the regex matches
by MidLifeXis (Monsignor) on Jun 06, 2008 at 04:11 UTC
Re: Counting frequency of the regex matches
by kyle (Abbot) on Jun 06, 2008 at 15:49 UTC

    Similar to the solution from pc88mxer, this keeps the regexs and the counts together.

    use List::Util qw( first );

    my @re_count = map { { re => $_, count => 0 } } ( qr/this/, qr/that/, ..., qr// );

    foreach my $row ( ... ) {
        my $re_ref = first { $row =~ $_->{re} } @re_count;
        $re_ref->{count}++ if $re_ref;
    }

    The map turns each regex into a hash reference containing the regex and a counter. Then, for each row, we loop over those with first from List::Util to get the first one with a regex that matches. If there's a match, increment its counter.

    I like this kind of data structure (AoH) because I can pile more data into the hash refs later if I want to. For example, I could add a human readable name to each item for reporting.
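
    As a rough sketch of that last idea, here is the same structure with a name field added for reporting. The names, patterns, and @rows below are hypothetical sample data, and the leading + on the inner braces is only there to make it unambiguous that they build a hash reference.

```perl
use strict;
use warnings;
use List::Util qw( first );

# Each entry pairs a human-readable name with its regex and a counter.
# The last entry (qr//) matches anything, so it collects the unmatched rows.
my @re_count = map { +{ name => $_->[0], re => $_->[1], count => 0 } } (
    [ 'this'      => qr/this/ ],
    [ 'that'      => qr/that/ ],
    [ 'unmatched' => qr// ],
);

# Hypothetical rows standing in for the database results.
my @rows = ( 'this one', 'that one', 'neither', 'this and that' );

for my $row (@rows) {
    # first returns the first entry whose regex matches the row.
    my $re_ref = first { $row =~ $_->{re} } @re_count;
    $re_ref->{count}++ if $re_ref;
}

# Report by name rather than by array index.
printf "%-10s %d\n", $_->{name}, $_->{count} for @re_count;
```

    With around 25 regexes, adding or renaming a pattern then means editing one line of the list rather than touching separate counter variables.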