symgryph has asked for the wisdom of the Perl Monks concerning the following question:

This is more of a 'conceptual' question about how to do several things. The first is to 'build' a hash table of the regexes used in a program: given an input line, the hash would look up the shorthand values (i.e., variable names) and then test whether each regex succeeds or fails on the input. So question 1 is: how do I effectively 'quote' regexes inside a hash table, and how do I 'test' the regexes stored there? Question 2: the actual regexes and hash lookup keys are quite messy, as they come from Java. Some of the key names are quite horrid:
Key                                 Lookup
$(server_connection.socket_errno)   [\d\-]+
%t                                  \d\d/\d\d/\d\d\d\d:\d\d:\d\d:\d\d
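For question 1, one possible approach (a sketch, not the only way): compile each pattern once with qr// and use the shorthand name as the hash key, written in single quotes so Perl does not read `$(` as a variable. To test a field, embed the stored pattern between `\A` and `\z` so that the whole field, not just part of it, has to match. The hash name `%pattern_for` and the helper `field_ok` are made up for illustration.

```perl
use strict;
use warnings;

# Shorthand names from the log format mapped to compiled regexes.
# Single-quoted keys keep Perl from interpolating '$(' as a variable.
my %pattern_for = (
    '$(server_connection.socket_errno)' => qr/[\d\-]+/,
    '%t'                                => qr{\d\d/\d\d/\d\d\d\d:\d\d:\d\d:\d\d},
);

# Hypothetical helper: true (1) if the WHOLE field matches the named pattern.
sub field_ok {
    my ($name, $field) = @_;
    my $re = $pattern_for{$name} or return 0;   # unknown name counts as a miss
    return $field =~ /\A$re\z/ ? 1 : 0;         # \A...\z: no partial matches
}

print field_ok('$(server_connection.socket_errno)', '-104'), "\n";   # prints 1
print field_ok('%t', '12/04/2016:06:55:00'), "\n";                    # prints 1
print field_ok('%t', '10:00'), "\n";                                  # prints 0
```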
I am then 'chopping up' log entries with split on whitespace, and would like to 'test' the field order as defined in the first line of the log:
$(server_connection.socket_errno) %t blah
The input lines that follow are separated by whitespace, in the same order as the 'first line' noted above:
my-test 10:00
Ultimately I want to know whether there is a 'miss' or a 'hit', which I know can be done nicely with a counter. I'm not really looking for 'code' so much as ideas, since I did write some, but it was quite messy.
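A minimal sketch of the split-and-count idea, under some assumptions that are mine rather than the question's: the first log line holds the field names, later lines hold whitespace-separated values in the same positions, a missing value counts as a miss, and names with no pattern (such as `blah`) are skipped. The sample log lines and the `check_line` helper are hypothetical.

```perl
use strict;
use warnings;

my %pattern_for = (
    '$(server_connection.socket_errno)' => qr/[\d\-]+/,
    '%t'                                => qr{\d\d/\d\d/\d\d\d\d:\d\d:\d\d:\d\d},
);

# Hypothetical helper: returns (hits, misses) for one data line,
# pairing each field with the header name at the same position.
sub check_line {
    my ($names, $line) = @_;
    my @fields = split ' ', $line;    # split ' ' also ignores leading whitespace
    my ($hits, $misses) = (0, 0);
    for my $i (0 .. $#$names) {
        my $re = $pattern_for{ $names->[$i] } or next;   # no pattern: skip
        my $field = $fields[$i];
        if (defined $field and $field =~ /\A$re\z/) { $hits++ }
        else                                        { $misses++ }
    }
    return ($hits, $misses);
}

my @log = (
    '$(server_connection.socket_errno) %t blah',
    '-1 12/04/2016:06:55:00 foo',
    'my-test 10:00 bar',
);

my @names = split ' ', shift @log;    # first line defines the field order
for my $line (@log) {
    my ($hits, $misses) = check_line(\@names, $line);
    print "$line: $hits hit(s), $misses miss(es)\n";
}
```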
"Two Wheels good, Four wheels bad."

Re: Using a 'hash' of regexes, then seeing if they match based upon a 'split' mapped to an array?
by Discipulus (Canon) on Apr 12, 2016 at 06:55 UTC
    hello symgryph,

    So you need a sort of dispatch table of regexes? You can easily store your regexes as hash values using the qr operator.

    Using a hash is the right thing unless you need the order to be preserved: hash keys come back in no particular order, so in that case you need to save the order in another data structure, probably an array.
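    If the order does matter, one way to do what is described above is to keep an array of the names next to the hash. This sketch borrows patterns similar to those in the example that follows and builds both structures from a single list of name/pattern pairs; the names `@spec` and `@order` are illustrative only.

```perl
use strict;
use warnings;

# One ordered list of [name, pattern] pairs is the single source of truth.
my @spec = (
    [ name   => qr/[A-Z]\w+/ ],
    [ gender => qr/[fF]?e?[mM]ale/ ],
    [ tel    => qr/\d{5,8}/ ],
);

my %look  = map { $_->[0] => $_->[1] } @spec;   # fast lookup by name
my @order = map { $_->[0] } @spec;              # the order to check them in

for my $name (@order) {    # always name, gender, tel
    print "$name => $look{$name}\n";
}
```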

    Consider this silly example (run it and see how silly it is on the last line: Male is counted both as a name and as a gender!)

    use strict;
    use warnings;

    my %look = (
        name   => qr/([A-Z]\w+\s?)+/,
        gender => qr/[fF]?e?[mM]ale/,
        tel    => qr/\d{5,8}/,
    );

    # Returns how many times $pattern matches in $str.
    sub validate {
        my ($str, $pattern) = @_;
        my $count = () = $str =~ /$pattern/g;   # list assignment counts all matches
        return $count;
    }

    while (defined (my $line = <DATA>)) {
        chomp $line;
        print qq("$line" contains:\n);
        foreach my $pattern (keys %look) {
            my $res = validate($line, $look{$pattern});
            print "\t$res $pattern\n";
        }
    }

    __DATA__
    Kurt Perlish male 5555555 21212121
    Mary Perl 6565656
    me Male 12042016
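    One way to stop a word such as 'Male' from being counted twice, sketched here as an assumption rather than a fix from the thread: split the line into tokens, try the patterns in a fixed order (gender before name), require each pattern to match the whole token, and let the first match win. The trade-off is that a two-word name like 'Kurt Perlish' now counts as two name tokens. The `@rules` list, the `classify` helper and the `unmatched` label are hypothetical.

```perl
use strict;
use warnings;

# Patterns are tried in this order; the first full-token match wins.
my @rules = (
    [ gender => qr/[fF]?e?[mM]ale/ ],
    [ tel    => qr/\d{5,8}/ ],
    [ name   => qr/[A-Z]\w+/ ],
);

# Hypothetical helper: returns a hash ref of category => count for one line.
sub classify {
    my ($line) = @_;
    my %count;
    TOKEN: for my $token (split ' ', $line) {
        for my $rule (@rules) {
            my ($label, $re) = @$rule;
            if ($token =~ /\A$re\z/) {
                $count{$label}++;
                next TOKEN;    # stop at the first matching rule
            }
        }
        $count{unmatched}++;   # no rule matched this token
    }
    return \%count;
}

for my $line ('Kurt Perlish male 5555555 21212121', 'me Male 12042016') {
    my $c = classify($line);
    print qq("$line": ), join(', ', map { $_ . '=' . $c->{$_} } sort keys %$c), "\n";
}
```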

    L*

    Update: never mind the horrid names: in Perl, hash keys are always stringified, so they can contain spaces, dots and other garbage with no problems.

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; maybe one day you reinvent one of THE WHEELS.
Re: Using a 'hash' of regexes, then seeing if they match based upon a 'split' mapped to an array?
by AnomalousMonk (Archbishop) on Apr 12, 2016 at 15:17 UTC

    I don't fully understand what you're after, but here's an approach that might be interesting. It parses the template elements in order through a string. (Actually, a full-blown parser might be what you really need.)

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my @lines = (
       'my-test 10:11:23 blah yada',
       'test 11:22:33',
       ' taste-test 01:02:03 fie',
       '10:20:30 toast',
       '999 hooha yip yap',
       'another-test 999 foo fee',
       );
    ;;
    my %parse = (
      '$(long_awkward.string)' =>
        qr{ (?<! [[:alpha:]]) [[:alpha:]]+ (?: - [[:alpha:]]+)* (?! [[:alpha:]]) }xms,
      '%t' => qr{ (?<! \d) \d\d : \d\d : \d\d (?! \d) }xms,
      );
    ;;
    my $template = '$(long_awkward.string) %t xxx yyy zzz';
    ;;
    for my $line (@lines) {
      my @sub_tmpls = (split ' ', $template)[0, 1];
      my $match = validate($line, \%parse, @sub_tmpls);
      print qq{'$line' }, $match ? 'matches' : 'NO match';
    }
    ;;
    sub validate {
      my ($string, $hr_parser, @sub_templates) = @_;
      ;;
      for my $st (@sub_templates) {
        return unless $string =~ m{ \G \s* $hr_parser->{$st} }xmsg;
      }
      return 1;
    }
    "
    'my-test 10:11:23 blah yada' matches
    'test 11:22:33' matches
    ' taste-test 01:02:03 fie' matches
    '10:20:30 toast' NO match
    '999 hooha yip yap' NO match
    'another-test 999 foo fee' NO match
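    A variation on the same `\G` technique, written as a standalone script and offered only as a sketch: instead of hard-coding the slice `[0, 1]`, it checks every template element that has an entry in `%parse`, and it returns the name of the first element that failed, so a miss also says where it happened. The helper name `first_failure` is hypothetical.

```perl
use strict;
use warnings;

my %parse = (
    '$(long_awkward.string)' =>
        qr{ (?<! [[:alpha:]]) [[:alpha:]]+ (?: - [[:alpha:]]+)* (?! [[:alpha:]]) }xms,
    '%t' => qr{ (?<! \d) \d\d : \d\d : \d\d (?! \d) }xms,
);

my $template = '$(long_awkward.string) %t xxx yyy zzz';

# Only template elements that have a pattern are checked, in template order.
my @sub_tmpls = grep { exists $parse{$_} } split ' ', $template;

# Returns undef if every element matched in order, else the failing element.
sub first_failure {
    my ($string, $hr_parser, @sub_templates) = @_;
    for my $st (@sub_templates) {
        # \G anchors each match where the previous successful match ended.
        return $st unless $string =~ m{ \G \s* $hr_parser->{$st} }xmsg;
    }
    return;
}

for my $line ('my-test 10:11:23 blah yada', '10:20:30 toast', 'another-test 999 foo fee') {
    my $failed = first_failure($line, \%parse, @sub_tmpls);
    print qq{'$line' }, defined $failed ? "NO match at $failed" : 'matches', "\n";
}
```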


    Give a man a fish:  <%-{-{-{-<