in reply to regex: extract multiple number of date patterns from certain lines

The match operator is not particularly well suited to extract this data since the data has two dimensions. One solution:
    while (
       /
          ^
          (\d{4}-\d\d-\d\d) .* dates[ ]processed:[ ]
          ( (?:\d{4}-\d\d-\d\d,[ ])* \d{4}-\d\d-\d\d )
          $
       /xmg
    ) {
       my $on = $1;
       my $processed = $2;
       my @processed = split(/, /, $processed);
       # Do something with $on and @processed.
    }

Or if you are dealing with a file handle,

    while (<$fh>) {
       my ($on, $processed) = /
          ^
          (\d{4}-\d\d-\d\d) .* dates[ ]processed:[ ]
          ( (?:\d{4}-\d\d-\d\d,[ ])* \d{4}-\d\d-\d\d )
          $
       /x or next;

       my @processed = split(/, /, $processed);
       # Do something with $on and @processed.
    }

Update: Added file handle version since that's probably what the OP really wants.
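To illustrate the "two dimensions" point, here is a minimal sketch (the sample line is borrowed from the data quoted later in this thread, and the variable names are only illustrative). In Perl 5, a capture group inside a quantifier is overwritten on each repetition, so one match cannot hand back every date separately; capturing the whole list once and splitting it does.

```perl
use strict;
use warnings;

my $line = '2009-02-02 06:12:57,500 dates processed: 2009-01-31, 2009-01-29, 2009-01-30';

# A capture group inside a quantifier is overwritten on every repetition,
# so only the last date it matched survives the match.
my @captures = $line =~ /dates processed:[ ](?:(\d{4}-\d\d-\d\d)(?:,[ ])?)+$/;
print scalar(@captures), " capture: @captures\n";

# Capturing the whole comma-separated list once and splitting it
# recovers every date.
my ($list) = $line =~ /dates processed:[ ]((?:\d{4}-\d\d-\d\d,[ ])*\d{4}-\d\d-\d\d)$/;
my @processed = split /, /, $list;
print scalar(@processed), " dates: @processed\n";
```

The first print reports a single capture (the last date on the line), while the second reports all three.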

Re^2: regex: extract multiple number of date patterns from certain lines
by moritz (Cardinal) on Mar 04, 2009 at 16:23 UTC
    The match operator is not particularly well suited to extract this data since the data has two dimensions.

    Due to the different structure of captures in Perl 6 regexes that doesn't hold true for Perl 6 anymore. Here's a Perl 6 solution that extracts all trailing dates with one regex match:

    use v6;

    my $str = '2009-02-02 06:12:57,500 dates processed: 2009-01-31, 2009-01-29, 2009-01-30
    2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17
    2009-02-19 05:58:29,138 dates processed: 2009-02-18
    ';

    token date { \d**4 '-' \d**2 '-' \d ** 2 };
    regex line { ^^ \N* 'processed:' \s* <date> [','\s* <date>]* \s* \n };

    if $str ~~ m/ ^ <line>+ / {
        for $<line> -> $l {
            print "Dates in line $l";
            .say for $l<date>;
        }
    } else {
        say "no match";
    }

    (tested on Rakudo).

      Very nice.

      Unfortunately I only get to upgrade production to 5.10 in two weeks' time; 6 is going to have to wait a few more weeks, I guess.

      Cheers,
      R.

      Pereant, qui ante nos nostra dixerunt!
Re^2: regex: extract multiple number of date patterns from certain lines
by Random_Walk (Prior) on Mar 04, 2009 at 16:30 UTC

    Good $localtime ikegami++ sir,

    I am actually dealing with an existing code base that uses POE::Wheel::FollowTail and checks each new log line against a list of pre-compiled regex patterns, hence the desire to do it in a single regex. When it finds a match it calls the forwarder method on the object that is associated with the matching pattern.

    Among the refs passed to the forwarding object is one to the list of matches from the regex, which normally saves having to split the line up all over again. I do get a second bite at the cherry in the object's forwarder method. It would have been nice, though, after matching all those dates, if I could just pass them all through already separated.

    Thanks for looking; at least I now know it is not me making a trivial error.

    Cheers,
    R.

    Pereant, qui ante nos nostra dixerunt!
      I don't see anything in POE::Wheel::FollowTail about regexps, so I presume it's not a limitation of that module. Why can't your check list contain both regexps and code refs?
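A rough sketch of what such a mixed check list might look like. Everything here (`@checks`, `run_checks`, the ERROR pattern) is hypothetical and not taken from the OP's monitor; the only assumption is that a code ref returns the extracted fields, or an empty list when it does not match.

```perl
use strict;
use warnings;

# Hypothetical check list: each entry is either a compiled regex or a
# code ref that returns the extracted fields (empty list = no match).
my @checks = (
    qr/^(\d{4}-\d\d-\d\d) .*ERROR: (.*)$/,
    sub {
        my ($line) = @_;
        my ($on, $list) = $line =~ /^(\d{4}-\d\d-\d\d) .*dates processed: (.*)$/
            or return;
        return ($on, split /, /, $list);
    },
);

# Try each check in turn and return the fields from the first that matches.
sub run_checks {
    my ($line) = @_;
    for my $check (@checks) {
        my @fields = ref($check) eq 'CODE' ? $check->($line) : $line =~ $check;
        return @fields if @fields;
    }
    return;
}

my @fields = run_checks('2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17');
print join('|', @fields), "\n";
```

The dispatcher only needs one `ref` test per entry, so existing regex-only entries would keep working unchanged.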

        Of course you are right, there is nothing in POE::Wheel::FollowTail limiting me to regex. P.W.FT was just a nice framework I grabbed about 3 years ago to watch a bunch of logfiles for various patterns and then hand these off to arbitrary processing.

        To keep the core log monitor small, clean and hopefully fast, all it will accept is regex patterns. A config file defines these and the module to use for each one. When a pattern matches, the 'forward' method of the registering object (instantiated at start up) is called and given refs to the pattern involved, the log file name and the matches. Code of arbitrary complexity can be executed in the object.

        In the normal way of things the requirements started very simple years ago; even allowing a config file full of regexen was called overkill then by some. Now, three years down the line, this code runs fine on hundreds of production servers and happily fulfils all sorts of varied requirements. As I said, I get a second bite of the cherry in the object the match is passed to. I am now using your regex (thanks) and post-processing in the object to split the trailing dates, then sending them on to the sysmanagement bus. All works great.
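For readers following along, the arrangement described above might be sketched roughly as follows. The class name `DateForwarder`, the argument order of `forward`, and the `sent` list standing in for the management bus are all assumptions for illustration; the real monitor and forwarder objects are not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical forwarder class; stands in for a module named in the config file.
package DateForwarder;

sub new { my ($class) = @_; return bless { sent => [] }, $class }

# Assumed signature: refs to the pattern, the log file name and the captures.
sub forward {
    my ($self, $pattern_ref, $logfile_ref, $matches_ref) = @_;
    my ($on, $processed) = @$matches_ref;
    # Second bite at the cherry: split the trailing date list here.
    my @dates = split /, /, $processed;
    push @{ $self->{sent} }, map { "$on $_" } @dates;   # stand-in for the bus
    return scalar @dates;
}

package main;

my $handler = DateForwarder->new;
my @checks  = ({
    pattern => qr/ ^ (\d{4}-\d\d-\d\d) .* dates[ ]processed:[ ]
                   ( (?:\d{4}-\d\d-\d\d,[ ])* \d{4}-\d\d-\d\d ) $ /x,
    handler => $handler,
});

my $logfile = 'app.log';   # hypothetical file name
my $line    = '2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17';

# The core monitor only matches patterns and hands the captures on.
for my $check (@checks) {
    my @matches = $line =~ $check->{pattern} or next;
    $check->{handler}->forward(\$check->{pattern}, \$logfile, \@matches);
}

print "$_\n" for @{ $handler->{sent} };
```

Keeping the split inside `forward` leaves the monitor itself regex-only, which matches the constraint described above.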

        I could change the monitor to accept coderefs as well as regexes, but that would involve a rewrite, testing and change-controlled deployment of core production code throughout a monster bureaucracy. When I started this node I just thought I was missing something blindingly obvious in not getting my regex to work as I wanted; I was vexed. Now I know I had hit a limit of what is reasonable in a single line of regex.

        Thanks again for your time, it is appreciated,
        R.

        Pereant, qui ante nos nostra dixerunt!