Re: regex: extract multiple number of date patterns from certain lines
by ikegami (Patriarch) on Mar 04, 2009 at 16:09 UTC
The match operator is not particularly well suited to extract this data since the data has two dimensions. One solution:
while (
    /
        ^
        (\d{4}-\d\d-\d\d)                             # $1: date at start of line
        .*dates[ ]processed:[ ]
        ( (?:\d{4}-\d\d-\d\d,[ ])* \d{4}-\d\d-\d\d )  # $2: comma-separated list
        $
    /xmg
) {
    my $on        = $1;
    my $processed = $2;
    my @processed = split(/, /, $processed);
    # Do something with $on and @processed.
}
Or if you are dealing with a file handle,
while (<$fh>) {
    my ($on, $processed) = /
        ^
        (\d{4}-\d\d-\d\d)
        .*dates[ ]processed:[ ]
        ( (?:\d{4}-\d\d-\d\d,[ ])* \d{4}-\d\d-\d\d )
        $
    /x
        or next;
    my @processed = split(/, /, $processed);
    # Do something with $on and @processed.
}
Update: Added file handle version since that's probably what the OP really wants.
use v6;
my $str =
'2009-02-02 06:12:57,500 dates processed: 2009-01-31, 2009-01-29, 2009-01-30
2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17
2009-02-19 05:58:29,138 dates processed: 2009-02-18
';
token date { \d**4 '-' \d**2 '-' \d ** 2 };
regex line { ^^ \N* 'processed:' \s* <date> [','\s* <date>]* \s* \n };
if $str ~~ m/ ^ <line>+ / {
    for $<line> -> $l {
        print "Dates in line $l";
        .say for $l<date>;
    }
} else {
    say "no match";
}
(tested on Rakudo).
Good $localtime ikegami++ sir,
I am actually dealing with an existing code base that uses POE::Wheel::FollowTail and checks each new log line against a list of pre-compiled regex patterns, hence the desire to do it in a single regex. When it finds a match it calls the forwarder method on the object that is associated with the matching pattern.
Among the refs passed to the forwarding object is one to the list of matches from the regex, which normally saves having to split the line up all over again. I do get a second bite at the cherry in the object's forwarder method. It would have been nice, though, if after matching all those dates I could just pass them through already separated.
Thanks for looking; at least I now know it is not me making a trivial error.
Cheers, R.
Pereant, qui ante nos nostra dixerunt!
I don't see anything in POE::Wheel::FollowTail about regexps, so I presume it's not a limitation of that module. Why can't your check list contain both regexps and code refs?
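A minimal sketch of what such a check list could look like, assuming each entry is a hash with a pre-compiled pattern and an optional code ref. The names `@checks`, `re`, `post` and `match_line`, and the second `ERROR:` pattern, are made up for illustration and are not taken from the OP's code base or from POE:

```perl
use strict;
use warnings;

# Hypothetical check list: each entry pairs a pre-compiled pattern with an
# optional code ref that post-processes the captures before forwarding.
my @checks = (
    {
        re   => qr/^(\d{4}-\d\d-\d\d).*dates processed: (.*)/,
        post => sub { my ($on, $list) = @_; return ($on, split /,\s*/, $list) },
    },
    {
        re => qr/^(\d{4}-\d\d-\d\d).*ERROR: (.*)/,    # no post-processing needed
    },
);

# Return the (possibly post-processed) captures of the first matching check,
# or an empty list if nothing matches.
sub match_line {
    my ($line) = @_;
    for my $check (@checks) {
        my @caps = $line =~ $check->{re} or next;
        return $check->{post} ? $check->{post}->(@caps) : @caps;
    }
    return;
}

my @res = match_line('2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17');
print join('|', @res), "\n";    # 2009-02-18|2009-02-16|2009-02-17
```

Entries without a `post` code ref behave exactly like a plain regex check, so existing patterns would not need to change.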
Re: regex: extract multiple number of date patterns from certain lines
by johngg (Canon) on Mar 04, 2009 at 16:48 UTC
Does this code produce the results you need? For any line that doesn't match "dates processed:" it pushes an empty array reference onto the @results array just so that you can tell there was a line that didn't match.
use strict;
use warnings;
use Data::Dumper;
open my $logFH, q{<}, \ <<'EOD' or die qq{open: << HEREDOC: $!\n};
2009-02-02 06:12:57,500 dates processed: 2009-01-31, 2009-01-29, 2009-01-30
2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17
Different line here
2009-02-19 05:58:29,138 dates processed: 2009-02-18
EOD
my @results = ();
while( <$logFH> )
{
    chomp;
    push( @results, [] ), next unless m{dates processed:};
    my @dates = m{(\d{4}-\d\d-\d\d)}g;
    push @results, \ @dates;
}
close $logFH or die qq{close: << HEREDOC: $!\n};
print Data::Dumper->Dumpxs( [ \ @results ], [ qw{ *results } ] );
The output.
@results = (
[
'2009-02-02',
'2009-01-31',
'2009-01-29',
'2009-01-30'
],
[
'2009-02-18',
'2009-02-16',
'2009-02-17'
],
[],
[
'2009-02-19',
'2009-02-18'
]
);
I hope this is useful to you.
Cheers, JohnGG
Re: regex: extract multiple number of date patterns from certain lines
by Bloodnok (Vicar) on Mar 04, 2009 at 15:58 UTC
while (<DATA>) {
    @res = /(\d{4}(?:-\d\d){2}).*dates processed: (.*)/;
    warn "@res";
}
__DATA__
2009-02-02 06:12:57,500 dates processed: 2009-01-31, 2009-01-29, 2009-01-30
2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17
2009-02-19 05:58:29,138 dates processed: 2009-02-18
returns
2009-02-02 2009-01-31, 2009-01-29, 2009-01-30 at tst.pl line 3, <DATA> line 1.
2009-02-18 2009-02-16, 2009-02-17 at tst.pl line 3, <DATA> line 2.
2009-02-19 2009-02-18 at tst.pl line 3, <DATA> line 3.
Update:
Following a change in requirements ;-) ...
use Data::Dumper;
while (<DATA>) {
    @res = map { split } /(\d{4}(?:-\d\d){2}).*dates processed: (.*)/;
    warn Dumper \@res;
}
__DATA__
2009-02-02 06:12:57,500 dates processed: 2009-01-31, 2009-01-29, 2009-01-30
2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17
2009-02-19 05:58:29,138 dates processed: 2009-02-18
returns
$VAR1 = [
'2009-02-02',
'2009-01-31,',
'2009-01-29,',
'2009-01-30'
];
$VAR1 = [
'2009-02-18',
'2009-02-16,',
'2009-02-17'
];
$VAR1 = [
'2009-02-19',
'2009-02-18'
];
as required (nearly:-D) ??
A user level that continues to overstate my experience :-))
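The "nearly" refers to the trailing commas left by the whitespace split. One possible way to drop them, sketched here with a hypothetical helper `dates_in` (not part of the original post), is to split each capture on a comma plus optional whitespace instead:

```perl
use strict;
use warnings;

# Same pattern as above, but each capture is split on "comma plus optional
# whitespace", so the separators never end up in the results.
sub dates_in {
    my ($line) = @_;
    return map { split /,\s*/ } $line =~ /(\d{4}(?:-\d\d){2}).*dates processed: (.*)/;
}

my @res = dates_in("2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17\n");
print join('|', @res), "\n";    # 2009-02-18|2009-02-16|2009-02-17
```

This still runs code after the match, so it does not satisfy a strict "one regex only" constraint.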
@res = $_ =~/(\d{4}-\d\d-\d\d).*dates processed: ((:?\d{4}-\d\d-\d\d,? ?)*)/
# still captures last result twice
# input
# 2009-02-02 06:12:57,500 dates processed: 2009-01-31, 2009-01-29, 2009-01-30
# output
# 2009-02-02, 2009-01-31, 2009-01-29, 2009-01-30, 2009-01-30
Update:
Oops, I am not splitting them with the above either. I fooled myself because my debug code printed the list with join ", ". Doh!
Cheers, R.
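A note on the doubled last date: in the pattern as posted, `(:?` opens a capturing group that begins with an optional colon, whereas the non-capturing form is `(?:`. A capturing group inside a `*` repetition keeps only its last iteration, which would explain the extra `2009-01-30`. A small sketch comparing the two spellings (the variable names are illustrative):

```perl
use strict;
use warnings;

my $line = '2009-02-02 06:12:57,500 dates processed: 2009-01-31, 2009-01-29, 2009-01-30';

# As posted: "(:?" is a capturing group starting with an optional colon, so
# the repeated group yields a third capture holding its last iteration.
my @typo  = $line =~ /(\d{4}-\d\d-\d\d).*dates processed: ((:?\d{4}-\d\d-\d\d,? ?)*)/;

# With "(?:" the inner group no longer captures anything.
my @fixed = $line =~ /(\d{4}-\d\d-\d\d).*dates processed: ((?:\d{4}-\d\d-\d\d,? ?)*)/;

print scalar(@typo),  " captures: ", join(' | ', @typo),  "\n";
print scalar(@fixed), " captures: ", join(' | ', @fixed), "\n";
# 3 captures: 2009-02-02 | 2009-01-31, 2009-01-29, 2009-01-30 | 2009-01-30
# 2 captures: 2009-02-02 | 2009-01-31, 2009-01-29, 2009-01-30
```

Either way the list of processed dates arrives as one string, so a single non-global match cannot return each date as its own capture.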
Re: regex: extract multiple number of date patterns from certain lines
by Marshall (Canon) on Mar 04, 2009 at 19:00 UTC
I don't know what this first date is or whether you need it. I called it $stamp. I think this does what you want. If you don't need $stamp, then just assign it to undef.
#!/usr/bin/perl -w
use strict;
while (<DATA>)
{
    next if (!/dates processed/);
    my ($stamp, @dates) = ($_ =~ /(\d+-\d+-\d+)/g);
    # or my (undef, @dates) = ($_ =~ /(\d+-\d+-\d+)/g);
    # of course if that is what you want then change
    # the following line too!
    print "stamp=$stamp, dates are: @dates","\n";
}
#prints......
#stamp=2009-02-19, dates are: 2009-01-31 2009-01-29 2009-01-30
#stamp=2009-02-18, dates are: 2009-02-16 2009-02-17
#stamp=2009-02-19, dates are: 2009-02-18
__DATA__
2009-02-19 06:03:47,713 SOMETHING WRONG: 2009-01-33, 2009-01-44, 2009-01-33
2009-02-19 05 58 29 138 dates processed: 2009-01-31, 2009-01-29, 2009-01-30
2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17
2009-02-19 05:58:29,138 dates processed: 2009-02-18
Hi Marshall
I do need the initial date. This is for a logfile parser that captures these dates and sends them up the line to a monitoring application, which then compares the timestamp date to the processed dates and raises an alarm if a processed date is too old.
The crux of the matter is that my log file parser has one shot at each log line with a regex, and the matched parts are then passed up to the next stage. I want to get as much done in the regex as possible/reasonable, partly on the principle of keeping monitoring close to the monitored and partly for pure bloody-minded IT geek fun.
Sadly, the main constraint of this problem is one line of regex; code is cheating!
Cheers, R.
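For what it's worth, one pattern can return every date separately if the caller is able to apply it with /g in list context; that is an assumption about the framework, not something stated above. The sketch below uses the `\G` assertion to continue from the previous match. `$re` and `all_dates` are illustrative names. perlre cautions that `\G` is only fully supported at the very start of a pattern, so this should be tested on the Perl in use:

```perl
use strict;
use warnings;

# One pattern, applied with /g in list context. The first alternative only
# matches at the start of a line that contains "dates processed:"; the second
# (\G, but not at the start of the string) resumes where the previous match
# ended, so later dates are only picked up after a successful first match.
my $re = qr/(?:^(?=.*dates processed:)|\G(?!^)).*?(\d{4}-\d\d-\d\d)/;

# Return every date on a "dates processed:" line, timestamp date first.
sub all_dates {
    my ($line) = @_;
    my @dates = $line =~ /$re/g;
    return @dates;
}

print join('|', all_dates('2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17')), "\n";
```

Lines without "dates processed:" should yield an empty list, because neither alternative can match at the start of the string.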
So, if I understand this correctly, you are saying that my code works, but there is some constraint that it has to be in one single regex? If that's the case, then we are into an obfuscated code problem, and this is perhaps the wrong place?
If we are talking about clarity and performance, then that's different. Fewer lines of Perl code don't always mean faster performance. I simplified requirements such as "must match exactly 4 digits" (I use \d+ rather than \d{4}); this speeds up the regex engine. As far as clarity goes, I would struggle to be more clear (I'm not a guru). If you are interested in performance, then measure it (run benchmarks). Counting the number of lines of source code is a relatively poor predictor of actual code performance.
Update: Well, it took just a few seconds to get a negative vote on this post. I was genuinely trying to help with the original problem. I don't understand this requirement for "one line". I think that benchmarking and testing is the right way to go. I would be happy to help in this regard.
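As a starting point for such a measurement, here is a rough sketch using the core Benchmark module. The two subs are simplified stand-ins for approaches from this thread, and the names `global_match` and `match_split` are made up for the example; timings will vary by machine and Perl version, so no figures are claimed here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = "2009-02-02 06:12:57,500 dates processed: 2009-01-31, 2009-01-29, 2009-01-30\n";

# Candidate 1: one global match that collects every date-like token.
sub global_match {
    my @d = $line =~ /(\d+-\d+-\d+)/g;
    return @d;
}

# Candidate 2: one anchored match, followed by a split of the date list.
sub match_split {
    my ($on, $list) = $line =~ /^(\d{4}-\d\d-\d\d).*dates processed: (.*)/ or return;
    my @d = ($on, split /, /, $list);
    return @d;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, { global_match => \&global_match, match_split => \&match_split });
```

Both subs should return the same four dates for this line, which is worth checking before comparing speeds.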
Re: regex: extract multiple number of date patterns from certain lines
by GrandFather (Saint) on Mar 04, 2009 at 22:15 UTC
Can you use the trailing ,\d on the initial date/time to disambiguate? Consider:
use strict;
use warnings;
while (my $line = <DATA>) {
    my @parts = $line =~ /(\d{4}-\d{2}-\d{2})(?:,\s|$)/g;
    print $line;
    print " ", join ("\n ", @parts), "\n";
}
__DATA__
2009-02-02 06:12:57,500 dates processed: 2009-01-31, 2009-01-29, 2009-01-30
2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17
2009-02-19 05:58:29,138 dates processed: 2009-02-18
Prints:
2009-02-02 06:12:57,500 dates processed: 2009-01-31, 2009-01-29, 2009-01-30
2009-01-31
2009-01-29
2009-01-30
2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17
2009-02-16
2009-02-17
2009-02-19 05:58:29,138 dates processed: 2009-02-18
2009-02-18
True laziness is hard work
Re: regex: extract multiple number of date patterns from certain lines
by Anonymous Monk on Mar 04, 2009 at 15:54 UTC
There is a comma followed by digits before "dates processed":
2009-02-19 05:58:29,138 dates processed: 2009-02-18
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dddd-dd-dd dd:dd:dd,ddd                : dddd-dd-dd
                   ^^^^
                   ||||
Re: regex: extract multiple number of date patterns from certain lines
by Anonymous Monk on Mar 04, 2009 at 16:00 UTC
Given your example input, what exactly should @res contain?
2009-02-02 06:12:57,500 dates processed: 2009-01-31, 2009-01-29, 2009-01-30
@res = (2009-02-02, 2009-01-31, 2009-01-29, 2009-01-30)
2009-02-18 06:03:47,713 dates processed: 2009-02-16, 2009-02-17
@res = (2009-02-18, 2009-02-16, 2009-02-17)
2009-02-19 05:58:29,138 dates processed: 2009-02-18
@res = (2009-02-19, 2009-02-18)
Cheers, R.