Re: Not able to capture information
by kcott (Archbishop) on Feb 17, 2012 at 06:33 UTC
|
use strict;
use warnings;
while (<DATA>) {
    chomp($_);
    if ($_ =~ m/\[(\d{4}\/\d{2}\/\d{2}\s+\d{2}\:\d{2}\:\d{2})\]\s+\[(\d{1,3})\]\s+ERRORMSG\s+(.*)/) {
        my $date = $1;
        my $err_no = $2;
        my $err_msg = $3;
        if ($. > 1) {
            print qq{\n};
        }
        print "$date === $err_no === $err_msg";
    }
    else {
        print qq{ $_};
    }
}
print qq{\n};
produces this output:
ken@ganymede: ~/tmp
$ pm_multiline_regex.pl
2012/02/16 00:08:34 === 29 === unknown error Can't insert into price table Please check Valueprice.pm line 52.
2012/02/16 00:08:34 === 39 === Invalid User
2012/02/16 00:14:52 === 105 === missing conversion rate
2012/02/16 00:14:52 === 29 === Can't use an undefined value as a HASH reference at Value.pm line 77.
ken@ganymede: ~/tmp
$
| [reply] [d/l] [select] |
|
|
I could see this required an if/else statement, but not how to avoid storing data during the loop, that is, using arrays (see my comment below). So simply printing the \n character at the start of a line, before flow control kicks in, can save a lot of CPU. The alternative is printing it at the end, where you need to hold the line while the data is fed in to find out whether the next line is a match or not.
Drat, I was just starting to enjoy my array solutions.
| [reply] |
|
|
my @linearr;
...
print @linearr;
to
use Tie::File;
tie my @linearr, 'Tie::File', 'noa.log' or die $!;
...
untie @linearr;
You still might want to tweak the internals of the loop.
| [reply] [d/l] [select] |
Re: Not able to capture information
by oko1 (Deacon) on Feb 17, 2012 at 06:29 UTC
|
Man, that's one ugly regex. And I say that as a guy who's written a lot of ugly regexes. :)
I _think_ (kinda hard to tell from your misformatted "desired answer" line) you're looking for something like this:
#!/usr/bin/perl
use common::sense;
my $data = do { local $/; <DATA>; };
$data =~ s/\n(?!\[)/ /gs;
for (split /\n/, $data){
my @line = split /[\[\] ]+|ERRORMSG /, $_, 6;
print join(" === ", @line[1..3,5]), "\n";
}
__DATA__
[2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table
Please check
Valueprice.pm line 52.
[2012/02/16 00:08:34] [39] ERRORMSG Invalid User
[2012/02/16 00:14:52] [105] ERRORMSG missing conversion rate
[2012/02/16 00:14:52] [29] ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.
Prints:
2012/02/16 === 00:08:34 === 29 === unknown error Can't insert into price table Please check Valueprice.pm line 52.
2012/02/16 === 00:08:34 === 39 === Invalid User
2012/02/16 === 00:14:52 === 105 === missing conversion rate
2012/02/16 === 00:14:52 === 29 === Can't use an undefined value as a HASH reference at Value.pm line 77.
Is that what you're looking for?
Update: Whoops - I think I just figured out what the OP is asking... so there are two problems in his code. Revised solution.
--
I hate storms, but calms undermine my spirits.
-- Bernard Moitessier, "The Long Way"
| [reply] [d/l] [select] |
Re: Not able to capture information
by Marshall (Canon) on Feb 17, 2012 at 06:51 UTC
|
This idea didn't work out as well as I thought it would, but I will post for entertainment value. There are a lot of ways to skin these cats...
#!/usr/bin/perl -w
use strict;
my @data = do{local $/ = "\n["; (<DATA>)};
@data = map{ s/\n/ /g; s/\[//g; s/\]/ ==/g; $_}@data;
print join "\n", @data;
=prints
2012/02/16 00:08:34 == 29 == ERRORMSG unknown error Can't insert into price table Please check Valueprice.pm line 52.
2012/02/16 00:08:34 == 39 == ERRORMSG Invalid User
2012/02/16 00:14:52 == 105 == ERRORMSG missing conversion rate
2012/02/16 00:14:52 == 29 == ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.
=cut
__DATA__
[2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table
Please check
Valueprice.pm line 52.
[2012/02/16 00:08:34] [39] ERRORMSG Invalid User
[2012/02/16 00:14:52] [105] ERRORMSG missing conversion rate
[2012/02/16 00:14:52] [29] ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.
Update:
I suppose the first two little regexes in the map could be replaced with a single tr:
@data = map{ tr/\n[/ /d; s/\]/ ==/g; $_}@data;
tr is faster than a regex substitution because it is "lighter weight", meaning "dumber": it maps single characters to single characters and cannot turn one character into two. But in this case performance appears not to be a significant factor, or at least that is not mentioned in the requirements.
My personal advice on parsing very regular, program-generated things like log files is to keep the regex complexity as low as possible: make it just as complicated as it needs to be and no more. If you are validating "user input", then the complexity level has to be higher.
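For anyone who wants to convince themselves that the single tr really does the same job as the two substitutions, here is a minimal sketch. The sub names clean_with_s and clean_with_tr and the sample record are made up for this illustration; they are not from the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the two substitutions from the original map.
sub clean_with_s {
    my ($rec) = @_;
    $rec =~ s/\n/ /g;    # newline becomes a space
    $rec =~ s/\[//g;     # opening bracket is removed
    return $rec;
}

# Hypothetical helper: one transliteration. "\n" maps to a space; "[" has no
# counterpart in the replacement list, so the /d modifier deletes it.
sub clean_with_tr {
    my ($rec) = @_;
    $rec =~ tr/\n[/ /d;
    return $rec;
}

# A made-up record shaped like one read with $/ = "\n[".
my $sample  = "[2012/02/16 00:08:34] [39] ERRORMSG Invalid User\n[";
my $verdict = clean_with_s($sample) eq clean_with_tr($sample) ? "same" : "different";
print "$verdict\n";
```

Note that tr treats "[" as a literal character here; it has no character classes, so nothing needs escaping.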
|
|
my $in;
{
    # Test $in explicitly; a bare /^foo$/ would match against $_ instead.
    print "Input 'foo': "; chomp($in = <STDIN>); redo unless $in =~ /^foo$/;
}
--
I hate storms, but calms undermine my spirits.
-- Bernard Moitessier, "The Long Way"
| [reply] [d/l] |
|
|
I don't think that we need to get into a big discussion in the context of this thread.
Part of what I'm saying is that with:
[2012/02/16 00:08:34] [29] ERRORMSG unknown error
There is no reason or need to parse the date/time format with some huge regex, e.g.:
m/\[(\d{4}\/\d{2}\/\d{2}\s+\d{2}\:\d{2}\:\d{2})\]\s+\[(\d{1,3})\]
If the line begins with "[" it is a date/time and there is no reason to parse or otherwise try to understand it. Maybe this changes to YYYY-MM-DD or YYYY.MM.DD instead of YYYY/MM/DD? In the context of this re-formatting program, it shouldn't matter.
Basically, if a complex regex is not essential to the program operation, don't even do that. Here all that is needed is to understand that the square brackets at the start of a line signify a "new record". Past that, the parser shouldn't care about the format between the square brackets, because it doesn't need to do that in order to do its job!
Maybe we are actually in agreement here? ^[...] starts a new "message line" and that is all we need to know - that is considered "valid input" no matter what is between the [...].
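A minimal sketch of that "a leading [ starts a new record" idea, under a couple of assumptions of mine: the sub name join_records is invented for illustration, and a continuation line that arrives before any record is simply kept as a record of its own rather than treated as an error.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fold continuation lines into the preceding record.
# A line that starts with "[" begins a new record; nothing between the
# brackets is inspected.
sub join_records {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;              # skip blank lines
        if ($line =~ /^\[/ or !@records) {     # new record (or nothing to append to yet)
            push @records, $line;
        }
        else {
            $records[-1] .= " $line";          # continuation of the previous record
        }
    }
    return @records;
}

# Sample lines taken from the data posted in this thread.
my @sample = (
    "[2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table\n",
    "Please check\n",
    "Valueprice.pm line 52.\n",
    "[2012/02/16 00:08:34] [39] ERRORMSG Invalid User\n",
);
print "$_\n" for join_records(@sample);
```

If the date format later changes to YYYY-MM-DD or similar, this keeps working, since only the leading bracket matters.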
| [reply] [d/l] [select] |
Re: Not able to capture information
by Don Coyote (Hermit) on Feb 17, 2012 at 07:50 UTC
|
To append the orphaned lines I have set up an array that can be manipulated during the while loop and is then printed after processing.
This appends the orphaned lines, as in the case provided:
#!/usr/bin/perl -w
use strict;
my @linearr;
while (<DATA>) {
    chomp;
    if ($_ =~ m{\[(\d{4}/\d{2}/\d{2}\s+\d{2}\:\d{2}\:\d{2})\]\s+\[(\d{1,3})\]\s+ERRORMSG\s+(.*)}) {
        my $date = $1;
        my $err_no = $2;
        my $err_msg = $3;
        push @linearr, $date.' === '.$err_no.' === '.$err_msg."\n";
    }
    else {
        $linearr[@linearr-1] =~ s/\n$/\ $_\n/;
    }
}
print @linearr;
prints
__DATA__
[2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table Please check Valueprice.pm line 52.
[2012/02/16 00:08:34] [39] ERRORMSG Invalid User
[2012/02/16 00:14:52] [105] ERRORMSG missing conversion rate
[2012/02/16 00:14:52] [29] ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.
Coyote
|
|
Yes, yet another road to Rome!
I would have written the code very slightly differently.
(1) Rather than using $1,$2,$3, I would use list assignment of the variables. The match "worked" if the last one is "defined".
(2) A complex regex of the date/time is not needed.
(3) In the substitution, I would use "|" as the separator to reduce the number of "leaning toothpicks", although some folks figure that this is a bad idea. Mileage varies.
#!/usr/bin/perl -w
use strict;
my @lines;
while (<DATA>)
{
    chomp;
    next if /^\s*$/;
    my ($date, $err_no, $err_msg) =
        m{\[(.*)\]\s+\[(.*)\]\s+ERRORMSG\s+(.*)};
    if (defined $err_msg) # the match "worked"!
    {
        push @lines, $date.' === '.$err_no.' === '.$err_msg."\n";
    }
    else
    {
        $lines[@lines-1] =~ s|\n$| $_\n|;
    }
}
print @lines;
=prints
2012/02/16 00:08:34 === 29 === unknown error Can't insert into price table Please check Valueprice.pm line 52.
2012/02/16 00:08:34 === 39 === Invalid User
2012/02/16 00:14:52 === 105 === missing conversion rate
2012/02/16 00:14:52 === 29 === Can't use an undefined value as a HASH reference at Value.pm line 77.
=cut
__DATA__
[2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table
Please check
Valueprice.pm line 52.
[2012/02/16 00:08:34] [39] ERRORMSG Invalid User
[2012/02/16 00:14:52] [105] ERRORMSG missing conversion rate
[2012/02/16 00:14:52] [29] ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.
| [reply] [d/l] |
|
|
List assignment is another good modification that I overlooked here. The difference between the two is that with list assignment the scalars are created, possibly unnecessarily and undefined, on every regexp test, whereas with assignment from $1, $2, $3 the match has already been tested before the scalars are created. No biggie, but how would we go about making comparisons for such details? I would like to think on it.
I did consider reducing the toothpicks in the original regexp, but I was short of time and the regexp had already been dealt with by the first response. I did not mind them in my substitution, as it was a very short substitution.
Pipe is syntactically correct, but due to its general usage I would probably pick a different symbol. Each to their own here.
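On the question of how to compare such details: one possible approach is the core Benchmark module. What follows is only a sketch; the sub names by_captures and by_list and the sample line are mine, and any numbers it prints will depend on the machine and the Perl version, so I make no claim about which variant comes out ahead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = "[2012/02/16 00:08:34] [29] ERRORMSG unknown error";
my $re   = qr{\[(.*)\]\s+\[(.*)\]\s+ERRORMSG\s+(.*)};

# Hypothetical variant 1: test the match first, then copy $1..$3.
sub by_captures {
    my ($text) = @_;
    if ($text =~ $re) {
        my ($date, $err_no, $err_msg) = ($1, $2, $3);
        return "$date === $err_no === $err_msg";
    }
    return;
}

# Hypothetical variant 2: list assignment; the three variables are created
# on every call and stay undefined when the match fails.
sub by_list {
    my ($text) = @_;
    my ($date, $err_no, $err_msg) = $text =~ $re;
    return defined $err_msg ? "$date === $err_no === $err_msg" : undef;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    captures => sub { by_captures($line) },
    list     => sub { by_list($line) },
});
```

To measure the failing case as well, pass a continuation line such as "Please check" instead of $line.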
| [reply] |
|
|