Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks,
I would like to read the details under __DATA__, but the code below is not capturing the complete information:
    #!/usr/bin/perl
    while(<DATA>) {
        chomp($_);
        if($_ =~ m/\[(\d{4}\/\d{2}\/\d{2}\s+\d{2}\:\d{2}\:\d{2})\]\s+\[(\d{1,3})\]\s+ERRORMSG\s+(.*)/) {
            my $date = $1;
            my $err_no = $2;
            my $err_msg = $3;
            print "$date === $err_no === $err_msg\n";
        }
    }
    __DATA__
    [2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table
    Please check Valueprice.pm line 52.
    [2012/02/16 00:08:34] [39] ERRORMSG Invalid User
    [2012/02/16 00:14:52] [105] ERRORMSG missing conversion rate
    [2012/02/16 00:14:52] [29] ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.

The above code prints the following output:

    2012/02/16 00:08:34 === 29 === unknown error Can't insert into price table
    2012/02/16 00:08:34 === 39 === Invalid User
    2012/02/16 00:14:52 === 105 === missing conversion rate
    2012/02/16 00:14:52 === 29 === Can't use an undefined value as a HASH reference at Value.pm line 77.

But I am looking for output like the following: "Please check Valueprice.pm line 52." needs to be appended to the ERRORMSG "unknown error Can't insert into price table".

    2012/02/16 00:08:34 === 29 === unknown error Can't insert into price table Please check Valueprice.pm line 52.
    2012/02/16 00:08:34 === 39 === Invalid User
    2012/02/16 00:14:52 === 105 === missing conversion rate
    2012/02/16 00:14:52 === 29 === Can't use an undefined value as a HASH reference at Value.pm line 77.

Could you please help me modify the existing code?
Thanks

Replies are listed 'Best First'.
Re: Not able to capture information
by kcott (Archbishop) on Feb 17, 2012 at 06:33 UTC

    This modification to your code:

        use strict;
        use warnings;

        while(<DATA>) {
            chomp($_);
            if ($_ =~ m/\[(\d{4}\/\d{2}\/\d{2}\s+\d{2}\:\d{2}\:\d{2})\]\s+\[(\d{1,3})\]\s+ERRORMSG\s+(.*)/) {
                my $date = $1;
                my $err_no = $2;
                my $err_msg = $3;
                if ($. > 1) {
                    print qq{\n};
                }
                print "$date === $err_no === $err_msg";
            }
            else {
                print qq{ $_};
            }
        }
        print qq{\n};

    produces this output:

        ken@ganymede: ~/tmp $ pm_multiline_regex.pl
        2012/02/16 00:08:34 === 29 === unknown error Can't insert into price table Please check Valueprice.pm line 52.
        2012/02/16 00:08:34 === 39 === Invalid User
        2012/02/16 00:14:52 === 105 === missing conversion rate
        2012/02/16 00:14:52 === 29 === Can't use an undefined value as a HASH reference at Value.pm line 77.
        ken@ganymede: ~/tmp $

    -- Ken

      I could see this required an if/else statement, but not how to avoid storing data while the loop runs, that is, using arrays (see my comment below). So simply printing the \n character at the start of each new record, before flow control kicks in, can save a lot of CPU. Printing it at the end means you have to hold the line while more data is fed in, to find out whether the next line is a match or not.

      Drat, I was just starting to enjoy my array solutions.

        Given he probably wants to print to a log file, you could remove the data storage overhead by changing

            my @linearr;
            ...
            print @linearr;

        to

            use Tie::File;
            tie my @linearr, 'Tie::File', 'noa.log' or die $!;
            ...
            untie @linearr;

        You still might want to tweak the internals of the loop.

        -- Ken

Re: Not able to capture information
by oko1 (Deacon) on Feb 17, 2012 at 06:29 UTC

    Man, that's one ugly regex. And I say that as a guy who's written a lot of ugly regexes. :)

    I _think_ (kinda hard to tell from your misformatted "desired answer" line) you're looking for something like this:

        #!/usr/bin/perl
        use common::sense;

        my $data = do { local $/; <DATA>; };
        $data =~ s/\n(?!\[)/ /gs;

        for (split /\n/, $data){
            my @line = split /[\[\] ]+|ERRORMSG /, $_, 6;
            print join(" === ", @line[1..3,5]), "\n";
        }
        __DATA__
        [2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table
        Please check Valueprice.pm line 52.
        [2012/02/16 00:08:34] [39] ERRORMSG Invalid User
        [2012/02/16 00:14:52] [105] ERRORMSG missing conversion rate
        [2012/02/16 00:14:52] [29] ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.

    Prints:

        2012/02/16 === 00:08:34 === 29 === unknown error Can't insert into price table Please check Valueprice.pm line 52.
        2012/02/16 === 00:08:34 === 39 === Invalid User
        2012/02/16 === 00:14:52 === 105 === missing conversion rate
        2012/02/16 === 00:14:52 === 29 === Can't use an undefined value as a HASH reference at Value.pm line 77.

    Is that what you're looking for?

    Update: Whoops - I think I just figured out what the OP is asking... so there are two problems in his code. Revised solution.

    -- 
    I hate storms, but calms undermine my spirits.
     -- Bernard Moitessier, "The Long Way"
Re: Not able to capture information
by Marshall (Canon) on Feb 17, 2012 at 06:51 UTC
    This idea didn't work out as well as I thought it would, but I will post for entertainment value. There are a lot of ways to skin these cats...
        #!/usr/bin/perl -w
        use strict;

        my @data = do{local $/ = "\n["; (<DATA>)};
        @data = map{ s/\n/ /g; s/\[//g; s/\]/ ==/g; $_}@data;
        print join "\n", @data;

        =prints
        2012/02/16 00:08:34 == 29 == ERRORMSG unknown error Can't insert into price table Please check Valueprice.pm line 52.
        2012/02/16 00:08:34 == 39 == ERRORMSG Invalid User
        2012/02/16 00:14:52 == 105 == ERRORMSG missing conversion rate
        2012/02/16 00:14:52 == 29 == ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.
        =cut

        __DATA__
        [2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table
        Please check Valueprice.pm line 52.
        [2012/02/16 00:08:34] [39] ERRORMSG Invalid User
        [2012/02/16 00:14:52] [105] ERRORMSG missing conversion rate
        [2012/02/16 00:14:52] [29] ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.
    Update:
    I suppose the first two little regexes in the map could be replaced with a single tr:
    @data = map{ tr/\n[/ /d;  s/\]/ ==/g;  $_}@data;
    tr is faster than a regex because it is "lighter weight", meaning "dumber": it cannot substitute one character with two. But in this case performance does not appear to be a significant factor, or at least it is not mentioned in the requirements.
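A minimal sketch of what that tr does, for anyone unsure about the /d modifier. The helper name squash is made up for this illustration and is not from the posted code; it works on a copy so the input is left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): turn each newline
# into a space and delete every "[" in a single pass over the string.
sub squash {
    my ($text) = @_;
    # With /d, characters in the search list that have no counterpart in
    # the replacement list are deleted: "\n" maps to " ", "[" is removed.
    (my $copy = $text) =~ tr/\n[/ /d;
    return $copy;
}

print squash("[2012/02/16 00:08:34] [29] ERRORMSG unknown error\nPlease check"), "\n";
```

The closing brackets are left alone here because turning "]" into " ==" needs a one-to-many replacement, which is exactly what tr cannot do.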

    My personal advice on parsing very regular, program-generated things like log files is to keep the regex complexity as low as possible: make it just as complicated as it needs to be and no more. If you are validating "user input", then the complexity level has to be higher.

      My 2 cents on your 2 cents: validating user input is very simple. Never try to "enumerate badness"; just define what is valid and reject everything else.

          my $in;
          {
              print "Input 'foo': ";
              chomp($in = <STDIN>);
              redo unless $in =~ /^foo$/;   # test $in; a bare /^foo$/ would match against $_
          }
      -- 
      I hate storms, but calms undermine my spirits.
       -- Bernard Moitessier, "The Long Way"
        I don't think that we need to get into a big discussion in the context of this thread.

        Part of what I'm saying is that with:
        [2012/02/16 00:08:34] [29] ERRORMSG unknown error

        There is no reason or need to parse the date/time format with some huge regex, e.g.:
         m/\[(\d{4}\/\d{2}\/\d{2}\s+\d{2}\:\d{2}\:\d{2})\]\s+\[(\d{1,3})\]

        If the line begins with "[" it is a date/time and there is no reason to parse or otherwise try to understand it. Maybe this changes to YYYY-MM-DD or YYYY.MM.DD instead of YYYY/MM/DD? In the context of this re-formatting program, it shouldn't matter.

        Basically, if a complex regex is not essential to the program's operation, don't even do that. Here all that is needed is to understand that the square brackets at the start of a line signify a "new record". Past that, the parser shouldn't care about the format between the square brackets, because it doesn't need to in order to do its job!

        Maybe we are actually in agreement here?
        ^[...] starts a new "message line" and that is all we need to know - that is considered "valid input" no matter what is between the [...].
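A rough sketch of that idea, assuming the only rule is "a line starting with [ begins a new record, anything else continues the previous one". The sub name join_records and the inline sample list are invented for this illustration; it only joins the lines and does not reproduce the " === " formatting from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): fold continuation lines into
# the preceding record. Nothing between the brackets is inspected.
sub join_records {
    my @records;
    for my $line (@_) {
        chomp(my $text = $line);
        next if $text =~ /^\s*$/;            # skip blank lines
        if ($text =~ /^\[/ or !@records) {   # new record (or very first line)
            push @records, $text;
        }
        else {                               # continuation of the previous record
            $records[-1] .= " $text";
        }
    }
    return @records;
}

# Sample input taken from the data posted in the question.
my @sample = (
    "[2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table\n",
    "Please check Valueprice.pm line 52.\n",
    "[2012/02/16 00:08:34] [39] ERRORMSG Invalid User\n",
);

print "$_\n" for join_records(@sample);
```

If the date format later changes to YYYY-MM-DD or YYYY.MM.DD, nothing in this sketch would need to change, which is the point being made above.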

Re: Not able to capture information
by Don Coyote (Hermit) on Feb 17, 2012 at 07:50 UTC

    To append the orphaned lines I have set up an array that can be manipulated during the while loop and is then printed after processing.

    This appends the orphaned lines, as in the case provided:

        #!/usr/bin/perl -w
        use strict;

        my @linearr;
        while (<DATA>) {
            chomp;
            if($_ =~ m{\[(\d{4}/\d{2}/\d{2}\s+\d{2}\:\d{2}\:\d{2})\]\s+\[(\d{1,3})\]\s+ERRORMSG\s+(.*)}) {
                my $date = $1;
                my $err_no = $2;
                my $err_msg = $3;
                push @linearr, $date.' === '.$err_no.' === '.$err_msg."\n";
            }else{
                $linearr[@linearr-1] =~ s/\n$/\ $_\n/;
            }
        }
        print @linearr;

    prints

        __DATA__
        [2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table
        Please check Valueprice.pm line 52.
        [2012/02/16 00:08:34] [39] ERRORMSG Invalid User
        [2012/02/16 00:14:52] [105] ERRORMSG missing conversion rate
        [2012/02/16 00:14:52] [29] ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.

    Coyote

      Yes, yet another road to Rome!

      I would have written the code very slightly differently.
      (1) Rather than using $1,$2,$3, I would use list assignment of the variables. The match "worked" if the last one is "defined".
      (2) A complex regex of the date/time is not needed
      (3) In the substitution, I would use "|" as the separator to reduce the number of "leaning toothpicks", although some folks figure that this is a bad idea. Mileage varies.

          #!/usr/bin/perl -w
          use strict;

          my @lines;
          while (<DATA>)
          {
              chomp;
              next if /^\s*$/;
              my ($date, $err_no, $err_msg) = m{\[(.*)\]\s+\[(.*)\]\s+ERRORMSG\s+(.*)};
              if (defined $err_msg)  # the match "worked"!
              {
                  push @lines, $date.' === '.$err_no.' === '.$err_msg."\n";
              }
              else
              {
                  $lines[@lines-1] =~ s|\n$| $_\n|;
              }
          }
          print @lines;

          =prints
          2012/02/16 00:08:34 === 29 === unknown error Can't insert into price table Please check Valueprice.pm line 52.
          2012/02/16 00:08:34 === 39 === Invalid User
          2012/02/16 00:14:52 === 105 === missing conversion rate
          2012/02/16 00:14:52 === 29 === Can't use an undefined value as a HASH reference at Value.pm line 77.
          =cut

          __DATA__
          [2012/02/16 00:08:34] [29] ERRORMSG unknown error Can't insert into price table
          Please check Valueprice.pm line 52.
          [2012/02/16 00:08:34] [39] ERRORMSG Invalid User
          [2012/02/16 00:14:52] [105] ERRORMSG missing conversion rate
          [2012/02/16 00:14:52] [29] ERRORMSG Can't use an undefined value as a HASH reference at Value.pm line 77.

        List assignment is another good modification that I overlooked here. The difference between the two is that with list assignment the scalars are created, possibly unnecessarily and possibly undefined, on every regexp test, whereas with scalar assignment the match has already been tested before the scalars are created. No biggie, but how would we go about making comparisons for such details? I would like to think on it.

        I did consider tidying the toothpicks in the original regexp, but I was short of time and the regexp had already been dealt with by the first response. I did not mind them in my substitution, as it was a very short substitution.

        Pipe is syntactically correct, but due to its general usage I would probably pick a different symbol. Each to their own here.
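On the question of how to compare such details: one common starting point is the core Benchmark module. The following is only a sketch under assumptions; the sub names with_captures and with_list are invented here, the regex is the simplified one posted above, and no timing results are claimed, since they depend on the machine and the Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = '[2012/02/16 00:08:34] [39] ERRORMSG Invalid User';
my $re   = qr{\[(.*)\]\s+\[(.*)\]\s+ERRORMSG\s+(.*)};

# Variant 1: test the match first, then copy the capture variables.
sub with_captures {
    my ($text) = @_;
    return unless $text =~ $re;
    my ($date, $err_no, $err_msg) = ($1, $2, $3);
    return "$date === $err_no === $err_msg";
}

# Variant 2: list assignment straight from the match; the variables are
# created on every call, matched or not.
sub with_list {
    my ($text) = @_;
    my ($date, $err_no, $err_msg) = $text =~ $re;
    return unless defined $err_msg;
    return "$date === $err_no === $err_msg";
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    captures => sub { with_captures($line) },
    list     => sub { with_list($line) },
});
```

Feeding in a mix of matching and non-matching lines would be a fairer test of the point raised above, since the two variants differ mainly in what happens when the match fails.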