monger has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monestarians, I am working on parsing a log file. Here's a snippet:
6/13/2005 5:59:57 PM 10.1.1.2 WARNING Jun 13 2005 22:00:04: %PIX-4-106023: Deny udp src aliens:192.168.1.35/1148 dst dmz:10.10.10.32/1434 by access-group "aliens"

What I am doing is using Perl, and likely awk, for normalization for a DB project. First, I'm stripping off everything up through "WARNING" - DONE.
Next, I need to adjust the date, converting the month to its numeric equivalent. I'm using a hash now; if there's a better way, I'm all ears.
Finally, I need to split off, or remove, the tags before each IP, and also convert the "/port number" to a "space" "port number".
Here's the code:

use Data::Dumper;

my %date_hash = (
    "Jan" => "01", "Feb" => "02", "Mar" => "03", "Apr" => "04",
    "May" => "05", "Jun" => "06", "Jul" => "07", "Aug" => "08",
    "Sep" => "09", "Oct" => "10", "Nov" => "11", "Dec" => "12"
);

my $file=".\\tmp.txt";
my $out=".\\out.txt";

open FILE, $file || die "Can't open file: $!";
open OUT, ">$out" || die "Can't open file: $!";

while(<FILE>){
    (m/(.+.WARNING)(.+)/g)
        while ( (my $key, my $value) = each(%date_hash)) {
            if (s/$key/$value/g) {
            }
        }
    }
}
close FILE;
print Dumper($2);
close OUT;

Without the hash search and replace, the entire log entry minus everything up through "WARNING" is in $2. I now need to operate on the contents of $2, or find a different way to pass the file down through the different filters.

Peace and Pleasure in Perl coding,
monger

Monger +++++++++++++++++++++++++ Munging Perl on the side

Replies are listed 'Best First'.
Re: Multiple Search and Replaces on one file
by holli (Abbot) on Jun 15, 2005 at 15:18 UTC
    use strict;
    use warnings;

    my %date_hash = (
        "Jan" => "01", "Feb" => "02", "Mar" => "03", "Apr" => "04",
        "May" => "05", "Jun" => "06", "Jul" => "07", "Aug" => "08",
        "Sep" => "09", "Oct" => "10", "Nov" => "11", "Dec" => "12"
    );

    while(<DATA>) {
        # strip off everything up through "WARNING"
        # adjust the date, converting the month to its numeric equivalent
        s/.+WARNING ([A-Za-z]{3})/$date_hash{$1}/;

        # remove the tags before each IP, and also convert the "/port number"
        # to a "space" "port number"
        # I'm not sure about that part
        s/[^:]+:(\d+\.\d+\.\d+\.\d+)\/(\d+)/$1 $2:/g;
        print;
    }
    __DATA__
    6/13/2005 5:59:57 PM 10.1.1.2 WARNING Jun 13 2005 22:00:04: %PIX-4-106023: Deny udp src aliens:192.168.1.35/1148 dst dmz:10.10.10.32/1434 by access-group "aliens"

    produces
    06 13 2005 22:00:04: %PIX-4-106023:192.168.1.35 1148:10.10.10.32 1434: by access-group "aliens"
    Close?


    holli, /regexed monk/
Re: Multiple Search and Replaces on one file
by ww (Archbishop) on Jun 15, 2005 at 15:55 UTC
    output:
    syntax error at dumpop.pl line 23, near ") {"
    syntax error at dumpop.pl line 26, near "}"
    dumpop.pl had compilation errors.
    so...
    • use strict;
    • use warnings;

    and then there's your match

    m/(.+.WARNING)(.+)/g
    which has me scratching my head in bewilderment. Could this be what you meant?
    m/(?:.+)(?=WARNING)WARNING\s(.+)/

    (in which case $1 is the capture.)

      m/(?:.+)(?=WARNING)WARNING\s(.+)/

      Out of curiosity, why the double WARNING?

      and then there's your match
      m/(.+.WARNING)(.+)/g

      I arrived at this via trial and error. First, I tried \w, but there are spaces. I then went to ".", but had to add the greedy +. So, I'm up to
      m/(.+WARNING)...

      It still didn't match, so I added the second "." to get what you see. The parens are there to grab only what's after WARNING, regardless of what it is, hence (.+).

      monger

        monger:

        You said you don't want the first set of date/time info, nor the "WARNING" ... so why capture them at all? Use non-capturing parens, (?:...), for whatever you are using merely as a marker for the string you want.

        Please read the two items surrounding kaif's question... (Aargh; seem to have sent reply to kaif's to bitbucket, so repeating below) and more particularly (or, at least, even more diligently) the suggestions from holli, et al.

        Update:

        (?=WARNING) is a lookahead (one kind of lookaround), which

        • matches at the POSITION or "location" just before "WARNING" (ie, uses "WARNING" as a marker to delimit the prior part(s) of the regex; you can think of this as a way to limit greediness, tho that's very sloppy language) but...
        • does not consume or capture "WARNING"!
        so...

        the second "WARNING" in the regex matches the word itself, which is therefore consumed (along with the trailing \s, space) without being captured, since it sits outside any capturing parens.

        HTH
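
To make the lookahead point concrete, here is a minimal sketch run against the sample line from the original post. It compares the regex with the lookahead to a plain version that uses the literal WARNING alone as the marker. The variable names are made up for illustration, and the comparison is only shown for this one sample line.

```perl
use strict;
use warnings;

# Sample log line from the original post.
my $line = '6/13/2005 5:59:57 PM 10.1.1.2 WARNING Jun 13 2005 22:00:04: '
         . '%PIX-4-106023: Deny udp src aliens:192.168.1.35/1148 '
         . 'dst dmz:10.10.10.32/1434 by access-group "aliens"';

# With the lookahead: (?=WARNING) only asserts a position,
# then the literal WARNING\s consumes the word and the space.
my ($with_lookahead) = $line =~ m/(?:.+)(?=WARNING)WARNING\s(.+)/;

# Without it: the literal WARNING already serves as the marker.
my ($plain) = $line =~ m/.+WARNING\s(.+)/;

print "same capture\n" if $with_lookahead eq $plain;
print "$plain\n";
```

On this line both patterns leave the same text in the capture, so the lookahead can be dropped without changing the result here.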
Re: Multiple Search and Replaces on one file
by davidrw (Prior) on Jun 15, 2005 at 15:15 UTC
    Two suggestions -- one is to restructure the hash search and replace a little for fewer iterations, and the other is to split on "WARNING" instead of just matching, so you can work on the pieces separately.

    my $keysForRE = join '|', keys %date_hash;

    while(<FILE>){
        s/\b($keysForRE)\b/$date_hash{$1}/eg;
        my ($piece1, $piece2) = split /\bWARNING\b/, $_, 2;
        # do stuff to $piece1
        # do stuff to $piece2
        $_ = join 'WARNING', ($piece1, $piece2);
    }

      The OP wants to completely remove everything up to WARNING so your join() would not be necessary.
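
A rough sketch of how the two suggestions might combine when the part before WARNING is simply thrown away. The helper name normalize_line() is hypothetical, the month table is written out in full so the block runs on its own, and anchoring the month substitution at the start of the remaining text is an assumption made here so that a month-like word later in the line is left alone.

```perl
use strict;
use warnings;

my %date_hash = (
    Jan => '01', Feb => '02', Mar => '03', Apr => '04',
    May => '05', Jun => '06', Jul => '07', Aug => '08',
    Sep => '09', Oct => '10', Nov => '11', Dec => '12',
);
my $keysForRE = join '|', keys %date_hash;

# normalize_line() is a hypothetical helper, not something from the thread.
sub normalize_line {
    my ($line) = @_;

    # Keep only what follows the first standalone WARNING.
    my (undef, $rest) = split /\bWARNING\b\s*/, $line, 2;
    return unless defined $rest;    # no WARNING marker on this line

    # Replace the leading month name with its number (assumed to come first).
    $rest =~ s/^($keysForRE)\b/$date_hash{$1}/;
    return $rest;
}

my $sample = '6/13/2005 5:59:57 PM 10.1.1.2 WARNING Jun 13 2005 22:00:04: '
           . '%PIX-4-106023: Deny udp src aliens:192.168.1.35/1148 '
           . 'dst dmz:10.10.10.32/1434 by access-group "aliens"';

print normalize_line($sample), "\n";
```

Further substitutions (the IP tags, the "/port" rewrite) could then be applied to the returned string instead of to $_.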
Re: Multiple Search and Replaces on one file
by Elijah (Hermit) on Jun 15, 2005 at 15:35 UTC
    I would do something like the following:

    my $file = 'tmp.txt';
    my $out = 'out.txt';

    open(FILE, "<", $file) || die "Can't open $file: $!\n";
    open(OUT, ">", $out) || die "Can't open $out: $!\n";

    while(<FILE>){
        # keep the leading date and everything from %PIX onward
        s/(\d+\/\d+\/\d+).*?(%PIX.+)/$1 $2/g;
        # turn every slash into a space
        s/\// /g;
        # drop the tag in front of each IP
        s/\w+\:(\d+\.\d+\.\d+\.\d+)/$1/g;
        print OUT $_;
    }

    close FILE;
    close OUT;

    Avoiding the hash and using the numeric date already in each log entry is probably the best way.

    Edit: Typo corrected.

      ...but in his example line the first date-time string and the second are -clearly- different. The month happens to be the same in that example. You are taking the date from the first stamp and the time from the second.

      Also there is a missing '.' from your first regex.

        You're making an assumption that each log entry does not have the same syntax. I am making the assumption that it does because that is what the OP's example shows. Where do you get your info from?

        Also I am not missing anything from my regexp. Maybe you think my example was trying to match something it was not.
Re: Multiple Search and Replaces on one file
by robot_tourist (Hermit) on Jun 16, 2005 at 14:56 UTC

    Just on the hash=>month thing, it could be annoying to have to maintain it if the names changed :)

    Apart from that, I don't think there are any issues doing that if this is a one-off, but afaik there are lots of date/time modules to do the conversion if installing/use-ing a module is worth the extra work/computer resources.
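
As one possible illustration of the module route, here is a small sketch using Time::Piece. It assumes a Perl recent enough to bundle that module (5.10 or later; otherwise it would need installing from CPAN) and assumes the second timestamp always has the shape shown in the original sample line. The variable names are made up for the example.

```perl
use strict;
use warnings;
use Time::Piece;

# The second timestamp from the sample log line, minus the trailing colon.
my $stamp = 'Jun 13 2005 22:00:04';

# Parse the abbreviated month name, day, year and time.
my $t = Time::Piece->strptime($stamp, '%b %d %Y %H:%M:%S');

# Re-emit the same fields with a numeric, zero-padded month.
my $normalized = $t->strftime('%m %d %Y %H:%M:%S');
print "$normalized\n";    # 06 13 2005 22:00:04
```

The trade-off is roughly the one described above: no month table to maintain, at the cost of loading a module and doing a full parse for every line.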

    How can you feel when you're made of steel? I am made of steel. I am the Robot Tourist.
    Robot Tourist, by Ten Benson