http://qs1969.pair.com?node_id=550535

jstout13 has asked for the wisdom of the Perl Monks concerning the following question:

I need help extracting specific data from log files. Each log file has 51 fields, and I only need 11 of them. I need to write the data from just those 11 fields to a new output file, one file per customer: each log file contains 10-20 customers, and I need to create a separate log file for each customer containing only the pertinent data. Here is what I have so far; any suggestions or help with this are thoroughly appreciated. Jeff Stout
#!/usr/bin/perl -w
#
# parselog.pl - Script to split log files by customer
#
use warnings;

my $logfile;      # input log file
my $media;        # media type
my %cust;         # hash of all customers found in $logfile
my $thiscust;     # customer derived from current input line
my @fields;       # input logfile fields
my $thiscustlog;  # file handle to current customer's logfile
my $prefix;       # cust logfile prefix
my $suffix;       # cust logfile suffix

$logfile = $ARGV[0];

if ( $logfile =~ /WMS/ ) {
    $media  = "wms";
    $prefix = "/usr/home/script/${media}_cust/";
    $suffix = ".$media.log";
} else {
    die "Unknown file format";
}

open (LOGFILE, "< $logfile") or die ("Could not open $logfile.");

while (<LOGFILE>) {
    if ( /^[0-9]/ ) {
        @fields = split;
        $thiscust = (split /\//, $fields[4])[1];
        if ( ! exists ($cust{$thiscust}) ) {
            print "customer $thiscust\n";
            $cust{$thiscust} = $thiscust;
            open ($thiscust, ">> ${prefix}${thiscust}${suffix}")
                or die ("Cannot open ${prefix}${thiscust}${suffix}");
        }
        print $thiscust $_;
    }
}

# iterate over the keys only; looping over (%cust) would visit each name twice
foreach $thiscust (keys %cust) {
    close $thiscust;
}

exit 0;

Re: Log File Parsing using "split"
by McDarren (Abbot) on May 19, 2006 at 16:46 UTC
    Okay, first of all it would be really useful if you could add a half-dozen lines of sample data to your post. This will help people to visualise your problem a bit better.

    Now...

    "..log file has 51 fields I only need 11 fields.."

    Which 11 fields? It's a bit difficult to tell from the code you posted.

    Because you have @fields = split;, am I right in assuming that the data is whitespace delimited?
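If the data is whitespace delimited, the bare split in the posted code should be enough: with no arguments it splits $_ on runs of whitespace and drops leading whitespace, so tabs and multiple spaces do not create empty fields. A small sketch; the sample line is made up (no real log data was posted) and fields_of is a hypothetical helper:

```perl
use strict;
use warnings;

# Split a line the same way the bare `split;` in the original code does.
sub fields_of {
    local $_ = shift;
    my @fields = split;    # same as: split ' ', $_
    return @fields;
}

# Made-up sample line with leading spaces, a tab and a double space.
my @f = fields_of("  10.0.0.1   GET\t/custA/video.wmv  200\n");
print scalar(@f), " fields: @f\n";    # prints "4 fields: 10.0.0.1 GET /custA/video.wmv 200"
```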

    Other than that, I'd make no further comment without seeing some sample data, and perhaps an example of the expected output.

    Cheers,
    Darren :)

    PS: .... use strict

Re: Log File Parsing using "split"
by ruzam (Curate) on May 19, 2006 at 17:23 UTC
    Suggestions:

    Include $! in the message when open fails (you'll appreciate it when it happens):
    open (LOGFILE, "< $logfile") or die ("Could not open $logfile: $!");
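A quick sketch of what $! buys you: when open fails, Perl puts the operating system's reason in $!, so the die message says why, not just which file. The helper name open_for_read and the path are hypothetical; the path is chosen so that it should not exist:

```perl
use strict;
use warnings;

# Open a file for reading; on failure, die with the file name AND the
# operating system's reason for the failure, which Perl leaves in $!.
sub open_for_read {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Could not open $path: $!\n";
    return $fh;
}

# Trap the die with eval so the message can be inspected.
my $ok = eval { open_for_read('/no/such/dir/WMS.log'); 1 };
print "Error was: $@" unless $ok;
# e.g. "Error was: Could not open /no/such/dir/WMS.log: No such file or directory"
```

The exact wording after the colon depends on the platform and locale.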
    You're rather carefree about mixing $thiscust as a record field, a hash key and a file handle. Maybe something more like this:
    $thiscust = (split /\//, $fields[4])[1];
    if ( ! $cust{$thiscust} ) {
        print "customer $thiscust\n";
        $cust{$thiscust} = $thiscust;
        open (my $fh, ">> ${prefix}${thiscust}${suffix}")
            or die ("Cannot open ${prefix}${thiscust}${suffix}: $!");
        $cust{$thiscust} = $fh;
    }
    my $fh = $cust{$thiscust};
    print $fh $_;
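Taking that idea one step further, here is a minimal, self-contained sketch in which the customer id and its file handle never share a variable: a hash maps each customer id to a lexical file handle. It assumes, as the original code does, that lines are whitespace delimited and that the customer id is the second "/"-separated piece of field 4. split_by_customer and the sample lines are hypothetical, and an in-memory string stands in for the real log file:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# ASSUMPTIONS: whitespace-delimited lines; customer id is the second
# "/"-separated piece of field 4, as in the original post.
sub split_by_customer {
    my ($in_fh, $prefix, $suffix) = @_;
    my %fh_for;    # customer id => output file handle
    while (my $line = <$in_fh>) {
        next unless $line =~ /^[0-9]/;      # only data lines
        my @fields = split ' ', $line;
        next unless defined $fields[4];
        my $cust = (split m{/}, $fields[4])[1];
        next unless defined $cust && length $cust;
        unless ($fh_for{$cust}) {           # first line for this customer
            my $path = "$prefix$cust$suffix";
            open(my $out, '>>', $path) or die "Cannot open $path: $!\n";
            $fh_for{$cust} = $out;
        }
        print { $fh_for{$cust} } $line;
    }
    close $_ for values %fh_for;
    return sort keys %fh_for;
}

# Usage with made-up sample data and a temporary output directory.
my $dir = tempdir(CLEANUP => 1);
my $sample = join '', map { "$_\n" }
    '# header line, skipped',
    '1 a b c /custA/clip1.wmv 200',
    '2 a b c /custB/clip2.wmv 200',
    '3 a b c /custA/clip3.wmv 404';
open(my $in, '<', \$sample) or die "Cannot open sample: $!\n";
my @customers = split_by_customer($in, "$dir/", '.wms.log');
print "customers: @customers\n";    # prints "customers: custA custB"
```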
      open (my $fh, ">> ${prefix}${thiscust}${suffix}")
      Why do you concatenate the second and third arguments to open() into a single string just so that perl can pull it apart again?

      open (my $fh, '>>', "$prefix$thiscust$suffix")
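A minimal sketch of the three-argument form used for appending to a customer's log: the mode ('>>') and the file name are separate arguments, so perl never has to parse the mode back out of an interpolated string. append_to_customer_log is a hypothetical helper, and a temporary directory stands in for the real prefix:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Append one line to a customer's log with three-argument open.
sub append_to_customer_log {
    my ($prefix, $cust, $suffix, $line) = @_;
    my $path = "$prefix$cust$suffix";
    open(my $fh, '>>', $path) or die "Cannot open $path: $!\n";
    print {$fh} $line;
    close($fh) or die "Cannot close $path: $!\n";
    return $path;
}

my $dir  = tempdir(CLEANUP => 1);    # stand-in for the real prefix directory
my $path = append_to_customer_log("$dir/", 'custA', '.wms.log', "first\n");
append_to_customer_log("$dir/", 'custA', '.wms.log', "second\n");

open(my $in, '<', $path) or die "Cannot read $path: $!\n";
my @lines = <$in>;
close $in;
print scalar(@lines), " lines in $path\n";    # 2 lines, because '>>' appends
```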
        Old Habit :)
Re: Log File Parsing using "split"
by ww (Archbishop) on May 19, 2006 at 17:07 UTC
    Definitely not in order of importance, and NOT to excoriate you (in fact, perhaps betraying bits 'n pieces of mine own ignorance), but
    • Why use both "-w" and "use warnings;" (they overlap), and why NOT "use strict;"? Hint: you'll find a wealth of information by simply searching or supersearching.
    • Naming of variables: "$media" for example -- ah, paper vs. canvas, vs. one-inch-tape vs. 8 inch floppies? BRRRRRT! Suggestion: use accurately meaningful var names.
    • "Unknown file format" -- I don't think it's going to be an "unknown file format" that sends flow to the die; I think it's going to be a failure to match a string which includes, somewhere within it, the characters "WMS." Recommendation: make your messages mean what they say and say what they mean!

    What this smells like is cargo culting and cut-and-pasting without a clear understanding of what the ancestor code did or how it worked.

    And, for us to help, we surely need a somewhat clearer exposition -- a snippet, sample -- of the logfile whose format is unknown to us.

    Please UPDATE your question, and please read "How do I post a question effectively?"; both will help us to help you.

Re: Log File Parsing using "split"
by SamCG (Hermit) on May 19, 2006 at 18:13 UTC
    Well, kind of minor, but:

    Regarding
    if ( $logfile =~ /WMS/ ) {
        $media  = "wms";
        $prefix = "/usr/home/script/${media}_cust/";
        $suffix = ".$media.log";
    } else {
        die "Unknown file format";
    }
    So, if it matches WMS, you set some variables; if not, you die. Why not just set the variables, then check the match and die? It's not (wholly) just golfing; I think it makes the logic a bit clearer.
    $media  = 'wms';
    $prefix = "/usr/home/script/${media}_cust/";
    $suffix = ".$media.log";
    die "logfile name does not contain 'WMS'\n" if $logfile !~ /WMS/;
    Also, is it currently printing each full record whenever the line starts with a digit? Or rather, is that what you're trying to do? I agree that you seem to be using $thiscust for multiple purposes, once as the customer id and once as a file handle, and that seems ill-advised.
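If the goal is to write only the 11 wanted fields rather than the full record, a list slice plus join would do it. A sketch only: the original post never says which 11 of the 51 fields are needed, so @keep below holds hypothetical placeholder indexes, the tab output separator is an assumption, and extract_fields is a made-up helper name:

```perl
use strict;
use warnings;

# HYPOTHETICAL placeholder indexes; replace with the real 11 fields.
my @keep = (0, 1, 4, 6);

# Return the wanted fields of a data line, tab-separated, or undef for
# lines that do not start with a digit (headers, comments, etc.).
sub extract_fields {
    my ($line) = @_;
    return undef unless $line =~ /^[0-9]/;
    my @fields = split ' ', $line;
    # Fields missing from short lines become '' instead of undef.
    my @wanted = map { defined $_ ? $_ : '' } @fields[@keep];
    return join("\t", @wanted) . "\n";
}

print extract_fields("12:00 10.0.0.1 GET x /custA/clip.wmv 200 5120 extra\n");
# prints "12:00", "10.0.0.1", "/custA/clip.wmv" and "5120", tab-separated
```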



Re: Log File Parsing using "split"
by TedPride (Priest) on May 19, 2006 at 22:13 UTC
    On a hopefully related note, you can assign your fields to variables all at once using a format like the following:
    ($name, $city, $state, $zip) = (split / /, $_)[4,5,6,8];
      ($name, $city, $state, $zip) = (split / /, $_)[4,5,6,8];

      I think the OP is probably better off with the default behaviour of split(); there is nothing to be gained by spelling it out.

      It would be very unusual in real code for an assignment statement such as the one above to be meant to overwrite the values of four existing variables. It is far more common for this to be the point in the code at which these four variables are introduced.

      Newcomers to Perl often have problems with variable declaration. When presenting isolated code fragments, every assignment statement should have a my() if it is more likely than not that it would need one in well-written real code.

      my ($name, $city, $state, $zip) = (split)[4,5,6,8];
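To see why the default split is the safer choice for a slice like this, compare it with a literal single-space pattern on a record that happens to contain two spaces in a row. The record is made up for illustration:

```perl
use strict;
use warnings;

# Made-up record with TWO spaces between "Alice" and "Boston".
$_ = "1 2 3 4 Alice  Boston MA x 02101\n";

# Default split: splits $_ on runs of whitespace, so indexes stay put.
my ($name, $city, $state, $zip) = (split)[4,5,6,8];
print "default split: $name $city $state $zip\n";
# prints "default split: Alice Boston MA 02101"

# Literal single-space pattern: the double space yields an empty field,
# which shifts every later index by one.
my @single = (split / /, $_)[4,5,6,8];
print "split / /: ", join('|', @single), "\n";
# prints "split / /: Alice||Boston|x"
```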