freddiel has asked for the wisdom of the Perl Monks concerning the following question:

How do I create a script that can gather certain data from a text logfile with fixed columns?

Here is an excerpt of the logfile:

File Name                                          Time Stamp           Total Records        Processed            Errors
-------------------------------------------------- -------------------- -------------------- -------------------- --------------------
N-20070143.003.TXT                                 05/23/07 02:36:59 PM 13                   13                   0
N-20070143.004.TXT                                 05/23/07 04:48:56 PM 1                    1                    0
N-20070143.006.TXT                                 05/23/07 04:48:56 PM 16                   16                   0
N-20070143.008.TXT                                 05/23/07 04:48:58 PM 19                   19                   0
N-20070143.009.TXT                                 05/23/07 04:48:59 PM 1                    1                    0
N-20070143.010.TXT                                 05/23/07 04:49:00 PM 5                    4                    1
N-20070143.012.TXT                                 05/23/07 04:49:00 PM 18                   18                   0
N-20070143.013.TXT                                 05/23/07 04:49:02 PM 20                   20                   0
N-20070143.015.TXT                                 05/23/07 04:49:03 PM 53                   53                   0
N-20070143.011.TXT                                 05/24/07 04:35:48 PM 5                    5                    0
N-20070152.040.TXT                                 06/18/07 04:03:26 PM 25                   21                   4

Any information appreciated.

Thanks in advance.

Freddie

Replies are listed 'Best First'.
Re: How to monitor a logfile with columns for certain data?
by gamache (Friar) on Nov 16, 2007 at 16:39 UTC
    Having worked with a lot of fixed-length data (Fortran output), I've done it like this:
    #!/usr/bin/perl
    use strict;
    use warnings;

    <DATA>; <DATA>;  # skip header lines
    while (<DATA>) {
        my @fields = /
            (.{51})
            (.{21})
            (.{21})
            (.{21})
            (.+)
        /x or next;
        my @fields_nospace = map { /^\s*(.+?)\s*$/; $1 } @fields;
        print join("\t", @fields_nospace), "\n";
    }
    __DATA__
    (your data here)
    Output:
    N-20070143.003.TXT      05/23/07 02:36:59 PM    13      13      0
    N-20070143.004.TXT      05/23/07 04:48:56 PM    1       1       0
    N-20070143.006.TXT      05/23/07 04:48:56 PM    16      16      0
    N-20070143.008.TXT      05/23/07 04:48:58 PM    19      19      0
    N-20070143.009.TXT      05/23/07 04:48:59 PM    1       1       0
    N-20070143.010.TXT      05/23/07 04:49:00 PM    5       4       1
    N-20070143.012.TXT      05/23/07 04:49:00 PM    18      18      0
    N-20070143.013.TXT      05/23/07 04:49:02 PM    20      20      0
    N-20070143.015.TXT      05/23/07 04:49:03 PM    53      53      0
    N-20070143.011.TXT      05/24/07 04:35:48 PM    5       5       0
    N-20070152.040.TXT      06/18/07 04:03:26 PM    25      21      4
      OK, so how do I retrieve the column with the header Errors and get the lines where Errors > 0 or Processed <> Total Records? Any information would be greatly appreciated. I am not very good at Perl. Thanks.
        Each field is an element in the @fields_nospace array; the first field (File Name) is at $fields_nospace[0], and the fifth field (Errors) is at $fields_nospace[4]. So you can check if $fields_nospace[4] > 0, or if $fields_nospace[2] != $fields_nospace[3], and do things based on those conditions. Hope this helps.
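
A minimal sketch of that check, assuming the column widths from the regex above (51, 21, 21, 21, then the rest). The helper names `parse_line` and `needs_attention` and the sprintf-built sample lines are made up for illustration; they are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one fixed-width line into its five fields and trim the padding.
# Returns an empty list if the line is too short to match.
sub parse_line {
    my ($line) = @_;
    my @fields = $line =~ /^(.{51})(.{21})(.{21})(.{21})(.+)$/ or return;
    s/^\s+|\s+$//g for @fields;
    return @fields;
}

# True if the record has errors, or if not every record was processed.
sub needs_attention {
    my ($total, $processed, $errors) = @_;
    return $errors > 0 || $total != $processed;
}

# Sample lines padded to the assumed layout (50/20/20/20/20 plus separators).
my $layout = "%-50s %-20s %-20s %-20s %-20s";
my @lines  = (
    sprintf($layout, 'N-20070143.009.TXT', '05/23/07 04:48:59 PM',  1,  1, 0),
    sprintf($layout, 'N-20070143.010.TXT', '05/23/07 04:49:00 PM',  5,  4, 1),
    sprintf($layout, 'N-20070152.040.TXT', '06/18/07 04:03:26 PM', 25, 21, 4),
);

for my $line (@lines) {
    my ($file, $stamp, $total, $processed, $errors) = parse_line($line)
        or next;
    print "$file: total=$total processed=$processed errors=$errors\n"
        if needs_attention($total, $processed, $errors);
}
```

With the three sample lines, only the .010 and .040 files should be reported. To read a real logfile, replace the `@lines` loop with a `while (<$fh>)` loop, `chomp` each line, and skip the two header lines first.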
Re: How to monitor a logfile with columns for certain data?
by thezip (Vicar) on Nov 17, 2007 at 01:37 UTC

    Hello freddie,

    There is also a solution using unpack:

    #!/usr/local/bin/perl -w
    use strict;
    use Data::Dumper;

    my $spec = 'A24A9A15A8A8A8';
    my $line = scalar <DATA>;
    my @arr = unpack($spec, $line);
    print Dumper(\@arr);

    __DATA__
    N-20070143.003.TXT      05/23/07 02:36:59 PM    13      13      0
    __OUTPUT__
    $VAR1 = [
              'N-20070143.003.TXT',
              '05/23/07',
              '02:36:59 PM',
              '13',
              '13',
              '0'
            ];
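
To apply the same idea to every line and answer the follow-up about errors, something like the sketch below might work. It reuses the template above (written with spaces, which pack/unpack templates ignore). The field names, the `parse_record` helper, and the sample records built with `pack` are assumptions for illustration only; the widths have to match your real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $spec  = 'A24 A9 A15 A8 A8 A8';   # same widths as the template above
my @names = qw(file date time total processed errors);

# Unpack one line into a hash keyed by (made-up) field names.
# The 'A' format strips trailing whitespace from each field.
sub parse_record {
    my ($line) = @_;
    my %rec;
    @rec{@names} = unpack $spec, $line;
    return \%rec;
}

# Build sample lines with pack, so they are guaranteed to fit the template.
my @lines = map { pack $spec, @$_ } (
    [ 'N-20070143.003.TXT', '05/23/07', '02:36:59 PM', 13, 13, 0 ],
    [ 'N-20070143.010.TXT', '05/23/07', '04:49:00 PM',  5,  4, 1 ],
);

for my $line (@lines) {
    my $rec = parse_record($line);
    print "$rec->{file}: $rec->{errors} error(s)\n"
        if $rec->{errors} > 0 || $rec->{total} != $rec->{processed};
}
```

A hash slice keeps the comparisons readable (`$rec->{errors}` rather than `$arr[5]`), at the cost of a little extra setup.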

    Where do you want *them* to go today?
Re: How to monitor a logfile with columns for certain data?
by aquarium (Curate) on Nov 16, 2007 at 23:58 UTC
    Although your input looks fixed-width, it may actually be tab-separated. In any case, as long as the columns don't run into each other, you can use whitespace to delimit the fields on input. Furthermore, rather than implementing the selection on column values yourself, try a module such as DBD::CSV; then you merely write SQL to select what you want and process the results.
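
A rough sketch of the whitespace approach, assuming file names never contain spaces. Note that the timestamp itself contains spaces, so each data line splits into 7 tokens rather than 5. The `parse_ws` helper and the sample lines are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a line on runs of whitespace and regroup the tokens.
# Returns nothing for headers, separators, or anything else that
# does not look like a data line.
sub parse_ws {
    my ($line) = @_;
    my @t = split ' ', $line;
    return unless @t == 7 && $t[1] =~ m{^\d\d/\d\d/\d\d$};
    return {
        file      => $t[0],
        stamp     => "@t[1..3]",   # date, time and AM/PM rejoined
        total     => $t[4],
        processed => $t[5],
        errors    => $t[6],
    };
}

my @lines = split /\n/, <<'END';
File Name            Time Stamp            Total Records  Processed  Errors
N-20070143.009.TXT   05/23/07 04:48:59 PM  1              1          0
N-20070143.010.TXT   05/23/07 04:49:00 PM  5              4          1
END

for my $line (@lines) {
    my $rec = parse_ws($line) or next;
    print "$rec->{file} ($rec->{stamp}): $rec->{errors} error(s), ",
          "$rec->{processed} of $rec->{total} processed\n"
        if $rec->{errors} > 0 || $rec->{total} != $rec->{processed};
}
```

This is more forgiving about exact column positions than a regex or unpack template, but it breaks as soon as a field is empty or a file name contains a space.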
    the hardest line to type correctly is: stty erase ^H