dbrock has asked for the wisdom of the Perl Monks concerning the following question:

Hello...
I am working on a script to report errors from a backup tool. I have all of the errors in one comma-delimited file. Is there a slick way to parse the lines and group the data, so that I can write them to separate files corresponding to the NODE NAME (the first comma-delimited field, e.g. XXXYYZZ)?
My example file is generated from a SQL query and placed in comma-delimited form. Each line has three fields (node, datestamp, error):

ABHS00001,2004-01-16 01:43:24.000000,ANE4987E Error processing
ABHS00001,2004-01-16 01:46:24.000000,ANE4987E Error processing
ABHS00001,2004-01-16 01:49:24.000000,ANE4987E Error processing
AH2D21001,2004-01-15 22:57:32.000000,ANE4987E Error processing
AH2D21001,2004-01-15 22:57:33.000000,ANE4987E Error processing
AH2D21001,2004-01-15 22:57:34.000000,ANE4987E Error processing
AH2S21003,2004-01-16 02:23:05.000000,ANE4987E Error processing
AH2S21003,2004-01-16 02:24:05.000000,ANE4987E Error processing
AH2S21003,2004-01-16 02:24:05.000000,ANE4987E Error processing
ESI2A55P,2004-01-16 04:21:43.000000,ANE4037E File Skipped
ESI2A55P,2004-01-16 04:25:43.000000,ANE4037E File Skipped
ESI2A55P,2004-01-16 04:27:43.000000,ANE4037E File Skipped


Any assistance would be great.
Darrick...

Replies are listed 'Best First'.
Re: Assistance with Comma parse
by pg (Canon) on Jan 21, 2004 at 06:27 UTC

    A lot of the time, a good data structure is the key. In this case, use a HoA (hash of arrays) to group records by node.

    use strict;
    use warnings;
    use Data::Dumper;

    # Read the comma-delimited report and group records by node:
    # a hashref of arrays of [datestamp, error] pairs.
    open my $in, "<", "foo.dat" or die "open foo.dat: $!";

    my $data = {};
    while (my $line = <$in>) {
        chomp $line;
        my @columns = split /,/, $line;
        push @{ $data->{ $columns[0] } }, [ $columns[1], $columns[2] ];
    }
    close $in;

    print Dumper($data);

    With the sample data you give, here is the output:

    $VAR1 = {
              'AH2S21003' => [
                  [ '2004-01-16 02:23:05.000000', 'ANE4987E Error processing' ],
                  [ '2004-01-16 02:24:05.000000', 'ANE4987E Error processing' ],
                  [ '2004-01-16 02:24:05.000000', 'ANE4987E Error processing' ]
              ],
              'AH2D21001' => [
                  [ '2004-01-15 22:57:32.000000', 'ANE4987E Error processing' ],
                  [ '2004-01-15 22:57:33.000000', 'ANE4987E Error processing' ],
                  [ '2004-01-15 22:57:34.000000', 'ANE4987E Error processing' ]
              ],
              'ESI2A55P' => [
                  [ '2004-01-16 04:21:43.000000', 'ANE4037E File Skipped' ],
                  [ '2004-01-16 04:25:43.000000', 'ANE4037E File Skipped' ],
                  [ '2004-01-16 04:27:43.000000', 'ANE4037E File Skipped' ]
              ],
              'ABHS00001' => [
                  [ '2004-01-16 01:43:24.000000', 'ANE4987E Error processing' ],
                  [ '2004-01-16 01:46:24.000000', 'ANE4987E Error processing' ],
                  [ '2004-01-16 01:49:24.000000', 'ANE4987E Error processing' ]
              ]
            };

      That's a good way to store the data (I was going to reply and suggest a hashref of arrayrefs myself). I'd just like to add a caveat: be careful, because holding an entire file's worth of data in memory can be a bad idea if the file is too big -- especially if you're going to manipulate the data.

      A better bet may be to open one filehandle per node, append each record to its node's file as you read it, and then reread those files one at a time, to reduce the total amount of memory you need (a sketch of this follows below).

      -Dan
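
      A minimal sketch of that write-then-reread idea (untested; the per-node ".err" filenames are just an assumption for illustration):

      use strict;
      use warnings;

      # First pass: stream each record straight into its node's file,
      # holding one open filehandle per node rather than the whole
      # dataset in memory.
      my %fh_for;
      while (my $line = <>) {
          my ($node) = split /,/, $line;
          unless ($fh_for{$node}) {
              open $fh_for{$node}, '>>', "$node.err"
                  or die "open $node.err: $!";
          }
          print { $fh_for{$node} } $line;
      }
      close $_ for values %fh_for;

      # Second pass: reread each node's file one at a time, so only
      # one node's records are ever in memory at once.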
Re: Assistance with Comma parse
by duff (Parson) on Jan 21, 2004 at 05:51 UTC

    Um ... I think this is what you're asking for ...

    while (<>) {
        my ($node) = split /,/;
        open my $fh, '>>', $node or die "a horrible death: $!";
        print $fh $_;
        close $fh;
    }

    There are slicker ways, though. Rather than opening a file for every line, you could cache the opened filehandles and just write to the already-open ones (this may not help if you've got many, many files to open, since you can hit the per-process limit on open filehandles). Here's some code (untested, caveat lector):

    my %cache;
    while (<>) {
        my ($node) = split /,/;
        unless ($cache{$node}) {
            open $cache{$node}, '>>', $node or die "a horrible death";
        }
        print { $cache{$node} } $_;
    }
    close $_ for values %cache;
Re: Assistance with Comma parse
by Abigail-II (Bishop) on Jan 21, 2004 at 12:31 UTC
    Assuming the data is sorted on the first column, and read from <>, I'd write it as:
    #!/usr/bin/perl

    use strict;
    use warnings;

    my $file = "";
    my $fh;

    while (<>) {
        my ($node, $mess) = split /,/ => $_, 2;
        unless ($node eq $file) {
            close $fh or die "close $file: $!" if $fh;
            $file = $node;
            open $fh => '>', $file or die "open $file: $!";
        }
        print $fh $mess;
    }

    __END__

    Or from the command line:

    perl -apF, -e'if($F ne$F[0]){$F=$F[0];open F,">$F"or die;select F}s/[^,]+//'

    Abigail

Re: Assistance with Comma parse
by BUU (Prior) on Jan 21, 2004 at 10:13 UTC
    Why don't you just do the SQL queries from Perl and deal with the data in its original form, instead of mucking about with comma parsing? (See the DBI sketch below.)

      I have to agree.

      dbrock, is there a reason you're dumping the select to a file first?
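
      For illustration, a minimal DBI sketch of that approach (untested; the DSN, credentials, table, and column names are assumptions, not dbrock's actual schema):

      use strict;
      use warnings;
      use DBI;

      # Pull the rows straight from the database and group them by
      # node as they're fetched -- no intermediate comma-delimited
      # file needed.
      my $dbh = DBI->connect('dbi:mysql:database=backups', 'user', 'pass',
                             { RaiseError => 1 });

      my $sth = $dbh->prepare('SELECT node, datestamp, error FROM actlog');
      $sth->execute;

      my %errors_for;    # node => [ [datestamp, error], ... ]
      while (my ($node, $stamp, $error) = $sth->fetchrow_array) {
          push @{ $errors_for{$node} }, [ $stamp, $error ];
      }
      $dbh->disconnect;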

Re: Assistance with Comma parse
by captain_haddock (Novice) on Jan 22, 2004 at 13:49 UTC
    Assuming that the data is sorted on the first field, how about:
    my %files;
    while (<>) {
        my @a = split /,/;
        push @{ $files{ shift @a } }, join ',', @a;
    }
Re: Assistance with Comma parse
by Anonymous Monk on Jan 22, 2004 at 14:48 UTC
    I use the same backup product that initially produced this file. Instead of pulling stuff from the actlog directly with SQL, use the "q actlog" command and then put it out to a file with the -comma argument. Then here's your code to make some sense of the jumble. It's also good just to dump the actlog, because there is a lot of other relevant info in there that you'll be wanting sooner or later.

    Err... better yet, just pull it down from my.adsm.org; it's the daily backup status tool. I put a new one up there a few days ago, and I don't know if it's made it all the way to my.adsm.org yet. In the event that you have some objection to that site, you can pull it straight from my website: http://www.warispeace.org/tsm/dailybackupstatus.tar.gz or, if you want the new one (and you probably do, it is much improved): http://www.warispeace.org/tsm/new-dbs.tar.gz

    At the very least it will dump a LOT of relevant info into your DB of choice (it's set up for MySQL, but it uses DBI, so you can change it to whatever you prefer). Enjoy!
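
    A minimal sketch of the load-it-into-a-database side via DBI (untested; the filename, DSN, credentials, and table layout are assumptions for illustration, not the tool's actual schema):

    use strict;
    use warnings;
    use DBI;

    # Load a comma-delimited actlog dump (node,datestamp,error per
    # line) into a table, one row per record.
    my $dbh = DBI->connect('dbi:mysql:database=tsm', 'user', 'pass',
                           { RaiseError => 1 });

    my $sth = $dbh->prepare(
        'INSERT INTO actlog (node, datestamp, error) VALUES (?, ?, ?)'
    );

    open my $in, '<', 'actlog.csv' or die "open actlog.csv: $!";
    while (my $line = <$in>) {
        chomp $line;
        $sth->execute(split /,/, $line, 3);
    }
    close $in;
    $dbh->disconnect;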