windperl has asked for the wisdom of the Perl Monks concerning the following question:

Hi folks, I am a new user of Perl and I need to parse many similar CSV files to produce one output file. The input files are named like systot-all_*.csv, where * may represent anything. The structure of each file is exactly the same. The content of one such file is:

"CLLI","SWREL","RPTDATE","RPTIME","TZ","RPTTYPE","RPTPD","IVALDATE","IVALSTART","IVALEND","NUMENTIDS"
"toroonxn0dw","EAGLE5 40.1.0-62.13.19","2009-11-13","19:00:23","EST ","STP SYSTEM TOTAL MEASUREMENTS ON TT","LAST","2009-11-13","18:45:00","19:00:00",256
"STATUS","TT","GTTPERFD","GTTUN0NS","GTTUN1NT","AGTTPERFD"
"K","0",0,0,0,0
"K","1",0,0,0,0
"K","2",0,0,0,0
"K","3",0,0,0,0
"K","4",0,0,0,0
"K","5",0,0,0,0
"K","6",0,0,0,0
"K","7",0,0,0,0

I have to take the IVALDATE, IVALSTART and IVALEND values from the second line, which in this case are 2009-11-13, 18:45:00 and 19:00:00 respectively, and then all the values in the part below. My desired output for the above input is (tab-separated fields):

Date    StartTime    EndTime    STATUS    TT    GTTPERFD    GTTUN0NS    GTTUN1NT    AGTTPERFD
2009-11-13    18:45:00    19:00:00    K    0    0    0    0    0
2009-11-13    18:45:00    19:00:00    K    1    0    0    0    0
2009-11-13    18:45:00    19:00:00    K    2    0    0    0    0
2009-11-13    18:45:00    19:00:00    K    3    0    0    0    0
2009-11-13    18:45:00    19:00:00    K    4    0    0    0    0
2009-11-13    18:45:00    19:00:00    K    5    0    0    0    0
2009-11-13    18:45:00    19:00:00    K    6    0    0    0    0
2009-11-13    18:45:00    19:00:00    K    7    0    0    0    0

The header line must appear only once, i.e. it should come from one file only. For all other files the script should do everything except print the header again. Can anybody help me with a Perl script for this?
Thanks, Windperl

Replies are listed 'Best First'.
Re: Need help to parse a csv file
by keszler (Priest) on Nov 17, 2009 at 21:14 UTC
Re: Need help to parse a csv file
by zwon (Abbot) on Nov 17, 2009 at 21:11 UTC

    So what is your problem? It looks like quite a straightforward task. You can start with Text::CSV.
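As a starting point, here is a minimal sketch of what Text::CSV does with one data row from the question. It assumes Text::CSV is installed from CPAN; the `binary => 1` option is a commonly recommended default rather than something this data requires, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# One data row from the question, as it appears in the input file.
my $line = '"K","3",0,0,0,0';

# parse() splits the line and removes the quoting; fields() returns the values.
$csv->parse($line) or die "Cannot parse line: " . $csv->error_input;
my @fields = $csv->fields;

# Re-join the unquoted values with tabs, as the desired output requires.
print join("\t", @fields), "\n";
```

Letting the module handle the quoting also keeps fields such as "STP SYSTEM TOTAL MEASUREMENTS ON TT" intact if they ever contain a comma.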

Re: Need help to parse a csv file
by Tux (Canon) on Nov 18, 2009 at 07:31 UTC

    Given the quest, you probably want DBD::CSV, which uses Text::CSV_XS in the background.

    Read about DBI before proceeding. You will be glad you did: it is the most often used interface to data collections in the Perl world.


    Enjoy, Have FUN! H.Merijn
Re: Need help to parse a csv file
by VinsWorldcom (Prior) on Nov 17, 2009 at 21:25 UTC

    If all your input files are formatted the same, with predictable "cells" (row, col), then you could possibly use a script I've written that takes input from several Excel spreadsheets and puts it into a single one by rows, much like what you're describing. I've adapted it to allow for tab-delimited and CSV input files.

    Have a look at: Parse Excel Spreadsheets To Single.

Re: Need help to parse a csv file
by przemo (Scribe) on Nov 17, 2009 at 21:30 UTC

    Something similar to this would work.

    use strict;
    use warnings;
    use Text::CSV;

    # Reading the first file and extracting IVAL...S
    my $csv = Text::CSV->new;
    my $header = $csv->getline(\*DATA);    # getline returns an array reference
    $csv->column_names(@$header);
    my $frst = $csv->getline_hr(\*DATA);
    my ($ivaldate, $ivalstart, $ivalend) =
        map { $frst->{$_} } qw( IVALDATE IVALSTART IVALEND );

    # Reading all the rest
    # (a plain substitution leaves the double quotes around quoted fields)
    my $line = <DATA>;
    $line =~ s/,/\t/g;
    print join "\t", qw( Date StartTime EndTime ), $line;
    while ($line = <DATA>) {
        $line =~ s/,/\t/g;
        print join "\t", $ivaldate, $ivalstart, $ivalend, $line;
    }
    __DATA__
    "CLLI","SWREL","RPTDATE","RPTIME","TZ","RPTTYPE","RPTPD","IVALDATE","IVALSTART","IVALEND","NUMENTIDS"
    "toroonxn0dw","EAGLE5 40.1.0-62.13.19","2009-11-13","19:00:23","EST ","STP SYSTEM TOTAL MEASUREMENTS ON TT","LAST","2009-11-13","18:45:00","19:00:00",256
    "STATUS","TT","GTTPERFD","GTTUN0NS","GTTUN1NT","AGTTPERFD"
    "K","0",0,0,0,0
    "K","1",0,0,0,0
    "K","2",0,0,0,0
    "K","3",0,0,0,0
    "K","4",0,0,0,0
    "K","5",0,0,0,0
    "K","6",0,0,0,0
    "K","7",0,0,0,0
      Oh no, you should also use Text::CSV for # Reading all the rest.

        That is left as an exercise to the reader. :)
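For readers who want to try that exercise, one possible sketch follows. It assumes Text::CSV is installed from CPAN; `report_to_tsv` and its `$with_header` flag are hypothetical names introduced here, and the sample is held in a string only so the sketch runs on its own.

```perl
use strict;
use warnings;
use Text::CSV;

# Sample input from the question; a real script would open one of the
# systot-all_*.csv files instead of this in-memory string.
my $sample = <<'END';
"CLLI","SWREL","RPTDATE","RPTIME","TZ","RPTTYPE","RPTPD","IVALDATE","IVALSTART","IVALEND","NUMENTIDS"
"toroonxn0dw","EAGLE5 40.1.0-62.13.19","2009-11-13","19:00:23","EST ","STP SYSTEM TOTAL MEASUREMENTS ON TT","LAST","2009-11-13","18:45:00","19:00:00",256
"STATUS","TT","GTTPERFD","GTTUN0NS","GTTUN1NT","AGTTPERFD"
"K","0",0,0,0,0
"K","1",0,0,0,0
"K","2",0,0,0,0
END

# Hypothetical helper: turn one report into a list of tab-separated lines.
# The column header line is included only when $with_header is true.
sub report_to_tsv {
    my ($fh, $with_header) = @_;
    my $csv = Text::CSV->new({ binary => 1 })
        or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

    # Lines 1 and 2: names and values of the report-level fields.
    $csv->column_names(@{ $csv->getline($fh) });
    my $meta   = $csv->getline_hr($fh);
    my @prefix = @{$meta}{qw( IVALDATE IVALSTART IVALEND )};

    # Line 3: column names of the measurement table.
    my $cols = $csv->getline($fh);
    my @out;
    push @out, join("\t", qw( Date StartTime EndTime ), @$cols)
        if $with_header;

    # Remaining lines: data rows, parsed and unquoted by Text::CSV.
    while (my $row = $csv->getline($fh)) {
        push @out, join("\t", @prefix, @$row);
    }
    return @out;
}

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
print "$_\n" for report_to_tsv($fh, 1);
```

To cover the original question, one could loop over `glob('systot-all_*.csv')`, open each file, and pass a true second argument only for the first file so that the header line is printed once.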