Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re^5: Pre-process csv files before using

by sk (Curate)
on Aug 06, 2005 at 20:34 UTC ( #481548=note: print w/replies, xml ) Need Help??


in reply to Re^4: Pre-process csv files before using
in thread Pre-process csv files before using

I am not going to do Re:Re:Re now as it makes it hard to read. This reply is for this node Re^8: Pre-process csv files before using

Could you please wrap your column header info inside code tags? Long lines don't wrap otherwise. Thanks!

#!/usr/bin/perl use strict; use warnings; my $str = <DATA>; my @origcols = split /,/,$str; my @cols = (); foreach (@origcols) { $_ =~ s/\.//g; push ( @cols , $_ ) unless /Bandwidth.*|MSTCPLoop.*/ ; } map {print $_,$/ } @cols; __DATA__ PDHCSV40EasternDaylightTime.240,ERWW.COMMUNITIES.MemoryPagesPER.sec,ER +WWCOMMUNITIESNetwor kInterfaceEthernetAdapterModuleBytesTotalPERsec,ERWWCOMMUNITIESNetwork +InterfaceEthernetAd apterCurrentBandwidth

Output

PDHCSV40EasternDaylightTime240 ERWWCOMMUNITIESMemoryPagesPERsec ERWWCOMMUNITIESNetworkInterfaceEthernetAdapterModuleBytesTotalPERsec

as you could see the periods are gone and no warnings. Also the column with bandwidth is not listd. Can you make sure your origcols array is correctly populated?

-SK

Replies are listed 'Best First'.
Re^6: Pre-process csv files before using
by DrAxeman (Scribe) on Aug 06, 2005 at 21:46 UTC
    I put the header data in a code tag, but it didn't get wrapped. Corion and I spoke about it.

    @origcols is the problem. I am getting that from reading my csv file. I can't hardcode it since the column headers are not constants. I tried to do the following:
    #!/usr/bin/perl use strict; use warnings; use DBI; #use Text::CSV; # Connect to the database, (the directory containing our csv file( +s)) my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => 'test.csv'}; my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @origcols = @{$sth->{NAME}}; my @cols; # = @origcols; shift @origcols; foreach ( @origcols ) { $_ =~ s/\.//g ; push ( @cols , $_ ); # unless /Bandwidth.*|MSTCPLoop.*/ ; } print ("@cols \n");

    It errors out with:
    Use of uninitialized value in join or string at ./runsql3.sql line 19.
    Use of uninitialized value in join or string at ./runsql3.sql line 23.

    It also prints out the column headers except the one with the period in it. My guess is that there is an error happening with the SQL SELECT statement. I don't think it likes the period.

    Just to clear things up, here is all my code:
    #!/usr/bin/perl use strict; use warnings; use DBI; #use Text::CSV; # Connect to the database, (the directory containing our csv file( +s)) my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => 'test.csv'}; my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @origcols = @{$sth->{NAME}}; my @cols; # = @origcols; shift @origcols; foreach ( @origcols ) { $_ =~ s/\.//g ; push ( @cols , $_ ); # unless /Bandwidth.*|MSTCPLoop.*/ ; }; my $avgSQL = 'SELECT ' . join(', ', map { "avg($_) \n" } @cols ) . ' FROM results'; my @avgs = $dbh->selectrow_array($avgSQL); my %avgsHash; @avgsHash{ @cols } = @avgs; print "Col,Avg\n"; printf("%s,%f\n", $_, $avgsHash{$_}) for @cols;
    Everything works except when there is a . in one of the headers. Then it fails.

    Data in the columns that I am averaging can contain decimals, otherwise I'd just strip out all periods before starting.
    I'm trying to add the slpit statement, but am not sure where/how to use it.
      I have not used DBI::CSV to comment on issues with "." in names. It seems logical that it will break coz of dots as . is typically not a valid name charcater.

      Anyways you need to do an inplace edit of your CSV file before you pass it on to DBI::CSV. The code below demonstrates how to do an inplace edit of your file. This will create a backup of the file (mydata.csv.bak) but you can set the $^I flag to "" to pure inplace effect.

      WARNING: In Place edits are dangerous. Test out the code on a sample file before you run it on the original file

      { local $^I = ".bak"; local *ARGV; @ARGV = "mydata.csv"; while (<>) { s/\.//g if ($. == 1); print; } }
      Input
      PDHCSV40EasternDaylightTime.240,ERWWCOMMUNITIESMemory.PagesPERsec,ERWW +COMMUNITIESNetworkI nterfaceEthernetAdapterModuleBytesTotalPERsec,ERWW.COMMUNITIESNetworkI +nterface.EthernetAd apterCurrentBandwidth 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13

      Note the header - I have threw in some dots there

      After I run the program my mydata.csv becomes this -

      PDHCSV40EasternDaylightTime240,ERWWCOMMUNITIESMemoryPagesPERsec,ERWWCO +MMUNITIESNetworkInt erfaceEthernetAdapterModuleBytesTotalPERsec,ERWWCOMMUNITIESNetworkInte +rfaceEthernetAdapte rCurrentBandwidth 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13 1.1,6.6,8,13

      The dots in the header are gone but the ones in the data still remain. Now your CSV file is ready for DBI.

      Hope this helps!

      cheers

      SK

        I am working on test data. I can destroy this data as many times as I like! :-)

        As for the code, it's perfect! Thank you very much!
        Just one quick question. How about if I wanted to keep the original file intact, and have this create a new file with the modified data (over-writting an existing file if it exists)?

        Say it reads, data.csv and writes a new file named <origional-file-name>-ready.csv (or something like that). This way, if there is an error, I can re-run the perl script with out having to re-copy the data.
        --Scratch that--
        I figured out that there is a rename() function.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://481548]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (7)
As of 2023-11-30 08:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?