Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Re^3: Pre-process csv files before using

by sk (Curate)
on Aug 06, 2005 at 18:51 UTC ( #481532=note: print w/replies, xml ) Need Help??


in reply to Re^2: Pre-process csv files before using
in thread Pre-process csv files before using

Would something like this do?

perl -e'@cols =qw (need1 dont1 need2 dont2 need3); for (@cols) { push( +@req,$_) unless /dont.*/;} print join (",",@req),$/;' __END__ need1,need2,need3

You mentiioned regex so the above should help you. However if the names you want to choose are sent as input then you might want to construct a hash from the list and then drop the ones that should be excluded by checking the hash

.

Replies are listed 'Best First'.
Re^4: Pre-process csv files before using
by DrAxeman (Scribe) on Aug 06, 2005 at 18:58 UTC
    The problem is that I don't nessessarily know what all the column names are. I know that any column that ends with "Bandwidth" or has "MSTCPLoopback" in it I don't want.

    I'm trying to approach this from:
    shift @cols; #Remove the first column for (@cols) { delete $cols[$regex here] }
    Then start my sql stuff.
      You can just grep out the ones you want.
      my @cols = ...; shift @cols; # blindy throw away first column @cols = grep( $_ !~ /(Bandwidth|MSTCPLoopback)$/ , @cols); # exclude +ones ending with "Bandwidth" or "MSTCPLoopback"
      re: your code, note that delete is for hashes .. if you wanted to modify your code, you would use the slice function, but grep is much more powerful and perl-ish.
      I am not going to do Re:Re:Re now as it makes it hard to read. This reply is for this node Re^8: Pre-process csv files before using

      Could you please wrap your column header info inside code tags? Long lines don't wrap otherwise. Thanks!

      #!/usr/bin/perl use strict; use warnings; my $str = <DATA>; my @origcols = split /,/,$str; my @cols = (); foreach (@origcols) { $_ =~ s/\.//g; push ( @cols , $_ ) unless /Bandwidth.*|MSTCPLoop.*/ ; } map {print $_,$/ } @cols; __DATA__ PDHCSV40EasternDaylightTime.240,ERWW.COMMUNITIES.MemoryPagesPER.sec,ER +WWCOMMUNITIESNetwor kInterfaceEthernetAdapterModuleBytesTotalPERsec,ERWWCOMMUNITIESNetwork +InterfaceEthernetAd apterCurrentBandwidth

      Output

      PDHCSV40EasternDaylightTime240 ERWWCOMMUNITIESMemoryPagesPERsec ERWWCOMMUNITIESNetworkInterfaceEthernetAdapterModuleBytesTotalPERsec

      as you could see the periods are gone and no warnings. Also the column with bandwidth is not listd. Can you make sure your origcols array is correctly populated?

      -SK

        I put the header data in a code tag, but it didn't get wrapped. Corion and I spoke about it.

        @origcols is the problem. I am getting that from reading my csv file. I can't hardcode it since the column headers are not constants. I tried to do the following:
        #!/usr/bin/perl use strict; use warnings; use DBI; #use Text::CSV; # Connect to the database, (the directory containing our csv file( +s)) my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => 'test.csv'}; my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @origcols = @{$sth->{NAME}}; my @cols; # = @origcols; shift @origcols; foreach ( @origcols ) { $_ =~ s/\.//g ; push ( @cols , $_ ); # unless /Bandwidth.*|MSTCPLoop.*/ ; } print ("@cols \n");

        It errors out with:
        Use of uninitialized value in join or string at ./runsql3.sql line 19.
        Use of uninitialized value in join or string at ./runsql3.sql line 23.

        It also prints out the column headers except the one with the period in it. My guess is that there is an error happening with the SQL SELECT statement. I don't think it likes the period.

        Just to clear things up, here is all my code:
        #!/usr/bin/perl use strict; use warnings; use DBI; #use Text::CSV; # Connect to the database, (the directory containing our csv file( +s)) my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;"); # Associate our csv file with the table name 'results' $dbh->{'csv_tables'}->{'results'} = { 'file' => 'test.csv'}; my $sth = $dbh->prepare("SELECT * FROM results WHERE 1=0"); $sth->execute; my @origcols = @{$sth->{NAME}}; my @cols; # = @origcols; shift @origcols; foreach ( @origcols ) { $_ =~ s/\.//g ; push ( @cols , $_ ); # unless /Bandwidth.*|MSTCPLoop.*/ ; }; my $avgSQL = 'SELECT ' . join(', ', map { "avg($_) \n" } @cols ) . ' FROM results'; my @avgs = $dbh->selectrow_array($avgSQL); my %avgsHash; @avgsHash{ @cols } = @avgs; print "Col,Avg\n"; printf("%s,%f\n", $_, $avgsHash{$_}) for @cols;
        Everything works except when there is a . in one of the headers. Then it fails.

        Data in the columns that I am averaging can contain decimals, otherwise I'd just strip out all periods before starting.
        I'm trying to add the slpit statement, but am not sure where/how to use it.
      The code I gave you above should do what you are looking for.

      perl -e'@cols =qw (field1 fieldband field3 fieldMSTCP okfield5); for ( +@cols) { push(@req,$_) unless /band.*|MSTCP.*/;} print join (",",@req +),$/;' __END__ field1,field3,okfield5

      Here i am excluding names with band and MSTCP field names. Since I don't have any field names I am just hard coding it here

      . -SK

        I'm getting an odd error with this
        shift @origcols; foreach ( @origcols ) { $_ =~ s/\.//; push ( @cols , $_ ) unless /Bandwidth.*|MSTCPLoop.*/ ; }

        It says:
        Use of uninitialized value in substitution (s///) at ./runsql3.sql lin +e 20. Use of uninitialized value in pattern match (m//) at ./runsql3.sql lin +e 21.
        Line 20 is my "s/\.//" line

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://481532]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (3)
As of 2023-12-02 05:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What's your preferred 'use VERSION' for new CPAN modules in 2023?











    Results (13 votes). Check out past polls.

    Notices?