in reply to Re^4: problems parsing CSV
in thread problems parsing CSV

The bind_columns () method is actually faster. That matters when your streams are big:

my $csv = Text::CSV->new ({
    auto_diag          => 1,
    binary             => 1,
    allow_loose_quotes => 1,
    escape_char        => "\\",
    });

# Header is 'TRI,Release#,ChemName,RegNum,Year,Pounds,Grams'
my %value;
# Bind one hash element per column, keyed by the header line
$csv->bind_columns (\@value{@{$csv->getline ($release_fh)}});
# With bound columns, read with getline (), not getline_hr ()
while ($csv->getline ($release_fh)) {
    {   no warnings "numeric";
        $value{Pounds} == 0.0 && $value{Grams} == 0.0 and
            warn "Release $value{'Release#'} is weightless\n";
        }
    print $value{"TRI"},
          $value{"Release#"},
          $value{"ChemName"},
          $value{"RegNum"},
          $value{"Year"},
          $value{"Pounds"},
          $value{"Grams"};
    }

YMMV; benchmark to check whether this also holds for your data set. My speed comparison looks like this. In that image, the lower the line, the faster, so Text::CSV_XS with bind_columns () (labeled "xs bndc") is the fastest at all sizes, and its pure-Perl counterpart, Text::CSV_PP with bind_columns () (labeled "pp bndc"), is the slowest, as it has the most overhead in pure Perl. If you only want to see the differences within the XS implementation, look at this graph.
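
For anyone who wants to run that check locally, here is a minimal sketch using the core Benchmark module. It is not the script behind the graphs: the sample record, the row count, and the sub names read_hr and read_bound are made up for illustration, and the data is read from an in-memory filehandle rather than a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );
use Text::CSV;

# Hypothetical sample data: the header plus 2000 identical records
my $data = "TRI,Release#,ChemName,RegNum,Year,Pounds,Grams\n"
         . "T1,R1,Benzene,71-43-2,2009,12.5,0\n" x 2_000;

# Variant 1: one fresh hashref per record via getline_hr ()
sub read_hr {
    my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<", \$data or die "Can't open in-memory file: $!\n";
    $csv->column_names ($csv->getline ($fh));
    my $n = 0;
    while (my $row = $csv->getline_hr ($fh)) {
        $n++ if $row->{Pounds} > 0;
        }
    close $fh;
    return $n;
    }

# Variant 2: fields land in prebound hash elements, read with getline ()
sub read_bound {
    my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<", \$data or die "Can't open in-memory file: $!\n";
    my %value;
    $csv->bind_columns (\@value{@{$csv->getline ($fh)}});
    my $n = 0;
    while ($csv->getline ($fh)) {
        $n++ if $value{Pounds} > 0;
        }
    close $fh;
    return $n;
    }

# Run each variant for at least one CPU second and print a comparison table
cmpthese (-1, {
    getline_hr   => \&read_hr,
    bind_columns => \&read_bound,
    });
```

The rates depend on the backend (XS or PP), the number of columns, and the size of the input, so replace the sample data with a slice of your own file before drawing conclusions.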

Update 1: removed the erroneous call to column_names () as spotted by Jim.

Update 2: New graphs: XS + PP and XS only


Enjoy, Have FUN! H.Merijn

Re^6: problems parsing CSV
by Jim (Curate) on Oct 12, 2010 at 20:51 UTC

    Ok, here's the same script using bind_columns.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use English qw( -no_match_vars );
    use Text::CSV;

    $OUTPUT_FIELD_SEPARATOR  = "\n";
    $OUTPUT_RECORD_SEPARATOR = "\n";

    my $release_file = '../ecodata/releases.txt';

    open my $release_fh, '<', $release_file
        or die "Can't open release file $release_file: $OS_ERROR\n";

    my $csv = Text::CSV->new({
        auto_diag          => 1,
        binary             => 1,
        allow_loose_quotes => 1,
        escape_char        => '\\',
    });

    my %value;

    # Header is 'TRI,Release#,ChemName,RegNum,Year,Pounds,Grams'
    my @column_labels = $csv->column_names($csv->getline($release_fh));

    $csv->bind_columns(\@value{@column_labels});

    while ($csv->getline_hr($release_fh)) {
        {
            no warnings 'numeric';

            if ($value{'Pounds'} == 0.0 and $value{'Grams'} == 0.0) {
                warn "Release $value{'Release#'} is weightless\n";
            }
        }

        print $value{'TRI'},
              $value{'Release#'},
              $value{'ChemName'},
              $value{'RegNum'},
              $value{'Year'},
              $value{'Pounds'},
              $value{'Grams'};
    }

    close $release_fh;

    exit 0;

    I had to change your...

    \@value{@{$csv->column_names($csv->getline($release_fh))}}
    ...to...
    \@value{$csv->column_names($csv->getline($release_fh))}

      You are now mixing two approaches. When using bind_columns () you should not use getline_hr () but getline (), because you are not returning a hashref but reading into prebound variables:

      my @column_labels = @{$csv->getline ($release_fh)};
      $csv->bind_columns (\@value{@column_labels});
      while ($csv->getline ($release_fh)) {
          :
          }

      You do not use the method column_names () at all. That was a cut-n-paste error from your code in my previous example. Mea culpa.

      \@value{@{$csv->column_names ($csv->getline ($release_fh))}}
      =>
      \@value{@{$csv->getline ($release_fh)}};
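
      Why the hash slice works as an argument to bind_columns (): a backslash in front of a slice yields a list of scalar references, one per element, and anything written through those references ends up in the hash. A small sketch in plain Perl, with no Text::CSV involved; the labels and the assigned values are made up for illustration:

      ```perl
      use strict;
      use warnings;

      my %value;
      my @labels = qw( TRI Pounds Grams );

      # Backslash on a hash slice: one scalar reference per key
      my @refs = \@value{@labels};

      # Writing through the references fills the hash elements,
      # which is how bound columns receive their fields
      ${$refs[0]} = "T1";
      ${$refs[1]} = 12.5;
      ${$refs[2]} = 0;

      print join (",", map { "$_=$value{$_}" } @labels), "\n";
      # prints: TRI=T1,Pounds=12.5,Grams=0
      ```

      Because the fields arrive in %value directly, the return value of getline () is only useful as a loop condition here.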

      Enjoy, Have FUN! H.Merijn

        I thought it seemed awfully busy. This is much more understandable:

        #!/usr/bin/perl

        use strict;
        use warnings;

        use English qw( -no_match_vars );
        use Text::CSV;

        $OUTPUT_FIELD_SEPARATOR  = "\n";
        $OUTPUT_RECORD_SEPARATOR = "\n";

        my $release_file = '../ecodata/releases.txt';

        open my $release_fh, '<', $release_file
            or die "Can't open release file $release_file: $OS_ERROR\n";

        my $csv = Text::CSV->new({
            auto_diag          => 1,
            binary             => 1,
            allow_loose_quotes => 1,
            escape_char        => '\\',
        });

        my %value;

        # Header is 'TRI,Release#,ChemName,RegNum,Year,Pounds,Grams'
        $csv->bind_columns(\@value{@{$csv->getline($release_fh)}});

        while ($csv->getline($release_fh)) {
            {
                no warnings 'numeric';

                if ($value{'Pounds'} == 0.0 and $value{'Grams'} == 0.0) {
                    warn "Release number $value{'Release#'} is weightless\n";
                }
            }

            print $value{'TRI'},
                  $value{'Release#'},
                  $value{'ChemName'},
                  $value{'RegNum'},
                  $value{'Year'},
                  $value{'Pounds'},
                  $value{'Grams'};
        }

        close $release_fh;

        exit 0;

        Thank you, Tux.