Re^5: problems parsing CSV

The bind_columns () method is actually faster, which matters when your streams are big:

my $csv = Text::CSV->new ({
    auto_diag          => 1,
    binary             => 1,
    allow_loose_quotes => 1,
    escape_char        => "\\",
    });

# Header is 'TRI,Release#,ChemName,RegNum,Year,Pounds,Grams'
my %value;
$csv->bind_columns (\@value{@{$csv->getline ($release_fh)}});
while ($csv->getline ($release_fh)) {
    {   no warnings "numeric";
        $value{Pounds} == 0.0 && $value{Grams} == 0.0 and
            warn "Release $value{'Release#'} is weightless\n";
        }

    print $value{"TRI"},
          $value{"Release#"},
          $value{"ChemName"},
          $value{"RegNum"},
          $value{"Year"},
          $value{"Pounds"},
          $value{"Grams"};
    }
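For anyone puzzled by the `\@value{...}` argument: a backslash in front of a hash slice gives a list of references, one per key, and that list is what bind_columns () receives. A minimal core-Perl sketch of the idea, without Text::CSV; the column names and row values here are made up for illustration, and the loop only roughly imitates what the parser does with each row:

```perl
use strict;
use warnings;

my @names = qw( TRI Pounds Grams );   # hypothetical column names
my %value;

# A backslash applied to a hash slice yields one scalar reference per
# element, in the order of the keys given.
my @refs = \@value{@names};

# Writing through those references fills the hash in place.
my @row = ("X1", 0, 12.5);            # made-up field values
${ $refs[$_] } = $row[$_] for 0 .. $#row;

print "$_ = $value{$_}\n" for @names;
```

Because the hash elements are reused for every row, no new hash has to be built per line, which is the likely source of the speed difference.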

YMMV: benchmark to check whether this also holds for your set of data. My speed comparison looks like this. In that image, the lower the line, the faster, so Text::CSV_XS with bind_columns () (labeled "xs bndc") is the fastest at all sizes, and its pure-Perl counterpart, Text::CSV_PP with bind_columns () (labeled "pp bndc"), is the slowest, as it has the most overhead in pure Perl. If you only want to see the differences within the XS implementation, look at this graph.
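If you want to run that check yourself, something along these lines should do as a starting point. It is only a sketch: the in-memory data, the row count, and the sub names read_hr and read_bound are invented for the example, and it is not the script that produced the graphs. It uses the core Benchmark module to compare getline_hr () against bind_columns () plus getline ():

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );
use Text::CSV;

# Made-up data: a header line plus 5000 identical rows, held in memory.
my $data = "TRI,Pounds,Grams\n" . "X1,0,12.5\n" x 5_000;

# Approach 1: set the column names, then fetch a fresh hashref per row.
sub read_hr {
    my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<", \$data or die "open: $!";
    $csv->column_names (@{$csv->getline ($fh)});
    my $sum = 0;
    while (my $row = $csv->getline_hr ($fh)) {
        $sum += $row->{Grams};
        }
    return $sum;
    }

# Approach 2: bind the hash elements once, then read with getline ().
sub read_bound {
    my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<", \$data or die "open: $!";
    my %value;
    $csv->bind_columns (\@value{@{$csv->getline ($fh)}});
    my $sum = 0;
    while ($csv->getline ($fh)) {
        $sum += $value{Grams};
        }
    return $sum;
    }

# Run each sub for at least one CPU second and print a comparison table.
cmpthese (-1, { getline_hr => \&read_hr, bind_columns => \&read_bound });
```

Replace the in-memory string with your own file to get numbers that mean something for your data; results on a tiny synthetic set like this one may not carry over.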

Update 1: removed the erroneous call to column_names (), as spotted by Jim.

Update 2: New graphs: XS + PP and XS only

Enjoy, Have FUN! H.Merijn


Re^6: problems parsing CSV by Jim (Curate) on Oct 12, 2010 at 20:51 UTC
Ok, here's the same script using bind_columns.

#!/usr/bin/perl

use strict;
use warnings;

use English qw( -no_match_vars );
use Text::CSV;

$OUTPUT_FIELD_SEPARATOR  = "\n";
$OUTPUT_RECORD_SEPARATOR = "\n";

my $release_file = '../ecodata/releases.txt';

open my $release_fh, '<', $release_file
    or die "Can't open release file $release_file: $OS_ERROR\n";

my $csv = Text::CSV->new({
    auto_diag          => 1,
    binary             => 1,
    allow_loose_quotes => 1,
    escape_char        => '\\',
});

my %value;

# Header is 'TRI,Release#,ChemName,RegNum,Year,Pounds,Grams'
my @column_labels = $csv->column_names($csv->getline($release_fh));
$csv->bind_columns(\@value{@column_labels});

while ($csv->getline_hr($release_fh)) {
    {
        no warnings 'numeric';

        if ($value{'Pounds'} == 0.0 and $value{'Grams'} == 0.0) {
            warn "Release $value{'Release#'} is weightless\n";
        }
    }

    print $value{'TRI'},
          $value{'Release#'},
          $value{'ChemName'},
          $value{'RegNum'},
          $value{'Year'},
          $value{'Pounds'},
          $value{'Grams'};
}

close $release_fh;

exit 0;

I had to change your...

`\@value{@{$csv->column_names($csv->getline($release_fh))}}`

...to...

`\@value{$csv->column_names($csv->getline($release_fh))}`
Re^7: problems parsing CSV by Tux (Canon) on Oct 13, 2010 at 06:20 UTC
You are now mixing two approaches. When using `bind_columns ()` you should not use `getline_hr ()` but `getline ()`, because you are not returning a hashref but reading into prebound variables:

my @column_labels = @{$csv->getline ($release_fh)};
$csv->bind_columns (\@value{@column_labels});
while ($csv->getline ($release_fh)) {
    :
    }

You do not use the method `column_names ()` at all. That was a cut-n-paste error from your code in my previous example. Mea culpa.

`\@value{@{$csv->column_names ($csv->getline ($release_fh))}} => \@value{@{$csv->getline ($release_fh)}};`

Enjoy, Have FUN! H.Merijn
Re^8: problems parsing CSV by Jim (Curate) on Oct 13, 2010 at 16:36 UTC
I thought it seemed awfully busy. This is much more understandable:

#!/usr/bin/perl

use strict;
use warnings;

use English qw( -no_match_vars );
use Text::CSV;

$OUTPUT_FIELD_SEPARATOR  = "\n";
$OUTPUT_RECORD_SEPARATOR = "\n";

my $release_file = '../ecodata/releases.txt';

open my $release_fh, '<', $release_file
    or die "Can't open release file $release_file: $OS_ERROR\n";

my $csv = Text::CSV->new({
    auto_diag          => 1,
    binary             => 1,
    allow_loose_quotes => 1,
    escape_char        => '\\',
});

my %value;

# Header is 'TRI,Release#,ChemName,RegNum,Year,Pounds,Grams'
$csv->bind_columns(\@value{@{$csv->getline($release_fh)}});

while ($csv->getline($release_fh)) {
    {
        no warnings 'numeric';

        if ($value{'Pounds'} == 0.0 and $value{'Grams'} == 0.0) {
            warn "Release number $value{'Release#'} is weightless\n";
        }
    }

    print $value{'TRI'},
          $value{'Release#'},
          $value{'ChemName'},
          $value{'RegNum'},
          $value{'Year'},
          $value{'Pounds'},
          $value{'Grams'};
}

close $release_fh;

exit 0;

Thank you, Tux.