Re^4: problems parsing CSV

I've incorporated Tux's suggestion to use getline (getline_hr, actually) instead of <>/parse/fields. It really tightens up the whole script.
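For anyone new to `getline_hr ()`: it hands back one hash reference per row, keyed by the names given to `column_names ()`. The core-Perl sketch below needs no Text::CSV and shows the hash-slice pairing that produces such a row hash. The `row_to_hash` helper and the sample row are made up for illustration and are not part of Text::CSV.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::CSV: pair a header row with a
# data row the way a header-keyed reader such as getline_hr () does.
sub row_to_hash {
    my ($header, $fields) = @_;    # two array references
    my %row;
    @row{@$header} = @$fields;     # hash slice: key i receives field i
    return \%row;                  # a fresh hash reference for every row
}

# Made-up sample data using the column names from the releases file.
my @header = ('TRI', 'Release#', 'ChemName', 'RegNum', 'Year', 'Pounds', 'Grams');
my $row    = row_to_hash(\@header,
    [ 'TRI-0001', 'R1', 'Benzene', '71-43-2', '2008', '10', '' ]);

print "Release $row->{'Release#'}: $row->{'ChemName'}, $row->{'Pounds'} lb\n";
```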

#!/usr/bin/perl

use strict;
use warnings;

use English qw( -no_match_vars );
use Text::CSV;

$OUTPUT_FIELD_SEPARATOR  = "\n";
$OUTPUT_RECORD_SEPARATOR = "\n";

my $release_file = '../ecodata/releases.txt';

# Text is in the ISO 8859-1 (Latin 1) encoding
open my $release_fh, '<:encoding(iso-8859-1)', $release_file
    or die "Can't open release file $release_file: $OS_ERROR\n";

my $csv = Text::CSV->new({
    auto_diag          => 1,
    binary             => 1,
    allow_loose_quotes => 1,
    escape_char        => '\\',
});

# Header is 'TRI,Release#,ChemName,RegNum,Year,Pounds,Grams'
$csv->column_names($csv->getline($release_fh));

while (my $value = $csv->getline_hr($release_fh)) {
    {
        no warnings qw( numeric );

        if ($value->{'Pounds'} == 0.0 and $value->{'Grams'} == 0.0) {
            warn "Release $value->{'Release#'} is weightless\n";
        }
    }

    print $value->{'TRI'},
          $value->{'Release#'},
          $value->{'ChemName'},
          $value->{'RegNum'},
          $value->{'Year'},
          $value->{'Pounds'},
          $value->{'Grams'};
}

close $release_fh;

exit 0;
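The `no warnings qw( numeric )` block in the script matters because Perl's `==` numifies strings: an empty or non-numeric field compares equal to `0.0`, and the only side effect is an "isn't numeric" warning. Here is a small core-Perl sketch of the same test; the `is_weightless` name and the sample values are illustrative and do not come from the script or the data file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same comparison as the script: a release is "weightless" when both
# the Pounds and Grams fields are numerically equal to zero.
sub is_weightless {
    my ($pounds, $grams) = @_;
    # '' and 'n/a' numify to 0; without this line each would emit an
    # "isn't numeric" warning. (An undef field would instead trigger the
    # separate 'uninitialized' warning category.)
    no warnings 'numeric';
    my $weightless = $pounds == 0.0 && $grams == 0.0;
    return $weightless ? 1 : 0;
}

# Illustrative field values, not taken from the real data file.
for my $pair ([ '', '' ], [ '0', '0.0' ], [ 'n/a', '' ], [ '12', '' ]) {
    my ($pounds, $grams) = @$pair;
    printf "Pounds='%s' Grams='%s' => %s\n", $pounds, $grams,
        is_weightless($pounds, $grams) ? 'weightless' : 'has weight';
}
```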

Re^5: problems parsing CSV by Tux (Canon) on Oct 11, 2010 at 06:40 UTC
The `bind_columns ()` method is actually faster. It matters when your streams are big:

my $csv = Text::CSV->new ({
    auto_diag          => 1,
    binary             => 1,
    allow_loose_quotes => 1,
    escape_char        => "\\",
    });

# Header is 'TRI,Release#,ChemName,RegNum,Year,Pounds,Grams'
my %value;
$csv->bind_columns (\@value{@{$csv->getline ($release_fh)}});
while ($csv->getline ($release_fh)) {
    {   no warnings "numeric";
        $value{Pounds} == 0.0 && $value{Grams} == 0.0 and
            warn "Release $value{'Release#'} is weightless\n";
        }
    print $value{"TRI"}, $value{"Release#"}, $value{"ChemName"},
          $value{"RegNum"}, $value{"Year"}, $value{"Pounds"},
          $value{"Grams"};
    }

YMMV: benchmark to check whether this also holds for your set of data. In my speed comparison graph, the lower the line, the faster. Text::CSV_XS with `bind_columns ()` (labeled "xs bndc") is the fastest at all sizes, and the pure-Perl Text::CSV_PP counterpart with `bind_columns ()` (labeled "pp bndc") is the slowest, as it has the most overhead in pure Perl. If you only care about the differences within the XS implementation, look at the XS-only graph.

Update 1: removed the erroneous call to `column_names ()` as spotted by jim.
Update 2: New graphs: XS + PP and XS only.

Enjoy, Have FUN!
H.Merijn
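The piece of Tux's snippet that makes this work is `\@value{...}`: a backslash in front of a hash slice yields a list of references, one per key, each pointing into `%value`. `bind_columns ()` is handed those references and the parser stores each fetched field through them, so `%value` simply holds the current row. The sketch below imitates that with core Perl only; `fill_row` is a hypothetical stand-in for the parser, not a Text::CSV function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %value;
my @labels = ('TRI', 'Release#', 'Pounds');

# A backslash on a hash slice returns one scalar reference per key,
# each pointing at $value{$key} (the keys are created as a side effect).
my @refs = \@value{@labels};

# Hypothetical stand-in for the parser: it knows only the reference list
# and stores each field through the matching reference, so %value is
# refilled in place instead of a new hash being built per row.
sub fill_row {
    my ($refs, @fields) = @_;
    ${ $refs->[$_] } = $fields[$_] for 0 .. $#$refs;
    return;
}

fill_row(\@refs, 'TRI-0001', 'R1', '10');
print "$_ => $value{$_}\n" for @labels;

fill_row(\@refs, 'TRI-0002', 'R2', '0');    # next "row", same hash
print "$_ => $value{$_}\n" for @labels;
```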
Re^6: problems parsing CSV by Jim (Curate) on Oct 12, 2010 at 20:51 UTC
Ok, here's the same script using bind_columns:

#!/usr/bin/perl

use strict;
use warnings;

use English qw( -no_match_vars );
use Text::CSV;

$OUTPUT_FIELD_SEPARATOR  = "\n";
$OUTPUT_RECORD_SEPARATOR = "\n";

my $release_file = '../ecodata/releases.txt';

open my $release_fh, '<', $release_file
    or die "Can't open release file $release_file: $OS_ERROR\n";

my $csv = Text::CSV->new({
    auto_diag          => 1,
    binary             => 1,
    allow_loose_quotes => 1,
    escape_char        => '\\',
});

my %value;

# Header is 'TRI,Release#,ChemName,RegNum,Year,Pounds,Grams'
my @column_labels = $csv->column_names($csv->getline($release_fh));
$csv->bind_columns(\@value{@column_labels});

while ($csv->getline_hr($release_fh)) {
    {
        no warnings 'numeric';

        if ($value{'Pounds'} == 0.0 and $value{'Grams'} == 0.0) {
            warn "Release $value{'Release#'} is weightless\n";
        }
    }

    print $value{'TRI'},
          $value{'Release#'},
          $value{'ChemName'},
          $value{'RegNum'},
          $value{'Year'},
          $value{'Pounds'},
          $value{'Grams'};
}

close $release_fh;

exit 0;

I had to change your...

`\@value{@{$csv->column_names($csv->getline($release_fh))}}`

...to...

`\@value{$csv->column_names($csv->getline($release_fh))}`
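Why the two slices need different shapes: `getline ()` returns an array reference, so its result has to be dereferenced with `@{...}` before it can serve as slice keys, while `column_names ()` returns a plain list that can go in directly. Here is a core-Perl sketch in which two hypothetical subs stand in for the two methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins: one returns an array reference (like getline ()),
# the other returns a plain list (like column_names ()).
sub returns_arrayref { return [ 'TRI', 'Release#', 'Pounds' ] }
sub returns_list     { return ( 'TRI', 'Release#', 'Pounds' ) }

my (%by_ref, %by_list);

# An array reference must be dereferenced with @{...} to give slice keys.
my @refs_from_arrayref = \@by_ref{ @{ returns_arrayref() } };

# A list can be used as slice keys directly.
my @refs_from_list = \@by_list{ returns_list() };

# Wrapping the list-returning call in @{...} leaves a plain string
# ('Pounds') where an array reference is expected, which dies under
# "use strict".
my $ok = eval { my @bad = \@by_list{ @{ returns_list() } }; 1 };

printf "%d refs, %d refs, wrapped list %s\n",
    scalar @refs_from_arrayref, scalar @refs_from_list,
    $ok ? 'worked' : "died: $@";
```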
Re^7: problems parsing CSV by Tux (Canon) on Oct 13, 2010 at 06:20 UTC
You are now mixing two approaches. When using `bind_columns ()` you should not use `getline_hr ()` but `getline ()`, because you are not returning a hashref but reading into prebound variables:

my @column_labels = @{$csv->getline ($release_fh)};
$csv->bind_columns (\@value{@column_labels});
while ($csv->getline ($release_fh)) {
    :
    }

You do not use the method `column_names ()` at all. That was a cut-n-paste error from your code in my previous example. Mea culpa.

`\@value{@{$csv->column_names ($csv->getline ($release_fh))}}`

should be

`\@value{@{$csv->getline ($release_fh)}}`

Enjoy, Have FUN!
H.Merijn
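Tux's point is that once the columns are bound, each `getline ()` call has already written the row into `%value`, so the loop only needs the call's true/false result and no per-row hash has to be returned. To show that control flow without Text::CSV installed, here is a deliberately naive, hypothetical `NaiveBoundReader` class (a plain comma split with no quoting rules, so it is not a CSV parser) that mimics bind-then-getline on in-memory data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, deliberately naive reader: a plain split on commas with
# no quoting rules. It is NOT Text::CSV; it only mimics the
# bind-then-getline control flow.
package NaiveBoundReader;

sub new { my ($class) = @_; return bless { bound => undef }, $class }

# Remember the scalar references that rows should be written into.
sub bind_columns {
    my ($self, @refs) = @_;
    $self->{bound} = \@refs;
    return 1;
}

# Read one line. Unbound: return the fields as an array reference.
# Bound: store the fields through the references and return a true value.
# Returns nothing (false) at end of input.
sub getline {
    my ($self, $fh) = @_;
    defined(my $line = <$fh>) or return;
    chomp $line;
    my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
    return \@fields unless $self->{bound};
    ${ $self->{bound}[$_] } = $fields[$_] for 0 .. $#{ $self->{bound} };
    return [];                            # an empty array reference is still true
}

package main;

my $data = "TRI,Release#,Pounds\nT1,R1,10\nT2,R2,0\n";
open my $fh, '<', \$data or die "Can't open in-memory data: $!\n";

my $reader = NaiveBoundReader->new;
my %value;

# Header row first (unbound), then bind %value's slots to those names.
$reader->bind_columns(\@value{ @{ $reader->getline($fh) } });

my @seen;
while ($reader->getline($fh)) {           # only truth matters; data is in %value
    push @seen, "$value{'Release#'}=$value{'Pounds'}";
}
close $fh;

print join(', ', @seen), "\n";
```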
Re^8: problems parsing CSV by Jim (Curate) on Oct 13, 2010 at 16:36 UTC