in reply to Re^4: problems parsing CSV
in thread problems parsing CSV
The bind_columns () method is indeed faster; the difference matters once your streams get big.
    my $csv = Text::CSV->new ({
        auto_diag          => 1,
        binary             => 1,
        allow_loose_quotes => 1,
        escape_char        => "\\",
        });

    # Header is 'TRI,Release#,ChemName,RegNum,Year,Pounds,Grams'
    my %value;
    $csv->bind_columns (\@value{@{$csv->getline ($release_fh)}});
    while ($csv->getline ($release_fh)) {
        {   no warnings "numeric";
            $value{Pounds} == 0.0 && $value{Grams} == 0.0 and
                warn "Release $value{'Release#'} is weightless\n";
            }
        print $value{"TRI"},    $value{"Release#"}, $value{"ChemName"},
              $value{"RegNum"}, $value{"Year"},     $value{"Pounds"},
              $value{"Grams"};
        }
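The trick that makes this work is the hash-slice-of-references idiom: `\@value{@header}` yields one scalar reference per header field, so bind_columns () writes each parsed field straight into %value without rebuilding a hash per row. A minimal core-Perl sketch of just that idiom (no Text::CSV needed; the names are illustrative):

```perl
use strict;
use warnings;

# Illustrative header, standing in for $csv->getline ($release_fh)
my @header = ("TRI", "Pounds", "Grams");

my %value;
# A reference to a hash slice distributes: one ref per value slot
my @slots = \@value{@header};

# Writing through a slot ref updates %value in place,
# which is what bind_columns () does for every parsed row
${$slots[1]} = 42;
print $value{Pounds}, "\n";   # 42
```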
YMMV, so benchmark to check whether the same holds for your set of data. My speed comparison looks like this. In that graph, the lower the line, the faster: Text::CSV_XS with bind_columns () (labeled "xs bndc") is the fastest at all sizes, and the pure-perl Text::CSV_PP counterpart with bind_columns () (labeled "pp bndc") is the slowest, as it carries the most overhead in pure perl. If you only want to compare the XS implementations, look at this graph.
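To run such a comparison yourself, the core Benchmark module is enough. This sketch is my own illustration, not the code behind the graphs: it contrasts building a fresh hash per row with writing through pre-bound references, which is the per-row overhead that bind_columns () avoids:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @header = qw(TRI Release# ChemName RegNum Year Pounds Grams);
my @row    = (1 .. 7);          # a fake parsed record

my %bound;
my @slots = \@bound{@header};   # bound once, reused for every "row"

cmpthese (100_000, {
    fresh_hash => sub {
        my %value;
        @value{@header} = @row;   # new hash entries every iteration
        },
    bound_refs => sub {
        ${$slots[$_]} = $row[$_] for 0 .. $#row;   # write in place
        },
    });
```

Raw cmpthese () numbers vary by machine and perl build, which is exactly why benching on your own data matters.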
Update 1: removed the erroneous call to column_names () as spotted by Jim.
Update 2: New graphs: XS + PP and XS only
Replies are listed 'Best First'.
Re^6: problems parsing CSV
  by Jim (Curate) on Oct 12, 2010 at 20:51 UTC
  by Tux (Canon) on Oct 13, 2010 at 06:20 UTC
  by Jim (Curate) on Oct 13, 2010 at 16:36 UTC