For the OP: just to emphasize the importance of the binary attribute if your definition of CSV permits embedded newlines.
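To see the effect of binary in isolation, here is a minimal sketch, assuming Text::CSV_XS is installed. It parses one record whose quoted field contains a newline, once with default options and once with binary => 1. The %result hash and the loop are only there for illustration; as far as I know, the parser without binary should reject the record.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# One CSV record whose quoted second field contains an embedded newline.
my $line = qq{a,"long\nline, indeed",end};

my $plain  = Text::CSV_XS->new()                or die;
my $binary = Text::CSV_XS->new({ binary => 1 }) or die;

# Collect the outcome per parser so it can be inspected afterwards.
my %result;
for my $case ([ plain => $plain ], [ binary => $binary ]) {
    my ($name, $csv) = @$case;
    my $ok = $csv->parse($line);
    $result{$name} = {
        ok     => $ok ? 1 : 0,
        fields => $ok ? [ $csv->fields ] : [],
    };
    printf "%-6s parse %s\n", $name, $ok ? "succeeded" : "failed";
}
```

The same attribute matters for getline, since a record that spans two physical lines can only be read as one record when embedded newlines are allowed.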
For Merijn.
I was going to answer more or less the same to the original post yesterday, but ran into a few problems that made me reinstall the latest versions...
- One problem was that I used IO::Wrap objects for STDIN and STDOUT, and they don't work with the pure-Perl version; I am not sure why. Maybe it would be better to load IO::Handle directly and have something for those who want efficiency. In this thread I wanted to test the pure-Perl version, as installing an XS module could have been problematic for the OP. I think that keeping both versions in sync is important...
- For some reason search.cpan.org shows version 0.29 of Text::CSV_XS, but
perl -MCPAN -e 'install qw(Text::CSV_XS)' installs 0.30, which I believe is the right one, if I remember your post on p5p or pm correctly.
% steph@apexPDell2 (/home/stephan) %
% cat conv_comma2pipe_xs.px
#!/usr/bin/perl
use strict;
use warnings;
$|++;
#use IO::Handle;
use IO::Wrap;
use Text::CSV_XS;
# use DDS;
# my $in = IO::Wrap::wraphandle(\*STDIN) or die;
# my $out = IO::Wrap::wraphandle(\*STDOUT) or die;
# Dump\($in, $out);
my $csv_in = Text::CSV_XS->new({
    binary => 1,
}) or die;
my $csv_out = Text::CSV_XS->new({
    binary   => 1,
    sep_char => q{|},
    eol      => qq{\n},
}) or die;
while (defined (my $rec = $csv_in->getline(\*STDIN)) ) {
    {   my @fields = @$rec;
        local $" = q{][};
        print {\*STDERR} ".rec [@fields]\n";
    }
    $csv_out->print(\*STDOUT, $rec);
}
__END__
% steph@apexPDell2 (/home/stephan) %
% cat hi1.csv | perl+ -w conv_comma2pipe_xs.px
.rec [a][b][c]
a|b|c
.rec [a][okay, comma][c]
a|"okay, comma"|c
.rec [a][long
line, indeed][end]
a|"long
line, indeed"|end
% steph@apexPDell2 (/home/stephan) %
% cat hi1.csv
a,b,c
a,"okay, comma",c
a,"long
line, indeed",end
cheers
--stephan
P.S. I tested on Cygwin with Perl 5.8.7 and 5.8.8.
update: oops forgot the code...
One problem was that I used IO::Wrap objects for STDIN and STDOUT, and they don't work with the pure-Perl version; I am not sure why. Maybe it would be better to load IO::Handle directly and have something for those who want efficiency. In this thread I wanted to test the pure-Perl version, as installing an XS module could have been problematic for the OP. I think that keeping both versions in sync is important...
The maintainer of Text::CSV_PP is doing a really nice job of keeping it in sync with Text::CSV_XS, and we have a lot of contact about that. I already had a look at version 1.06, and it passed all the tests for 0.30 except the diagnostics tests, which is logical and explainable.
That maintainer also took over maintainership of the very old Text::CSV, which will become a wrapper around Text::CSV_XS and Text::CSV_PP. It will choose whichever is available, using a method borrowed from DBI::PurePerl (the environment variable TEXT_CSV_XS), and will default to the fastest one available.
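Roughly, such a wrapper could pick its backend along these lines. This is only a sketch of the idea, not the actual Text::CSV code: pick_csv_backend is a hypothetical helper, and treating a false value of TEXT_CSV_XS as "skip the XS backend" is my assumption about how the variable might be interpreted.

```perl
use strict;
use warnings;

# Hypothetical helper: return the name of the first backend that loads,
# fastest first, or nothing if neither is installed.
# Assumption: a defined but false TEXT_CSV_XS means "do not use XS".
sub pick_csv_backend {
    my @candidates = ('Text::CSV_XS', 'Text::CSV_PP');
    if (defined $ENV{TEXT_CSV_XS} && !$ENV{TEXT_CSV_XS}) {
        @candidates = grep { $_ ne 'Text::CSV_XS' } @candidates;
    }
    for my $module (@candidates) {
        # Turn Foo::Bar into Foo/Bar.pm so require can take a variable.
        (my $file = "$module.pm") =~ s{::}{/}g;
        return $module if eval { require $file; 1 };
    }
    return;
}

my $backend = pick_csv_backend();
print defined $backend
    ? "using $backend\n"
    : "no CSV backend installed\n";
```

Calling code would then construct its parser with $backend->new({ binary => 1 }), so it never needs to know which implementation was loaded.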
I have been thinking about the use of IO::Handle, either making it the default or having it use'd automatically, but everything I have come up with so far implies a slowdown, which is IMHO unacceptable. It's a bit of a shame that this is a relatively expensive module to load (14 kB of source code).
For some reason search.cpan.org shows version 0.29 of Text::CSV_XS, but perl -MCPAN -e 'install qw(Text::CSV_XS)' installs 0.30, which I believe is the right one, if I remember your post on p5p or pm correctly.
Maybe I've been working too hard lately on this module, and uploaded too many versions :) Give CPAN some time to sync around the world.
I'll have a look at the IO::Wrap thingy.
Enjoy, Have FUN! H.Merijn