Re: Text::CSV_XS and line-endings
by Argel (Prior) on Mar 17, 2006 at 00:38 UTC
It's been a while since I used Text::CSV_XS, but don't you create an IO::File instance and then pass that to your Text::CSV instance? If so, would the following from IO::Handle do what you want? They are about halfway down the page.
IO::Handle->format_line_break_characters( [STR] ) $:
IO::Handle->input_record_separator( [STR] ) $/
Seems like it should honor $/ but perhaps you can force it like so?
IO::Handle->input_record_separator( "\r" );
Update1: Looks like this is what you want. However, Text::CSV_XS seems to somehow ignore line 7. According to the Text::CSV_XS documentation, IO::Handle->getline is what gets called, so the above should in theory work. However, $csv->getline returns undef on my simulated Mac test file. Looks like the Decode routine in the .so may be the culprit?
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;
use IO::File;
use Text::CSV_XS;
IO::Handle->input_record_separator( "\r" );
my $file = defined $ARGV[0] ? $ARGV[0] : 'normal.txt';
my $io = IO::File->new($file, "<") or die "Cannot open $file: $!";
my $csv = Text::CSV_XS->new;
# my $test = $io->getline;
# print Data::Dumper->Dump([$test],['io']);
my $columns = $csv->getline($io);
print Data::Dumper->Dump([$columns],['csv']);
exit 0;
Also cleaned up some errors in the original portion.
Update2: Would something like the following work?
cat mac.txt | perl -e '$/="\r"; while(<>){$_=~s/\015$/\n/; print $_;}'
Re: Text::CSV_XS and line-endings
by roboticus (Chancellor) on Mar 17, 2006 at 02:23 UTC
friedo--
This works for me:
#!/usr/bin/perl -w
use strict;
use Text::CSV_XS;
# Slurp up the whole file
open(INF,"<test.mac") || die "Can't open test.mac!";
my $file = do { local $/; <INF> };  # undef $/ so the read really slurps the whole file
close(INF);
# Convert CRs to LFs
$file =~ s/\015/\012/g;
# Parse CSV file line-by-line
my $csv = Text::CSV_XS->new();
for my $i (split /\012+/, $file) {
my $status = $csv->parse($i);
print "ST:", $status;
for my $j ($csv->fields) {
print " [", $j, "]";
}
print "\n";
}
--roboticus
|
Since we don't know what OS the client was running, perhaps
# Convert CRs and CRLFs to LFs
$file =~ s/\015\012?/\012/g;
is best. Are LFCRs a possible concern?
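If LFCR pairs do turn up, one possible extension of the substitution above is sketched here. The helper name normalize_eol is hypothetical, and treating LFCR as a single terminator is an assumption: a file that really contains an LF followed by an unrelated bare CR would lose a blank line. Like any conversion done before parsing, it also rewrites line breaks embedded in quoted fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): map CRLF, LFCR and bare CR to LF.
# The two-character pairs are listed first so they are tried before bare CR.
sub normalize_eol {
    my ($text) = @_;
    $text =~ s/\015\012|\012\015|\015/\012/g;
    return $text;
}

my $mac  = normalize_eol("a,b\015c,d\015");
my $dos  = normalize_eol("a,b\015\012c,d\015\012");
my $lfcr = normalize_eol("a,b\012\015c,d\012\015");

print "all three inputs normalize to the same text\n"
    if $mac eq $dos && $dos eq $lfcr;
```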
|
Use Text::FixEOL to fix messed up line endings. It does the sane thing for even really messed up line endings in most cases. That is what it was written for.
use Text::FixEOL;
# Convert EOLs in the $file string to unix conventions
my $fixer = Text::FixEOL->new;
$file = $fixer->to_unix($file);
|
That won't work for rows that contain new-lines, which is common in CSVs since they don't have a way to escape them. Text::CSV_XS handles that with its binary option, but only if you let it read the lines for obvious reasons.
-sam
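A minimal sketch of the problem, using only core Perl and made-up sample data: two CSV records, one with a quoted field that contains a line break, come out as three chunks when the text is split on LF before parsing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample: two records; the first has a quoted field with an embedded LF.
my $data = qq{1,"line one\012line two"\0122,plain\012};

# Splitting on LF before parsing cuts the quoted field in half.
my @chunks = split /\012/, $data;

printf "records: 2, chunks after split: %d\n", scalar @chunks;
print "chunk $_: $chunks[$_]\n" for 0 .. $#chunks;
```

The first chunk ends inside an open quote, so handing it to a CSV parser on its own cannot recover the original field.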
Re: Text::CSV_XS and line-endings
by traveler (Parson) on Mar 16, 2006 at 22:58 UTC
If some rows contain newlines, won't mac2unix mess those up? Have you tried contacting the author?
|
I'm not sure what you mean by "mess them up". It would convert them to Unix line-endings, but as long as the data is text that shouldn't make a big difference.
-sam
Re: Text::CSV_XS and line-endings
by GrandFather (Saint) on Mar 16, 2006 at 22:04 UTC
If you have the option, rather than using getline ($io) you could pull the lines out yourself and hand them to parse ($line) (use fields () to get a getline-equivalent result list).
DWIM is Perl's answer to Gödel
|
That won't work with rows containing new-lines, which is very common in CSV data. Text::CSV_XS handles this when given the binary option.
-sam
Re: Text::CSV_XS and line-endings
by zer (Deacon) on Mar 16, 2006 at 21:40 UTC
|
Aside from the fact that $/ doesn't affect Text::CSV_XS, $ENV{OS} won't help here either. The issue is the OS of the person who created the file, not the system running the code!
-sam
|
Unfortunately changing $/ doesn't work because Text::CSV_XS does its own low-level parsing. The OS the code is running on is known; the problem is that the CSV files can come from anywhere. Thanks for the suggestions though.
|
In addition to the problems pointed out by others (need the OS on which a file was created not the OS on the current machine), you shouldn't rely on $ENV{OS}, even if it happens to be in your environment. It's not set automatically by perl, and doesn't seem to be set in Gentoo Linux, RedHat Linux, OS X, or Solaris. You should use $^O (equivalent to $Config{osname}).
perl v5.8.7 i386-linux $Config{osname}=$^O=linux $ENV{OS}=
perl v5.8.4 i686-linux-thread-multi $Config{osname}=$^O=linux $ENV{OS}=
perl v5.8.7 i686-linux $Config{osname}=$^O=linux $ENV{OS}=
perl 5.005_03 sun4-solaris $Config{osname}=$^O=solaris $ENV{OS}=
perl v5.8.6 darwin-thread-multi-2level $Config{osname}=$^O=darwin $ENV{OS}=
Re: Text::CSV_XS and line-endings
by jZed (Prior) on Mar 17, 2006 at 17:17 UTC
Sorry to get into this late; it always seems like the postings I most need to see happen when I am taking a break from PM. (Maintainer of Text::CSV_XS here.) The module handles Mac line endings fine: just specify eol => "\015" either globally or per table. I should probably revise the docs.
Update: I originally mentioned "csv_eol", but that's the syntax for DBD::CSV, not Text::CSV_XS; it is now shown correctly as "eol".
|
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
use Data::Dumper;
use IO::File;
my $fh = IO::File->new;
$fh->open( "<test.csv" ) or die $!;
my $c = Text::CSV_XS->new( { binary => 1, csv_eol => "\015" } );
my $d = $c->getline( $fh );
print Dumper( $d );
And test.csv contains:
foo,bar,baz^Mred,green,blue^Mnarf,blatz,quux
(Where ^M's are \r's)
That script results in $d being undef, as reported by Data::Dumper. Running the script on the same data with \n's instead of \r's works fine.
|
Oops, sorry, I was giving you DBD::CSV instructions, not Text::CSV_XS instructions. Use "eol=>" instead of "csv_eol=>".
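For reference, a sketch of the test script with that correction applied. It assumes a reasonably recent Text::CSV_XS and has not been checked against the 2006 release discussed in this thread. The sample data is held in memory rather than in test.csv, and setting $/ to "\015" alongside eol follows the note in the module's documentation about CR-only line endings; whether both are strictly needed may depend on the installed version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

# In-memory stand-in for test.csv: records terminated by CR only.
my $data = "foo,bar,baz\015red,green,blue\015narf,blatz,quux\015";

# Assumption: set both $/ and eol to CR, per the module documentation.
local $/ = "\015";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $csv = Text::CSV_XS->new({ binary => 1, eol => "\015" })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag;

# Let the parser read the records itself so quoted line breaks stay intact.
my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;
}
close $fh;

print join('|', @$_), "\n" for @rows;
```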