CSV file

xorl has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: CSV file by Juerd (Abbot) on Mar 09, 2004 at 16:44 UTC
My problem is that one of the notes fields contains some unescaped quotation marks, new line characters, and commas. Then it is not CSV :) If the new line characters and commas are in balanced quotation marks, you might be able to process your file with a regex (how large is your file?). The quotes themselves will be a problem unless you only pick quotes not followed by a comma. It'd help if you posted a few lines that are hard to parse together with a description of how you would like to have them. Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }	[reply]
Re: Re: CSV file by xorl (Deacon) on Mar 09, 2004 at 16:51 UTC
Then it is not CSV :) I tried telling them that, but they won't fix how the files are being written. Here is a generalized problem record: "data", "moredata", "the notes field "with unescaped quotes" and lets have a hard return in here in a few places, and while we're having fun with that, why not have some "random, commas too!", "lots more data", "lastfield in the record" Does that help any?	[reply]
Re^3: CSV file by Roy Johnson (Monsignor) on Mar 09, 2004 at 17:14 UTC
Is every field in quotes, then? If so, you might be able to read lines until you match quote-newline at the end of a line, strip the leading and trailing quotes, then split the whole mess on /",\s"/. Like so (untested): `my $line = ''; my @fields; while (<>) { $line .= $_; if (/"\n/ or eof) { $line =~ s/^"//; $line =~ s/"$//; @fields = split(/",\s/); $line = ''; } }` [download] The PerlMonk `tr///` Advocate	[reply] [d/l]
Re: Re^3: CSV file by xorl (Deacon) on Mar 09, 2004 at 17:27 UTC
Re: Re: Re: CSV file by mutated (Monk) on Mar 09, 2004 at 17:06 UTC
Do you always know how many fields to expect? If not, it may be impossible to properly interrpret the file ie: `"data", "moredate", "this is a field", " and another coma for good mea +sure with unescaped quotes followed by a coma"` [download] was the preceding data 1,2,3, or 4 fields? impossible to tell. Even if you know how many fields there should be it's an impossible problem say there should be 3 fields: `"data blah","foooo","hmmmmmmmm","what if there arnt even numbers of qu +otes"` [download] what are the 3 fields? there are several perfectly legal variants. Impossible. daN.	[reply] [d/l] [select]
Re: Re: Re: Re: CSV file by xorl (Deacon) on Mar 09, 2004 at 17:22 UTC
Re: Re: Re: CSV file by Juerd (Abbot) on Mar 09, 2004 at 17:30 UTC
If you always have the same amount of fields, you might get away with: `my $regex = join ', ', ('".*?"') x 5; $regex = qr/$regex\n/s; while (my @fields = $data =~ /\G$regex/g) { ... }` [download] (untested as usual) Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }	[reply] [d/l]
Re: Re: Re: CSV file by bmann (Priest) on Mar 09, 2004 at 20:18 UTC
Okay, I'm assuming the following, since the specs are a little unclear: you can separate one record from another the first two fields are consistently quoted the third field always begins and ends with a quote the last two fields are consistently quoted only the third field can contain returns, commas or quotes `#!/usr/bin/perl use strict; use warnings; my $data = do { local $/ = undef; <DATA> }; my @fields = $data =~ /"(.?)",\s+"(.?)",\s+"(.)",\s+"(.?)",\s+"(.* +?)"/s; print "RESULTS:\n"; print ":$_:\n" for @fields; __DATA__ "data", "moredata", "the notes field "with unescaped quotes" and lets have a hard return in here in a few places, and while we're having fun with that, why not have some "random, commas too!", "lots more data", "lastfield in the r +ecord"` [download] The first two and last two captures of the regex are non-greedy, and the middle one sucks everything up. Hope this helps...	[reply] [d/l]
Re: CSV file by Abigail-II (Bishop) on Mar 09, 2004 at 16:44 UTC
IIRC, MySQL can load data from a CVS file directly just fine. Abigail	[reply]
Re: CSV file by arden (Curate) on Mar 09, 2004 at 16:46 UTC
If that doesn't work, see Text::CSV or any of the CSV modules at CPAN or just follow the node Comma Delimited Problems from earlier today. - - arden.	[reply]
Re: CSV file by diotalevi (Canon) on Mar 09, 2004 at 17:34 UTC
Text::xSV handles this better than Text::CSV and Text::CSV_XS. Don't write code to parse the data - have the module do the work for you.	[reply]
Re: Re: CSV file by xorl (Deacon) on Mar 09, 2004 at 20:54 UTC
Text::xSV handles this better than Text::CSV and Text::CSV_XS. Don't write code to parse the data - have the module do the work for you. Yes Text::xSV is by far better. It has proven more useful than the other soltuions. Thanks everyone!	[reply]
Re: CSV file by kvale (Monsignor) on Mar 09, 2004 at 16:51 UTC
In CSV files, quotes are used to protect things like commas that are not delimiters. I'd recommend using a module like Text::CSV_XS to handle your data: `use Text::CSV_XS; $csv = Text::CSV_XS->new(); # create a new object $csv->types(\@t_array); # Set column types $columns = $csv->getline($io); # Read a line from file $i +o, parse it # and return an array ref +of fields` [download] -Mark	[reply] [d/l]