Re: CSV file
by Juerd (Abbot) on Mar 09, 2004 at 16:44 UTC
|
My problem is that one of the notes fields contains some unescaped quotation marks, new line characters, and commas.
Then it is not CSV :)
If the new line characters and commas are in balanced quotation marks, you might be able to process your file with a regex (how large is your file?). The quotes themselves will be a problem unless you only pick quotes not followed by a comma.
It'd help if you posted a few lines that are hard to parse together with a description of how you would like to have them.
| [reply] |
|
|
Then it is not CSV :)
I tried telling them that, but they won't fix how the files are being written.
Here is a generalized problem record:
"data", "moredata", "the notes field "with unescaped quotes" and
lets have a hard return in here
in a few places,
and while we're having fun with that, why not
have some "random, commas too!", "lots more data", "lastfield in the record"
Does that help any? | [reply] |
|
|
Is every field in quotes, then? If so, you might be able to read lines until you match quote-newline at the end of a line, strip the leading and trailing quotes, then split the whole mess on /",\s*"/. Like so (untested):
my $line = '';
my @fields;
while (<>) {
$line .= $_;
if (/"\n/ or eof) {
$line =~ s/^"//;
$line =~ s/"$//;
@fields = split(/",\s*/);
$line = '';
}
}
The PerlMonk tr/// Advocate
| [reply] [d/l] |
|
|
|
|
Do you always know how many fields to expect? If not, it may be impossible to properly interrpret the file ie:
"data", "moredate", "this is a field", " and another coma for good mea
+sure with unescaped quotes followed by a coma"
was the preceding data 1,2,3, or 4 fields? impossible to tell. Even if you know how many fields there should be it's an impossible problem say there should be 3 fields:
"data blah","foooo","hmmmmmmmm","what if there arnt even numbers of qu
+otes"
what are the 3 fields? there are several perfectly legal variants. Impossible.
| [reply] [d/l] [select] |
|
|
|
|
my $regex = join ', ', ('".*?"') x 5;
$regex = qr/$regex\n/s;
while (my @fields = $data =~ /\G$regex/g) {
...
}
(untested as usual)
| [reply] [d/l] |
|
|
Okay, I'm assuming the following, since the specs are a little unclear:
- you can separate one record from another
- the first two fields are consistently quoted
- the third field always begins and ends with a quote
- the last two fields are consistently quoted
- only the third field can contain returns, commas or quotes
#!/usr/bin/perl
use strict;
use warnings;
my $data = do { local $/ = undef; <DATA> };
my @fields = $data =~ /"(.*?)",\s+"(.*?)",\s+"(.*)",\s+"(.*?)",\s+"(.*
+?)"/s;
print "RESULTS:\n";
print ":$_:\n" for @fields;
__DATA__
"data", "moredata", "the notes field "with unescaped quotes" and
lets have a hard return in here
in a few places,
and while we're having fun with that, why not
have some "random, commas too!", "lots more data", "lastfield in the r
+ecord"
The first two and last two captures of the regex are non-greedy, and the middle one sucks everything up. Hope this helps... | [reply] [d/l] |
Re: CSV file
by Abigail-II (Bishop) on Mar 09, 2004 at 16:44 UTC
|
IIRC, MySQL can load data from a CVS file directly just fine.
Abigail | [reply] |
|
|
| [reply] |
Re: CSV file
by diotalevi (Canon) on Mar 09, 2004 at 17:34 UTC
|
| [reply] |
|
|
| [reply] |
Re: CSV file
by kvale (Monsignor) on Mar 09, 2004 at 16:51 UTC
|
In CSV files, quotes are used to protect things like commas that are not delimiters. I'd recommend using a module like
Text::CSV_XS to handle your data:
use Text::CSV_XS;
$csv = Text::CSV_XS->new(); # create a new object
$csv->types(\@t_array); # Set column types
$columns = $csv->getline($io); # Read a line from file $i
+o, parse it
# and return an array ref
+of fields
| [reply] [d/l] |