Re: CSV file

Replies are listed 'Best First'.
Re: Re: CSV file by xorl (Deacon) on Mar 09, 2004 at 16:51 UTC
Then it is not CSV :) I tried telling them that, but they won't fix how the files are being written. Here is a generalized problem record: "data", "moredata", "the notes field "with unescaped quotes" and lets have a hard return in here in a few places, and while we're having fun with that, why not have some "random, commas too!", "lots more data", "lastfield in the record" Does that help any?	[reply]
Re^3: CSV file by Roy Johnson (Monsignor) on Mar 09, 2004 at 17:14 UTC
Is every field in quotes, then? If so, you might be able to read lines until you match quote-newline at the end of a line, strip the leading and trailing quotes, then split the whole mess on /",\s"/. Like so (untested): `my $line = ''; my @fields; while (<>) { $line .= $_; if (/"\n/ or eof) { $line =~ s/^"//; $line =~ s/"$//; @fields = split(/",\s/); $line = ''; } }` [download] The PerlMonk `tr///` Advocate	[reply] [d/l]
Re: Re^3: CSV file by xorl (Deacon) on Mar 09, 2004 at 17:27 UTC
This looks like it just might work. I will test it and work along these lines. Every field is should be enclosed in quotes and in briefly looking at the files I don't see where splitting on ",\s would be a problem. Thanks!	[reply]
Re: Re: Re: CSV file by mutated (Monk) on Mar 09, 2004 at 17:06 UTC
Do you always know how many fields to expect? If not, it may be impossible to properly interrpret the file ie: `"data", "moredate", "this is a field", " and another coma for good mea +sure with unescaped quotes followed by a coma"` [download] was the preceding data 1,2,3, or 4 fields? impossible to tell. Even if you know how many fields there should be it's an impossible problem say there should be 3 fields: `"data blah","foooo","hmmmmmmmm","what if there arnt even numbers of qu +otes"` [download] what are the 3 fields? there are several perfectly legal variants. Impossible. daN.	[reply] [d/l] [select]
Re: Re: Re: Re: CSV file by xorl (Deacon) on Mar 09, 2004 at 17:22 UTC
The example was one record with 5 fields. The 3rd field is this annoying notes field that has all of the problems. The first field is supposed to be the tracking number (actually it's alpha-numeric). The other fields are like first and last name, status of the inquiry, name of the judge. With the exception of the notes field (i.e. field3) everything should be limited to no more 50 characters and should just have letters except the id field which is alpha-numeric. Of course the notes field contains the other stuff which I described (and which causes all of the problems).	[reply]
Re: Re: Re: CSV file by Juerd (Abbot) on Mar 09, 2004 at 17:30 UTC
If you always have the same amount of fields, you might get away with: `my $regex = join ', ', ('".*?"') x 5; $regex = qr/$regex\n/s; while (my @fields = $data =~ /\G$regex/g) { ... }` [download] (untested as usual) Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }	[reply] [d/l]
Re: Re: Re: CSV file by bmann (Priest) on Mar 09, 2004 at 20:18 UTC
Okay, I'm assuming the following, since the specs are a little unclear: you can separate one record from another the first two fields are consistently quoted the third field always begins and ends with a quote the last two fields are consistently quoted only the third field can contain returns, commas or quotes `#!/usr/bin/perl use strict; use warnings; my $data = do { local $/ = undef; <DATA> }; my @fields = $data =~ /"(.?)",\s+"(.?)",\s+"(.)",\s+"(.?)",\s+"(.* +?)"/s; print "RESULTS:\n"; print ":$_:\n" for @fields; __DATA__ "data", "moredata", "the notes field "with unescaped quotes" and lets have a hard return in here in a few places, and while we're having fun with that, why not have some "random, commas too!", "lots more data", "lastfield in the r +ecord"` [download] The first two and last two captures of the regex are non-greedy, and the middle one sucks everything up. Hope this helps...	[reply] [d/l]