in reply to CSV and regex mixups

The CSV convention is that embedded quotes are doubled inside a quoted field. To see Text::CSV's notion of that:

$ perl -MText::CSV -e'$c = Text::CSV->new(); $c->combine(qw/Crosby Stills Nash/, q/and sometimes "Young"/); print $c->string, $/'
"Crosby","Stills","Nash","and sometimes ""Young"""
$

It sounds as if your application is not producing valid CSV to that standard. See the CAVEATS section of the Text::CSV perldoc for the CSV convention the module is written to.
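
The doubling rule itself is small enough to sketch by hand. What follows is only an illustration of the convention, not Text::CSV's actual implementation: `csv_quote` and `csv_join` are hypothetical helper names, and the sketch assumes that every field is quoted and that no field contains a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: quote one field, doubling any embedded double quotes.
sub csv_quote {
    my ($field) = @_;
    (my $escaped = $field) =~ s/"/""/g;
    return qq{"$escaped"};
}

# Hypothetical helper: quote each field and join them into one CSV record.
sub csv_join {
    return join ',', map { csv_quote($_) } @_;
}

print csv_join(qw/Crosby Stills Nash/, 'and sometimes "Young"'), "\n";
# prints: "Crosby","Stills","Nash","and sometimes ""Young"""
```

If your application's output differs from this for the same fields, that is probably where it departs from the convention.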

Check AnyData::Format::CSV if you cannot get your data into Text::CSV's preferred format. It allows you to construct a parser with your choices for 'field_sep', 'quote', 'escape', and 'record_sep'. That may not fix all your problems if the app has a plainly inadequate notion of CSV, but it might work.

After Compline,
Zaxo

Re: Re: CSV and regex mixups
by Tomte (Priest) on Jul 02, 2003 at 23:43 UTC

    Building on what Zaxo said about the correct CSV quoting of double quotes, you could also filter the CSV file before you process it with Text::CSV.
    The following regex, using zero-width look-ahead and look-behind assertions, works quite well (with one flaw):

    (?<![,])"(?![,])
    A little test:
    #!/usr/bin/env perl

    my $test = '""crosby"","stills","nash","and sometimes "young""';
    $test =~ s/(?<![,])"(?![,])/""/g;
    print $test, "\n";

    $test = '"som"ething","sil"ly","quo"ted"';
    $test =~ s/(?<![,])"(?![,])/""/g;
    print $test, "\n";

    __END__
    """"crosby""","stills","nash","and sometimes ""young""""
    ""som""ething","sil""ly","quo""ted""
    The flaw, as you might've already seen, is that the look-ahead/look-behind assertions treat the start and end of the line as 'not comma', and therefore double the leading and the trailing double quote too. If your data is otherwise well-formed, all you have to do is add a second filter that strips one quote from each end of the line:
    $test =~ s/^"(.*)"$/$1/g;
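
    Put together, the two filters might look like the sketch below. `fix_quotes` is a hypothetical name, and the sketch assumes one record per line, every field quoted, and no stray quote sitting directly next to a comma inside a field.

```perl
use strict;
use warnings;

# Hypothetical helper combining both substitutions from this post.
sub fix_quotes {
    my ($line) = @_;
    # Step 1: double every quote that is neither preceded nor followed by a comma.
    $line =~ s/(?<![,])"(?![,])/""/g;
    # Step 2: strip the extra quote that step 1 added at each end of the record.
    $line =~ s/^"(.*)"$/$1/;
    return $line;
}

print fix_quotes('"som"ething","sil"ly","quo"ted"'), "\n";
# prints: "som""ething","sil""ly","quo""ted"
```

    An embedded quote that touches a comma, as in '"say "hi", then"', still slips through step 1, so this is only a pre-filter for data that is otherwise well-formed.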

    regards,
    tomte


    Hlade's Law:

    If you have a difficult task, give it to a lazy person --
    they will find an easier way to do it.