Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a CSV file I need to parse, but the last separated value sometimes spans multiple lines. That field contains the steps to resolve a problem, and they are bulleted. What should I do? Thanks, Jim

Re: Parsing CSV file ?s
by vladb (Vicar) on May 17, 2002 at 03:05 UTC
    It would help quite a bit if you could include sample CSV data or explain in a little more detail exactly what you are dealing with. Without sample data it is hard for me to give you a more constructive answer ;/.

    That said, let me point you to a few Perl modules that were designed to help programmers like yourself deal with CSV data. Here are a few good ones:

    • Text::CSV - simple manipulation of CSV files.
    • Data::Table - lets you manipulate CSV files via its Data::Table::fromCSV class method. Give this one a thorough try and see if it helps you in any way (I hope it will ;-)

    Sorry, Anonymous Monk, but that's about all I can help you with... Reply to this post if you need any further help, though ;)

    _____________________
    $"=q;grep;;$,=q"grep";for(`find . -name ".saves*~"`){s;$/;;;/(.*-(\d+) +-.*)$/;$_=["ps -e -o pid | "," $2 | "," -v "," "]`@$_`?{print" ++ $1"}:{print"- $1"}&&`rm $1`;print"\n";}
Re: Parsing CSV file ?s
by greenFox (Vicar) on May 17, 2002 at 03:08 UTC
    I believe Text::xSV by our very own tilly will do what you want.

    --
    my $chainsaw = 'Perl';

      I hadn't seen tilly's node before, but I've had some success using CPAN's Text::CSV_XS on fields with embedded newlines by setting the binary => 1 option:
      use Text::CSV_XS;
      # binary => 1 allows newlines (and other binary characters) inside quoted fields
      my $csv = Text::CSV_XS->new({ binary => 1 }) or die "Cannot use CSV: " . Text::CSV_XS->error_diag();
      my $fh  = \*STDIN;
      while (my $columns = $csv->getline($fh)) {
          last unless @$columns;
          # do stuff with @$columns
      }
Re: Parsing CSV file ?s
by krujos (Curate) on May 17, 2002 at 08:32 UTC
    If a field (the part between the commas) spans multiple lines, the data needs to be in quotes so that you can match it with a regexp when you want to manipulate just that field and leave everything else alone. Or, if you need to work with all of the fields, you can split on commas and get an array to work with.
    It's hard to answer your question, since you never really say what you want to do with the last field.
    This will get all of the values into an array:
    my @fields = split /,/, $your_line_here;
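    A plain split /,/ also cuts any quoted field that itself contains a comma. Below is a minimal quote-aware alternative, offered as a sketch that assumes the usual CSV quoting rules: a field is either bare or wrapped in double quotes, and a literal quote inside a quoted field is doubled (""). It expects one complete record at a time, and split_csv_line is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;

# Split ONE complete CSV record into fields (a sketch, not a full CSV parser).
# Assumes a field is either bare (no commas, no quotes) or wrapped in double
# quotes, with any literal quote inside it written as "".
sub split_csv_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # drop only the record terminator
    my @fields;
    pos($line) = 0;
    while (1) {
        if ($line =~ /\G"((?:[^"]|"")*)"/gc) {
            # Quoted field: may contain commas and newlines ([^"] matches \n).
            (my $field = $1) =~ s/""/"/g;
            push @fields, $field;
        }
        elsif ($line =~ /\G([^,]*)/gc) {
            # Bare field: everything up to the next comma.
            push @fields, $1;
        }
        last unless $line =~ /\G,/gc;    # no comma follows: that was the last field
    }
    return @fields;
}

my $record = qq{"7382","Patch","NT/WIN2K:\nPatch number 2044682, ""urgent""",42\n};
my @naive  = split /,/, $record;
my @fields = split_csv_line($record);
# prints "5 pieces from split /,/ vs 4 real fields"
print scalar(@naive), " pieces from split /,/ vs ", scalar(@fields), " real fields\n";
```

    The /gc flags keep pos() in place when an alternative fails, so the loop can try "quoted field" first and fall back to "bare field" at the same spot.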

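    If you want to do more with the last field, you can match against, say, $fields[@fields-1] (the same element as $fields[-1]).
    Or, if you want to change just the last field and leave everything else alone, try this...
    if ($foo =~ /,"(.+)"/s) {        # /s lets . match the newlines inside the field
        my $bar = $1;
        # ...do some stuff with $bar...
        $foo =~ s/,".+"/,"$bar"/s;
    }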

    The regexp could use work... but that's your job.
    PS: Don't you lose the bullets when you export the file from .xls? (I'm assuming the file was created in Excel, but I could be wrong :) )
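    One way to tighten the last-field regex in this reply, assuming the last field is always double-quoted with embedded quotes doubled, is to anchor the match at the end of the record so the capture cannot begin at an earlier quoted field. The character class [^"] matches newlines too, so no /s is needed. last_quoted_field and steps are hypothetical helper names, and treating each non-blank line of the field as one bulleted step is a guess about the data.

```perl
use strict;
use warnings;

# Return the last field of ONE complete record, or nothing if the record does
# not end in a double-quoted field. Anchoring at \z stops the capture from
# starting at an earlier quoted field; [^"] matches newlines, so /s is unneeded.
sub last_quoted_field {
    my ($record) = @_;
    return unless $record =~ /,"((?:[^"]|"")*)"\s*\z/;
    (my $field = $1) =~ s/""/"/g;    # undo CSV quote doubling
    return $field;
}

# Hypothetical follow-up: treat each non-blank line of the field as one step,
# with surrounding whitespace trimmed.
sub steps {
    my ($field) = @_;
    return grep { length }
           map  { (my $s = $_) =~ s/^\s+|\s+\z//g; $s }
           split /\n/, $field;
}

my $rec = qq{"7382","Patch","NT/WIN2K:\n  Patch number 2044682\n\nHP-UX:\n  Patch number 2043908\n"\n};
print "$_\n" for steps(last_quoted_field($rec));
```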
Re: Parsing CSV file ?s
by dsheroh (Monsignor) on May 17, 2002 at 14:46 UTC
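    my @data;
    while (<FILE>) {
        if (/,/) {
            process_data(@data) if @data;   # flush the previous record first
            @data = split /,/, $_;
        }
        else {
            $data[-1] .= $_;                # no commas: append to the last field
        }
    }
    process_data(@data) if @data;           # the loop never flushes the final record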
    Not tested, but the basic structure of 'read a line, split it if there are commas, append it to the last field if there aren't' should be applicable in some form.
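    The comma test misfires when a continuation line itself contains a comma, which bulleted steps easily can. A variant of the same loop, assuming the multi-line field is double-quoted (as in the sample data below) and that literal quotes inside it are doubled, decides continuation by quote parity instead: while the running count of " characters is odd, a quoted field is still open. read_records is a hypothetical name; each record it returns still has to be split into fields, for example with Text::CSV_XS as shown earlier in the thread.

```perl
use strict;
use warnings;

# Read physical lines from $fh and return logical CSV records, one string each.
# Assumption: any field that spans lines is double-quoted and literal quotes
# inside it are doubled (""), so an odd running count of " means the record
# is still open and the next line belongs to it, commas or not.
sub read_records {
    my ($fh) = @_;
    my @records;
    my $buf = '';
    while (my $line = <$fh>) {
        $buf .= $line;
        next if ($buf =~ tr/"//) % 2;     # tr/"// only counts; quote still open
        push @records, $buf;
        $buf = '';
    }
    push @records, $buf if length $buf;   # unterminated final record, kept as-is
    return @records;
}

# Usage with an in-memory file (open on a scalar ref needs Perl 5.8+).
my $data = qq{"1","Fix","Steps:\n- stop the service\n- restart it, then check the log"\n"2","Fix",""\n};
open my $fh, '<', \$data or die "open: $!";
my @records = read_records($fh);
print scalar(@records), " records\n";
```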
Re: Parsing CSV file ?s
by mrbbking (Hermit) on May 17, 2002 at 15:07 UTC
    This seems to be a sample from the file in question, pulled from a duplicate node that is under consideration.
    "7375","Upgrade","2.2 arm procmail_3.15.2-1_arm.deb","Debian","http://security.debian.org/dists/stable/updates/main/binary-arm/procmail_3.15.2-1_arm.deb",""
    "7376","Upgrade","2.2 i386 procmail_3.15.2-1_i386.deb","Debian","http://security.debian.org/dists/stable/updates/main/binary-i386/procmail_3.15.2-1_i386.deb",""
    "7377","Upgrade","2.2 m68k procmail_3.15.2-1_m68k.deb","Debian","http://security.debian.org/dists/stable/updates/main/binary-m68k/procmail_3.15.2-1_m68k.deb",""
    "7378","Upgrade","2.2 ppc procmail_3.15.2-1_powerpc.deb","Debian","http://security.debian.org/dists/stable/updates/main/binary-powerpc/procmail_3.15.2-1_powerpc.deb",""
    "7379","Hotfix","Q307454","Microsoft","http://download.microsoft.com/download/winntsp/Patch/q307454/NT4/EN-US/Q307454i.exe",""
    "7380","APAR","2.2 sparc procmail_3.15.2-1_sparc.deb","Debian","http://security.debian.org/dists/stable/updates/main/binary-sparc/procmail_3.15.2-1_sparc.deb",""
    "7381","Upgrade","2.2 alpha gftp_2.0.6a-3.2_alpha.deb","Debian","http://security.debian.org/dists/stable/updates/main/binary-alpha/gftp_2.0.6a-3.2_alpha.deb",""
    "7382","Patch","Metalink Patches for Oracle9iAS Web Cache","Oracle","http://metalink.oracle.com","NT/WIN2K: "\n"Patch number 2044682 "\n" "\n"SUN Sparc Solaris: "\n"Patch number 2042106 "\n" "\n"HP-UX: "\n"Patch number 2043908 "\n" "\n"Linux: "\n"Patch number 2043924 "\n" "\n"Compaq Tru64 Unix: "\n"Patch number 2043921 "\n" "\n"IBM AIX: "\n"Patch number 2043917 "\n""
    "7383","Upgrade","2.2 arm gftp_2.0.6a-3.2_arm.deb","Debian","http://security.debian.org/dists/stable/updates/main/binary-arm/",""
    "7384","Upgrade","2.2 i386 gftp_2.0.6a-3.2_i386.deb","Debian","http://security.debian.org/dists/stable/updates/main/binary-i386/gftp_2.0.6a-3.2_i386.deb",""
    "7385","Upgrade","2.2 m68k gftp_2.0.6a-3.2_m68k.deb","Debian","http://security.debian.org/dists/stable/updates/main/binary-m68k/gftp_2.0.6a-3.2_m68k.deb",""
    "7386","Upgrade","2.2 ppc gftp_2.0.6a-3.2_powerpc.deb","Debian","http://security.debian.org/dists/stable/updates/main/binary-powerpc/gftp_2.0.6a-3.2_powerpc.deb",""
    "7387","Upgrade","2.2 sparc gftp_2.0.6a-3.2_sparc.deb","Debian","http://security.debian.org/dists/stable/updates/main/binary-sparc/gftp_2.0.6a-3.2_sparc.deb",""
    "7388","Upgrade","2.2.19-deep-symlink.patch","Rafal Wojtczuk <nergal@7bulls.com>","http://www.securityfocus.com/data/vulnerabilities/patches/2.2.19-deep-symlink.patch",""