suhailck has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have written a script to extract the 1st and 9th fields from a csv of the following format,

BED001,137CM BUDGET BLUE B/SET,BED001,BASE & MATTRESS,,Ea,0,0,39.47,0. +00 BED0015,183CM BODY CUSHION BASE SET,BED0015,BASE & MATTRESS,,Ea,0,0,"1 +,099.91",0.00

The desired output is,
BED001,39.47 BED0015,1099.91

The code written is
perl -lne 'print join ",",map {s/[",]//g;$_}(split /,(?!(?:[^",]+",))/,$_)[0,8]' sample.txt As you can see i have used regex inside split to perform this task.I just want to know about any better (or easy) approach for similar requirements.

Thanks in Advance

~suhail

Replies are listed 'Best First'.
Re: Extracting fields from CSV
by james2vegas (Chaplain) on Aug 30, 2010 at 08:39 UTC
    Text::CSV for all your Comma-Separated Value Needs.
Re: Extracting fields from CSV
by Khen1950fx (Canon) on Aug 30, 2010 at 09:09 UTC
    Text::CSV::Simple does the trick. I tried getting the 1st and the 9th fields, but I think that you want the 2nd and the 8th. Try this:
    #!/usr/bin/perl use strict; use warnings; use Data::Dumper::Concise; use Text::CSV::Simple; my $datafile = '/path/to/text.csv'; my $parser = Text::CSV::Simple->new; $parser->want_fields(2,8); my @data = $parser->read_file($datafile); print Dumper(@data);
Re: Extracting fields from CSV
by Tux (Canon) on Aug 30, 2010 at 14:22 UTC

    Using Text::CSV or the faster Text::CSV_XS (which is automatically used by Text::CSV if installed), you prepare yourself for more complicated CSV files. Your example isn't that hard to parse, but once escapes, binary or embedded newlines come in to play, you are doomed when using regular expressions. FWIW, Text::CSV::Simple uses Text::CSV_XS for parsing too.

    As a bonus you get an even faster parsing with this dedicated module. I've created (using your example) a file with 5000 records, and then ran

    use strict; use warnings; use autodie; use Benchmark qw( cmpthese ); use Text::CSV_XS; my $file = "sample.csv"; my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 }); cmpthese (10, { regex => sub { open my $fh, "<", $file; while (<$fh>) { my @s = map {s/[",]//g;$_}(split /,(?!(?:[^",]+",))/,$_)[0 +,8]; } }, csv_xs => sub { open my $fh, "<", $file; while (my $row = $csv->getline ($fh)) { my @s = @{$row}[0,8]; } }, });

    Which, IMHO, is nor only faster, but also easier in maintenance. The result:

    Rate regex csv_xs regex 6.29/s -- -20% csv_xs 7.87/s 25% --

    Enjoy, Have FUN! H.Merijn
Re: Extracting fields from CSV
by aquarium (Curate) on Aug 31, 2010 at 00:49 UTC
    your own solution and the various text::csv module solutions provided do the trick. but if it matters (if the output is going back into a database especiallly,) do some sanity checks that you do have the prerequisite number of fields on input, and that they're of desired format, e.g. field one is a valid product code etc, and field 9 is a fairly valid number. for the second case, adding a zero to field 9 should be mathematically equal to field 9.
    the hardest line to type correctly is: stty erase ^H
Re: Extracting fields from CSV
by suhailck (Friar) on Aug 31, 2010 at 02:40 UTC
    Thanks all for your valuable suggestions and feedbacks.

    Now, here is another requirement, i have the file of format as below,(from a sql query output)

    BED001 |137CM BUDGET BLUE B/SET |BED001 |BASE & MATTRESS | |E +a |0 |0 |39.47 |0.00 BED0015 |183CM BODY CUSHION BASE SET |BED0015 |BASE & MATTRESS + | |Ea |0 |0 |1,099.91 |0.00


    The expected output is

    BED001,137CM BUDGET BLUE B/SET,BED001,BASE & MATTRESS,,Ea,0,0,39.47,0. +00 BED0015,183CM BODY CUSHION BASE SET,BED0015,BASE & MATTRESS,,Ea,0,0,"1 +,099.91",0.00


    The code i tried is

    use strict; use Text::CSV; my $csv = Text::CSV->new( { sep_char => "|", allow_whitespace => 1 + } ); my $infile = "s.txt"; my $outfile = "s.csv"; open( my $fh, "<", $infile ); open( my $out, ">", $outfile ); while ( my $array = $csv->getline($fh) ) { local $" = ","; local $\ = "\n"; #$csv->combine(@{$array}); #my $line=$csv->string(); print $out "@{$array}"; #print $out $line; }
    The actual output:

    BED001,137CM BUDGET BLUE B/SET,BED001,BASE & MATTRESS,,Ea,0,0,39.47,0. +00 BED0015,183CM BODY CUSHION BASE SET,BED0015,BASE & MATTRESS,,Ea,0,0,1, +099.91,0.00

    I need to quote 1,099.91 in the second line of the output csv since the string contains sep_char ",".Is there any method to define output sep_char ?? or How to achieve the expected result??

      If you are able to connect to the database from perl, using DBI, you don't need an intermediate file at all.

      Use Text::CSV_XS (or Text::CSV) directly to generate the output and it will correctly deal with the quoting:

      use strict; use Text::CSV_XS; my $csv_in = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, sep_char => "|", allow_whitespace = +> 1 }); my $csv_out = Text::CSV_XS->new ({ binary => 1, eol => "\n" }); my $infile = "s.txt"; my $outfile = "s.csv"; open my $fh, "<", $infile or die "$infile: $!"; open my $out, ">", $outfile or die "$outfile: $!"; while (my $row = $csv_in->getline ($fh)) { $csv_out->print ($out, $row); }

      update: corrected $fh to $out in while loop.


      Enjoy, Have FUN! H.Merijn
        Thanks. This is what i really wanted :)

        There is a small typo in the above code

        $csv_out->print ($fh, $row);
        Filehandle $out is needed in place of $fh for writing

        $csv_out->print ($out, $row);

        ~suhail
      unless you really need the programmatic interface for the sql export, it's typically much better to use the database export utilities provided by the vendor, e.g. mysqldump and similar. these utilities have a tonne of switches to tweak the output as you require, without re-inventing the wheel. then you can concentrate on any actual program logic that does something more useful than import/export. but that's just my take on the situation.
      the hardest line to type correctly is: stty erase ^H