Extracting fields from CSV

suhailck has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Extracting fields from CSV by james2vegas (Chaplain) on Aug 30, 2010 at 08:39 UTC
Text::CSV for all your Comma-Separated Value Needs.	[reply]
Re: Extracting fields from CSV by Khen1950fx (Canon) on Aug 30, 2010 at 09:09 UTC
Text::CSV::Simple does the trick. I tried getting the 1st and the 9th fields, but I think that you want the 2nd and the 8th. Try this: `#!/usr/bin/perl use strict; use warnings; use Data::Dumper::Concise; use Text::CSV::Simple; my $datafile = '/path/to/text.csv'; my $parser = Text::CSV::Simple->new; $parser->want_fields(2,8); my @data = $parser->read_file($datafile); print Dumper(@data);` [download]	[reply] [d/l]
Re: Extracting fields from CSV by Tux (Canon) on Aug 30, 2010 at 14:22 UTC
Using Text::CSV or the faster Text::CSV_XS (which is automatically used by Text::CSV if installed), you prepare yourself for more complicated CSV files. Your example isn't that hard to parse, but once escapes, binary or embedded newlines come in to play, you are doomed when using regular expressions. FWIW, Text::CSV::Simple uses Text::CSV_XS for parsing too. As a bonus you get an even faster parsing with this dedicated module. I've created (using your example) a file with 5000 records, and then ran `use strict; use warnings; use autodie; use Benchmark qw( cmpthese ); use Text::CSV_XS; my $file = "sample.csv"; my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 }); cmpthese (10, { regex => sub { open my $fh, "<", $file; while (<$fh>) { my @s = map {s/[",]//g;$_}(split /,(?!(?:[^",]+",))/,$_)[0 +,8]; } }, csv_xs => sub { open my $fh, "<", $file; while (my $row = $csv->getline ($fh)) { my @s = @{$row}[0,8]; } }, });` [download] Which, IMHO, is nor only faster, but also easier in maintenance. The result: `Rate regex csv_xs regex 6.29/s -- -20% csv_xs 7.87/s 25% --` [download] Enjoy, Have FUN! H.Merijn	[reply] [d/l] [select]
Re: Extracting fields from CSV by aquarium (Curate) on Aug 31, 2010 at 00:49 UTC
your own solution and the various text::csv module solutions provided do the trick. but if it matters (if the output is going back into a database especiallly,) do some sanity checks that you do have the prerequisite number of fields on input, and that they're of desired format, e.g. field one is a valid product code etc, and field 9 is a fairly valid number. for the second case, adding a zero to field 9 should be mathematically equal to field 9. the hardest line to type correctly is: stty erase ^H	[reply]
Re: Extracting fields from CSV by suhailck (Friar) on Aug 31, 2010 at 02:40 UTC
Thanks all for your valuable suggestions and feedbacks. Now, here is another requirement, i have the file of format as below,(from a sql query output) `BED001 \|137CM BUDGET BLUE B/SET \|BED001 \|BASE & MATTRESS \| \|E +a \|0 \|0 \|39.47 \|0.00 BED0015 \|183CM BODY CUSHION BASE SET \|BED0015 \|BASE & MATTRESS + \| \|Ea \|0 \|0 \|1,099.91 \|0.00` [download] The expected output is `BED001,137CM BUDGET BLUE B/SET,BED001,BASE & MATTRESS,,Ea,0,0,39.47,0. +00 BED0015,183CM BODY CUSHION BASE SET,BED0015,BASE & MATTRESS,,Ea,0,0,"1 +,099.91",0.00` [download] The code i tried is `use strict; use Text::CSV; my $csv = Text::CSV->new( { sep_char => "\|", allow_whitespace => 1 + } ); my $infile = "s.txt"; my $outfile = "s.csv"; open( my $fh, "<", $infile ); open( my $out, ">", $outfile ); while ( my $array = $csv->getline($fh) ) { local $" = ","; local $\ = "\n"; #$csv->combine(@{$array}); #my $line=$csv->string(); print $out "@{$array}"; #print $out $line; }` [download] The actual output: `BED001,137CM BUDGET BLUE B/SET,BED001,BASE & MATTRESS,,Ea,0,0,39.47,0. +00 BED0015,183CM BODY CUSHION BASE SET,BED0015,BASE & MATTRESS,,Ea,0,0,1, +099.91,0.00` [download] I need to quote 1,099.91 in the second line of the output csv since the string contains sep_char ",".Is there any method to define output sep_char ?? or How to achieve the expected result??	[reply] [d/l] [select]
Re^2: Extracting fields from CSV by Tux (Canon) on Aug 31, 2010 at 06:36 UTC
If you are able to connect to the database from perl, using DBI, you don't need an intermediate file at all. Use Text::CSV_XS (or Text::CSV) directly to generate the output and it will correctly deal with the quoting: `use strict; use Text::CSV_XS; my $csv_in = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, sep_char => "\|", allow_whitespace = +> 1 }); my $csv_out = Text::CSV_XS->new ({ binary => 1, eol => "\n" }); my $infile = "s.txt"; my $outfile = "s.csv"; open my $fh, "<", $infile or die "$infile: $!"; open my $out, ">", $outfile or die "$outfile: $!"; while (my $row = $csv_in->getline ($fh)) { $csv_out->print ($out, $row); }` [download] update: corrected `$fh` to `$out` in while loop. Enjoy, Have FUN! H.Merijn	[reply] [d/l] [select]
Re^3: Extracting fields from CSV by suhailck (Friar) on Aug 31, 2010 at 07:05 UTC
Thanks. This is what i really wanted :) There is a small typo in the above code `$csv_out->print ($fh, $row);` Filehandle $out is needed in place of $fh for writing `$csv_out->print ($out, $row);` ~suhail	[reply] [d/l] [select]
Re^2: Extracting fields from CSV by aquarium (Curate) on Aug 31, 2010 at 04:55 UTC
unless you really need the programmatic interface for the sql export, it's typically much better to use the database export utilities provided by the vendor, e.g. mysqldump and similar. these utilities have a tonne of switches to tweak the output as you require, without re-inventing the wheel. then you can concentrate on any actual program logic that does something more useful than import/export. but that's just my take on the situation. the hardest line to type correctly is: stty erase ^H	[reply]