splitting csv file and saving data

Ganesh Bharadwaj1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have a query. I have an input CSV file that looks like this.

 NPLUS,32,0,
NW,41,0,1
NWER,51,"0,1,2",12
[download]

This is the format Layer name, layer number, data type, text type. some times there are more than 1 data type, in which case they are grouped using "".

For the first line NPLUS is the layer name,32 is layer number and 0 is the data type. There is no text type for this line

For the second line NW is the layer name,41 is layer number and 0 is the data type and 1 is the text type,

For the last line NWER is the layer name, 51 is the layer number, 0,1,2 is the data type and 12 is the text type.

I want to check the fields and print the following.

NPLUS layernumber 32 datatype 0,
NW layernumber 41 datatype 0 text 1,
NWER layernumber 51 datatype 0 datatype 1 datatype 2 text 12;
[download]

I have written a sample code, but I am not able to figure out how to split based on two delimiters and then get the required result. the last line also should have a semicolon, where as other lines should have a coma at the end only. Here is the sample code.

open(my $in, '<', 'Text_File.csv')
    or die "Cannot open Text_File.csv: $!";

open(my $out, '>', 'mask.spec')
    or die "Cannot open mask.spec: $!";

while (my $line = <$in>) { 
#my @fields = split (/(?<="),(?=")/, $line);
    my @fields = split(/,/, $line); 
    if (@fields == 4) { 
        printf {$out} "%s layernum %s datatype %s,%s", @fields; 
    } 
    elsif (@fields == 5) { 
        printf {$out} "%s layernum %s datatype %s text %s,%s", @fields
+;
    } 
}
[download]

Any help would be greatly appreciated.

Comment on splitting csv file and saving data Select or Download Code

Replies are listed 'Best First'.
Re: splitting csv file and saving data (updated) by haukex (Archbishop) on Nov 03, 2016 at 09:56 UTC
Hi Ganesh Bharadwaj1, For parsing CSV files, don't re-invent the wheel and use Text::CSV instead. That will be able to parse each of your sample rows into four fields, and then getting the comma-separated values out of the 3rd column is as easy as `my @vals = split /,/, $row->[2];` (see the module's synopsis for example code). Update: As BrowserUk pointed out below, using a simple split is only appropriate if that field is never more than a plain comma-separated list, and if there is anything more complex going on in that field (quotes, escape characters, etc.) you'll have to use a more advanced method to parse it. If you wanted to play it safe, you could validate the format of that field before splitting it, e.g. `$row->[2] =~ /^\d+(?:,\d+)*$/`. Hope this helps, -- Hauke D	[reply] [d/l] [select]
Re^2: splitting csv file and saving data by BrowserUk (Patriarch) on Nov 03, 2016 at 11:04 UTC
Text::CSV doesn't help the OP; because his data isn't CSV! No definition of CSV data allows for conditional fields. Empty yes, but not conditional. Nor do CVS modules cater to nested csv-like fields. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :) In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re^3: splitting csv file and saving data by haukex (Archbishop) on Nov 03, 2016 at 11:26 UTC
Hi BrowserUk, I guess you are calling the third field "conditional"? The way I interpreted the input in the OP is a CSV file with four columns: layer name (string), layer number (integer), data types (string), and text type (integer). That the "data types" field was encoded as comma-separated integers in a string is certainly not an optimal design choice and it takes some manual decoding, but unless there's more the OP isn't telling us about the format, I disagree, this looks like CSV to me. And BTW, I did test before posting (mostly a copy-and-paste from Text::CSV's synopsis): `use Data::Dump 'pp'; use Text::CSV; my $csv = Text::CSV->new ( { binary => 1 } ) or die "Cannot use CSV: ".Text::CSV->error_diag (); open my $fh, "<:encoding(utf8)", "Text_File.csv" or die $!; while ( my $row = $csv->getline($fh) ) { my @vals = split /,/, $row->[2]; pp $row, @vals; } $csv->eof or $csv->error_diag(); close $fh; __END__ ([" NPLUS", 32, 0, ""], 0) (["NW", 41, 0, 1], 0) (["NWER", 51, "0,1,2", "12 "], 0, 1, 2)` [download] Regards, -- Hauke D	[reply] [d/l]
Re^4: splitting csv file and saving data by BrowserUk (Patriarch) on Nov 03, 2016 at 11:45 UTC
Re^5: splitting csv file and saving data by haukex (Archbishop) on Nov 03, 2016 at 12:52 UTC
Re^3: splitting csv file and saving data by Tux (Canon) on Nov 03, 2016 at 21:11 UTC
Humbug! :) `$ cat test.csv NPLUS,32,0, NW,41,0,1 NWER,51,"0,1,2",12 $ perl -MText::CSV_XS=csv -MData::Peek \ -wE'DDumper(csv(in=>"test.csv",on_in=>sub{$_[1][2]=csv(in=>\$_[1][2] +)->[0]}))' [ [ 'NPLUS', 32, [ 0 ], '' ], [ 'NW', 41, [ 0 ], 1 ], [ 'NWER', 51, [ 0, 1, 2 ], 12 ] ]` [download] Enjoy, Have FUN! H.Merijn	[reply] [d/l]
Re^4: splitting csv file and saving data by BrowserUk (Patriarch) on Nov 03, 2016 at 21:53 UTC
Re^3: splitting csv file and saving data by Anonymous Monk on Nov 03, 2016 at 11:27 UTC
What conditional fields? Text::'s. Handles quoted fields and As haukex show the nesting handled by programmer	[reply]
Re: splitting csv file and saving data by BrowserUk (Patriarch) on Nov 03, 2016 at 10:48 UTC
You'll get lots of "Use XYZ::CSV module" replies despite that they don't help with your main problems, and add unnecessary complexity. This is how I would tackle the problem described: `#! perl -sl use strict; while( <DATA> ) { my( $name, $number, $type, $text ) = m[^([^,]+),(\d+),(\d+\|"[^"]+" +),(\d+)?$]; #" $type =~ s["([^"]+)"][$1]; #" printf "%s layernumber %d datatype %s%s%s\n", $name, $number, ( $type =~ m[,] ? join( ' datatype ', split ',', $type ) : $ty +pe ), ( defined( $text ) ? ' text ' . $text : '' ), ( $name eq 'NWER' ? ';' : ',' ) ; } __DATA__ NPLUS,32,0, NW,41,0,1 NWER,51,"0,1,2",12` [download] Produces: `C:\test>junk44 NPLUS layernumber 32 datatype 0, NW layernumber 41 datatype 0 text 1, NWER layernumber 51 datatype 0 datatype 1 datatype 2 text 12;` [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :) In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re: splitting csv file and saving data by kcott (Archbishop) on Nov 03, 2016 at 22:16 UTC
G'day Ganesh Bharadwaj1, Here's how I might tackle this. I believe your input is valid CSV and would recommend use of Text::CSV. I've set up an array of transformations which match the array of elements in each CSV line. I've used eof to determine whether the output ends with a comma or a semicolon. In case you didn't know, `"\b"` is a backspace. I've used it to get rid of an extra space that would occur when there's an empty '`text`' field. The code assumes you've provided representative sample input data. Here's `pm_1175217_csv_transform.pl`: `#!/usr/bin/env perl use strict; use warnings; use Text::CSV; my @transform = ( sub { $_[0] }, sub { "layernumber $_[0]" }, sub { 'datatype ' . join ' datatype ', split /,/, $_[0] }, sub { length $_[0] ? "text $_[0]" : "\b" }, ); my $csv = Text::CSV::->new; while (my $row = $csv->getline(\*DATA)) { $row->[$_] = $transform[$_]->($row->[$_]) for 0 .. $#$row; local $\ = eof ? ";\n" : ",\n"; print "@$row"; } __DATA__ NPLUS,32,0, NW,41,0,1 NWER,51,"0,1,2",12` [download] A sample run: $ pm_1175217_csv_transform.pl NPLUS layernumber 32 datatype 0, NW layernumber 41 datatype 0 text 1, NWER layernumber 51 datatype 0 datatype 1 datatype 2 text 12; $ [download] — Ken	[reply] [d/l] [select]