Ganesh Bharadwaj1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have a query. I have an input CSV file that looks like this.
NPLUS,32,0, NW,41,0,1 NWER,51,"0,1,2",12
This is the format Layer name, layer number, data type, text type. some times there are more than 1 data type, in which case they are grouped using "".

For the first line NPLUS is the layer name,32 is layer number and 0 is the data type. There is no text type for this line

For the second line NW is the layer name,41 is layer number and 0 is the data type and 1 is the text type,

For the last line NWER is the layer name, 51 is the layer number, 0,1,2 is the data type and 12 is the text type.

I want to check the fields and print the following.
NPLUS layernumber 32 datatype 0, NW layernumber 41 datatype 0 text 1, NWER layernumber 51 datatype 0 datatype 1 datatype 2 text 12;
I have written a sample code, but I am not able to figure out how to split based on two delimiters and then get the required result. the last line also should have a semicolon, where as other lines should have a coma at the end only. Here is the sample code.
open(my $in, '<', 'Text_File.csv') or die "Cannot open Text_File.csv: $!"; open(my $out, '>', 'mask.spec') or die "Cannot open mask.spec: $!"; while (my $line = <$in>) { #my @fields = split (/(?<="),(?=")/, $line); my @fields = split(/,/, $line); if (@fields == 4) { printf {$out} "%s layernum %s datatype %s,%s", @fields; } elsif (@fields == 5) { printf {$out} "%s layernum %s datatype %s text %s,%s", @fields +; } }
Any help would be greatly appreciated.

Replies are listed 'Best First'.
Re: splitting csv file and saving data (updated)
by haukex (Archbishop) on Nov 03, 2016 at 09:56 UTC

    Hi Ganesh Bharadwaj1,

    For parsing CSV files, don't re-invent the wheel and use Text::CSV instead. That will be able to parse each of your sample rows into four fields, and then getting the comma-separated values out of the 3rd column is as easy as my @vals = split /,/, $row->[2]; (see the module's synopsis for example code).

    Update: As BrowserUk pointed out below, using a simple split is only appropriate if that field is never more than a plain comma-separated list, and if there is anything more complex going on in that field (quotes, escape characters, etc.) you'll have to use a more advanced method to parse it. If you wanted to play it safe, you could validate the format of that field before splitting it, e.g. $row->[2] =~ /^\d+(?:,\d+)*$/.

    Hope this helps,
    -- Hauke D

      Text::CSV doesn't help the OP; because his data isn't CSV! No definition of CSV data allows for conditional fields. Empty yes, but not conditional.

      Nor do CVS modules cater to nested csv-like fields.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Hi BrowserUk,

        I guess you are calling the third field "conditional"? The way I interpreted the input in the OP is a CSV file with four columns: layer name (string), layer number (integer), data types (string), and text type (integer). That the "data types" field was encoded as comma-separated integers in a string is certainly not an optimal design choice and it takes some manual decoding, but unless there's more the OP isn't telling us about the format, I disagree, this looks like CSV to me.

        And BTW, I did test before posting (mostly a copy-and-paste from Text::CSV's synopsis):

        use Data::Dump 'pp'; use Text::CSV; my $csv = Text::CSV->new ( { binary => 1 } ) or die "Cannot use CSV: ".Text::CSV->error_diag (); open my $fh, "<:encoding(utf8)", "Text_File.csv" or die $!; while ( my $row = $csv->getline($fh) ) { my @vals = split /,/, $row->[2]; pp $row, @vals; } $csv->eof or $csv->error_diag(); close $fh; __END__ ([" NPLUS", 32, 0, ""], 0) (["NW", 41, 0, 1], 0) (["NWER", 51, "0,1,2", "12 "], 0, 1, 2)

        Regards,
        -- Hauke D

        Humbug! :)

        $ cat test.csv NPLUS,32,0, NW,41,0,1 NWER,51,"0,1,2",12 $ perl -MText::CSV_XS=csv -MData::Peek \ -wE'DDumper(csv(in=>"test.csv",on_in=>sub{$_[1][2]=csv(in=>\$_[1][2] +)->[0]}))' [ [ 'NPLUS', 32, [ 0 ], '' ], [ 'NW', 41, [ 0 ], 1 ], [ 'NWER', 51, [ 0, 1, 2 ], 12 ] ]

        Enjoy, Have FUN! H.Merijn
        What conditional fields? Text::'s. Handles quoted fields and As haukex show the nesting handled by programmer
Re: splitting csv file and saving data
by BrowserUk (Patriarch) on Nov 03, 2016 at 10:48 UTC

    You'll get lots of "Use XYZ::CSV module" replies despite that they don't help with your main problems, and add unnecessary complexity.

    This is how I would tackle the problem described:

    #! perl -sl use strict; while( <DATA> ) { my( $name, $number, $type, $text ) = m[^([^,]+),(\d+),(\d+|"[^"]+" +),(\d+)?$]; #" $type =~ s["([^"]+)"][$1]; #" printf "%s layernumber %d datatype %s%s%s\n", $name, $number, ( $type =~ m[,] ? join( ' datatype ', split ',', $type ) : $ty +pe ), ( defined( $text ) ? ' text ' . $text : '' ), ( $name eq 'NWER' ? ';' : ',' ) ; } __DATA__ NPLUS,32,0, NW,41,0,1 NWER,51,"0,1,2",12

    Produces:

    C:\test>junk44 NPLUS layernumber 32 datatype 0, NW layernumber 41 datatype 0 text 1, NWER layernumber 51 datatype 0 datatype 1 datatype 2 text 12;

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: splitting csv file and saving data
by kcott (Archbishop) on Nov 03, 2016 at 22:16 UTC

    G'day Ganesh Bharadwaj1,

    Here's how I might tackle this.

    • I believe your input is valid CSV and would recommend use of Text::CSV.
    • I've set up an array of transformations which match the array of elements in each CSV line.
    • I've used eof to determine whether the output ends with a comma or a semicolon.
    • In case you didn't know, "\b" is a backspace. I've used it to get rid of an extra space that would occur when there's an empty 'text' field.
    • The code assumes you've provided representative sample input data.

    Here's pm_1175217_csv_transform.pl:

    #!/usr/bin/env perl use strict; use warnings; use Text::CSV; my @transform = ( sub { $_[0] }, sub { "layernumber $_[0]" }, sub { 'datatype ' . join ' datatype ', split /,/, $_[0] }, sub { length $_[0] ? "text $_[0]" : "\b" }, ); my $csv = Text::CSV::->new; while (my $row = $csv->getline(\*DATA)) { $row->[$_] = $transform[$_]->($row->[$_]) for 0 .. $#$row; local $\ = eof ? ";\n" : ",\n"; print "@$row"; } __DATA__ NPLUS,32,0, NW,41,0,1 NWER,51,"0,1,2",12

    A sample run:

    $ pm_1175217_csv_transform.pl NPLUS layernumber 32 datatype 0, NW layernumber 41 datatype 0 text 1, NWER layernumber 51 datatype 0 datatype 1 datatype 2 text 12; $

    — Ken