in reply to splitting csv file and saving data

Hi Ganesh Bharadwaj1,

For parsing CSV files, don't reinvent the wheel; use Text::CSV instead. It will parse each of your sample rows into four fields, and then getting the comma-separated values out of the 3rd column is as easy as my @vals = split /,/, $row->[2]; (see the module's synopsis for example code).

Update: As BrowserUk pointed out below, a simple split is only appropriate if that field is never more than a plain comma-separated list. If there is anything more complex going on in that field (quotes, escape characters, etc.), you'll have to use a more advanced method to parse it. If you want to play it safe, you could validate the format of that field before splitting it, e.g. $row->[2] =~ /^\d+(?:,\d+)*$/.
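
A minimal sketch of that validate-then-split idea, using only core Perl. The helper name split_int_list and the sample field values are made up for illustration; the regex is the one given above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a field only if it is a plain list of
# comma-separated integers; return an empty list otherwise.
sub split_int_list {
    my ($field) = @_;
    return unless defined $field && $field =~ /^\d+(?:,\d+)*$/;
    return split /,/, $field;
}

# Try a few sample fields: two plain lists, one with embedded quotes,
# and one empty field.
for my $field ( '0', '0,1,2', '0,"1,2"', '' ) {
    my @vals = split_int_list($field);
    printf "%-10s => %s\n", "'$field'",
        @vals ? join( ' ', @vals ) : '(rejected)';
}
```

Anything the regex rejects (quotes, empty fields, non-digits) would then need a closer look or a real parser rather than a bare split.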

Hope this helps,
-- Hauke D

Re^2: splitting csv file and saving data
by BrowserUk (Patriarch) on Nov 03, 2016 at 11:04 UTC

    Text::CSV doesn't help the OP, because his data isn't CSV! No definition of CSV data allows for conditional fields. Empty, yes, but not conditional.

    Nor do CSV modules cater to nested CSV-like fields.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Hi BrowserUk,

      I guess you are calling the third field "conditional"? The way I interpreted the input in the OP is a CSV file with four columns: layer name (string), layer number (integer), data types (string), and text type (integer). That the "data types" field was encoded as comma-separated integers in a string is certainly not an optimal design choice, and it takes some manual decoding. But unless there's more the OP isn't telling us about the format, I disagree: this looks like CSV to me.

      And BTW, I did test before posting (mostly a copy-and-paste from Text::CSV's synopsis):

      use Data::Dump 'pp';
      use Text::CSV;
      my $csv = Text::CSV->new ( { binary => 1 } )
          or die "Cannot use CSV: ".Text::CSV->error_diag ();
      open my $fh, "<:encoding(utf8)", "Text_File.csv" or die $!;
      while ( my $row = $csv->getline($fh) ) {
          my @vals = split /,/, $row->[2];
          pp $row, @vals;
      }
      $csv->eof or $csv->error_diag();
      close $fh;
      __END__
      ([" NPLUS", 32, 0, ""], 0)
      (["NW", 41, 0, 1], 0)
      (["NWER", 51, "0,1,2", "12 "], 0, 1, 2)

      Regards,
      -- Hauke D

        unless there's more the OP isn't telling us about the format, I disagree,

        Please note: I didn't say it could not be solved with a CSV module, only that they don't really help.

        For example, you've resorted to split for the embedded field. Why is it acceptable to do so there, but not for the rest?

        But, your code also only solves half the problem; neatly side-stepping the issue of "the last line also should have a semicolon, where as other lines should have a coma at the end only.".

        Whilst I'm quite certain that you can solve that too, the use of a CSV module does nothing to assist in that. In fact, it effectively denies access to some information that could be used to assist in the production of the output. (I.e., you unconditionally attempt to split the 3rd field regardless of whether it requires it.)
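
        The terminator requirement quoted above can be handled after parsing, whichever parser is used. A rough sketch, assuming the rows are already available as an array of array refs. The layout of each output line (fields joined by spaces, empty fields dropped) is invented here for illustration and is not the OP's confirmed target format:

        ```perl
        use strict;
        use warnings;

        # Rows as a parser might return them (sample data from this thread).
        my @rows = (
            [ 'NPLUS', 32, '0',     '' ],
            [ 'NW',    41, '0',     1 ],
            [ 'NWER',  51, '0,1,2', 12 ],
        );

        # Hypothetical line layout: non-empty fields joined by single spaces.
        my @lines = map { join ' ', grep { length } @$_ } @rows;

        # Every line ends with a comma, except the last, which ends with a semicolon.
        my $out = join( ",\n", @lines ) . ";\n";
        print $out;
        ```

        Joining with ",\n" and appending ";" once avoids having to track which row is the last one inside the loop.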



      Humbug! :)

      $ cat test.csv
      NPLUS,32,0,
      NW,41,0,1
      NWER,51,"0,1,2",12
      $ perl -MText::CSV_XS=csv -MData::Peek \
          -wE'DDumper(csv(in=>"test.csv",on_in=>sub{$_[1][2]=csv(in=>\$_[1][2])->[0]}))'
      [
          [ 'NPLUS', 32, [ 0 ], '' ],
          [ 'NW', 41, [ 0 ], 1 ],
          [ 'NWER', 51, [ 0, 1, 2 ], 12 ]
      ]

      Enjoy, Have FUN! H.Merijn

        ++ That is a perfect rebuttal to my incorrect assertion; and very impressive.

        But, I have questions:

        • How many people in the world, besides you, could have picked out the appropriate 2 or 3 lines from your amazingly detailed and comprehensive 1700 lines of module documentation to construct that solution?
        • And when things go wrong -- say some of those weird MS Word double quote characters had gotten mixed into the OP's data -- how many people would be qualified to try and diagnose the problem?
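
        As one possible first diagnostic step for the stray-quotes scenario, a small core-Perl sketch that reports typographic quote characters in a decoded line. The helper name find_smart_quotes and the sample line are hypothetical, and it assumes the input has already been decoded to characters:

        ```perl
        use strict;
        use warnings;

        # Hypothetical check: report typographic ("smart") quotes in a line.
        # A parser whose quote character is the plain ASCII double quote
        # will not treat these as quoting.
        sub find_smart_quotes {
            my ($line) = @_;
            my @hits;
            while ( $line =~ /([\x{2018}\x{2019}\x{201C}\x{201D}])/g ) {
                push @hits, sprintf 'U+%04X at offset %d', ord $1, pos($line) - 1;
            }
            return @hits;
        }

        # Sample line with curly double quotes around the third field.
        my $sample = "NWER,51,\x{201C}0,1,2\x{201D},12";
        print "$_\n" for find_smart_quotes($sample);
        ```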

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
        In the absence of evidence, opinion is indistinguishable from prejudice.
      What conditional fields? Text::CSV handles quoted fields, and as haukex showed, the nesting is handled by the programmer.