
Text::CSV doesn't help the OP, because his data isn't CSV! No definition of CSV data allows for conditional fields. Empty, yes, but not conditional.

Nor do CSV modules cater to nested CSV-like fields.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^3: splitting csv file and saving data
by haukex (Archbishop) on Nov 03, 2016 at 11:26 UTC

    Hi BrowserUk,

    I guess you are calling the third field "conditional"? The way I interpreted the input in the OP is a CSV file with four columns: layer name (string), layer number (integer), data types (string), and text type (integer). That the "data types" field was encoded as comma-separated integers in a string is certainly not an optimal design choice and it takes some manual decoding, but unless there's more the OP isn't telling us about the format, I disagree, this looks like CSV to me.

    And BTW, I did test before posting (mostly a copy-and-paste from Text::CSV's synopsis):

    use Data::Dump 'pp';
    use Text::CSV;
    my $csv = Text::CSV->new ( { binary => 1 } )
        or die "Cannot use CSV: ".Text::CSV->error_diag ();
    open my $fh, "<:encoding(utf8)", "Text_File.csv" or die $!;
    while ( my $row = $csv->getline($fh) ) {
        my @vals = split /,/, $row->[2];
        pp $row, @vals;
    }
    $csv->eof or $csv->error_diag();
    close $fh;
    __END__
    ([" NPLUS", 32, 0, ""], 0)
    (["NW", 41, 0, 1], 0)
    (["NWER", 51, "0,1,2", "12 "], 0, 1, 2)

    Regards,
    -- Hauke D

      unless there's more the OP isn't telling us about the format, I disagree,

      Please note: I didn't say it could not be solved with a CSV module, only that they don't really help.

      For example, you've resorted to split for the embedded field. Why is it acceptable to do so there, but not for the rest?

      But, your code also only solves half the problem, neatly side-stepping the issue of "the last line also should have a semicolon, whereas other lines should have a comma at the end only".

      Whilst I'm quite certain that you can solve that too, the use of a CSV module does nothing to assist in that. In fact, it effectively denies access to some information that could be used to assist in the production of the output. (I.e., you unconditionally attempt to split the 3rd field regardless of whether it requires it.)



        Hi BrowserUk,

        Please note: I didn't say it could not be solved with a CSV module, only that they don't really help.

        I was disagreeing mostly with what appeared to be your main point, "his data isn't CSV!" I also disagree that a CSV module doesn't help, since Text::CSV gives you the handling of quoted strings. I'm sure you know all the other usual arguments as to why using a module is often better than rolling your own, but just to highlight one: it does give more flexibility if the input happens to vary (e.g. if one of the fields is empty or another field happens to be a quoted string).

        For example, you've resorted to split for the embedded field. Why is it acceptable to do so there, but not for the rest?

        I'd say a simple split isn't really appropriate for the rest of the line because it doesn't have all the power of the module. However, you've still got a good point: using a simple split for the third column does make assumptions about the input format (I've updated my node accordingly). If I wanted total consistency I could use another instance of Text::CSV, but the line between what's overkill and what isn't has to be drawn somewhere ;-)
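To illustrate the point about quoted strings with the OP's third row: a plain split on the whole line tears the quoted "0,1,2" apart, while a quote-aware parse keeps it as one field. This sketch uses the core module Text::ParseWords (its parse_line function) only as a stand-in because it ships with Perl; it is not Text::CSV, and its quoting rules differ (it also honours backslash escapes and single quotes), so treat it as an illustration rather than a CSV parser. The sub names are hypothetical.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Naive approach: split on every comma, ignoring quotes.
sub naive_fields {
    my ($line) = @_;
    return split /,/, $line, -1;    # LIMIT -1 keeps trailing empty fields
}

# Quote-aware approach (stand-in for a real CSV parser): commas inside
# double quotes do not split, and $keep = 0 strips the quotes.
# parse_line returns undef for a trailing empty field, mapped to '' here.
sub quoted_fields {
    my ($line) = @_;
    return map { defined $_ ? $_ : '' } parse_line( ',', 0, $line );
}

my $line   = 'NWER,51,"0,1,2",12';
my @naive  = naive_fields($line);
my @quoted = quoted_fields($line);
printf "naive:  %d fields: %s\n", scalar @naive,  join ' | ', @naive;
printf "quoted: %d fields: %s\n", scalar @quoted, join ' | ', @quoted;
```

Run as-is, it reports six fields for the naive split and four (NWER, 51, 0,1,2, 12) for the quote-aware parse.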

        But, your code also only solves half the problem

        Yes, I admittedly did skip that part of the question; I felt that the OP's problems with parsing CSV were more important. But then again, your code doesn't solve that part very generically ;-P

        ... the use of a CSV module does nothing to assist in that. In fact, it effectively denies access to some information that could be used to assist in the production of the output.

        True, one does lose some info on the input (IIRC the physical line number in the input file in the case of newlines embedded in fields; maybe there's more I'm forgetting at the moment). But I also think the solution for the problem of adding either a comma or a semicolon at the end of the line is the same whether I manually parse the rows or whether I use a module to help. Just a quick, inelegant idea:

        my $rownum = 1;
        while ( my $data = somehow_parse_input_line() ) {
            print ",\n" unless $rownum==1;
            print $data;
        }
        continue { $rownum++ }
        print ";\n";

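A self-contained version of that idea, for anyone who wants to run it. The records are placeholders (the real per-line output format is up to the OP), and the code ref passed to the hypothetical format_all() stands in for somehow_parse_input_line(). It collects the output in a string instead of printing, so the separator logic is easy to check: the comma goes before every record except the first, which leaves the last record free to take the semicolon.

```perl
use strict;
use warnings;

# $next is a code ref returning the next formatted record, or undef when
# the input is exhausted (a hypothetical stand-in for
# somehow_parse_input_line()).
sub format_all {
    my ($next) = @_;
    my $out    = '';
    my $rownum = 1;
    while ( defined( my $data = $next->() ) ) {
        $out .= ",\n" unless $rownum == 1;   # separator before all but the first record
        $out .= $data;
    }
    continue { $rownum++ }
    $out .= ";\n";                           # terminator after the last record
    return $out;
}

# Placeholder records only.
my @records = ( 'NPLUS', 'NW', 'NWER' );
print format_all( sub { shift @records } );  # prints "NPLUS," "NW," "NWER;" on three lines
```

If all the records fit in memory anyway, `join(",\n", @records) . ";\n"` produces the same string without the counter.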
        You unconditionally attempt to split the 3rd field regardless of whether it requires it.

        Unless we're talking about performance, I don't think unconditionally splitting the third field hurts, although, as I said above, it could certainly be solved differently, and that would be necessary if the input varies.
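The behaviour of an unconditional split on the OP's third column is easy to check: a field without a comma comes back unchanged as a single element, so for rows like NPLUS and NW it makes no difference. The edge case to be aware of is an empty field, where split returns an empty list rather than one empty string. The helper name data_types is only for illustration.

```perl
use strict;
use warnings;

# Split the third ("data types") column into its codes.
# - no comma in the field: split returns the field itself (one element)
# - an empty field: split returns an empty list, not ('')
sub data_types {
    my ($field) = @_;
    return split /,/, $field;
}

for my $field ( '0', '0,1,2', '' ) {
    my @codes = data_types($field);
    printf "%-9s -> %d value(s)\n", "'$field'", scalar @codes;
}
```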

        Regards,
        -- Hauke D

Re^3: splitting csv file and saving data
by Tux (Canon) on Nov 03, 2016 at 21:11 UTC

    Humbug! :)

    $ cat test.csv
    NPLUS,32,0,
    NW,41,0,1
    NWER,51,"0,1,2",12
    $ perl -MText::CSV_XS=csv -MData::Peek \
        -wE'DDumper(csv(in=>"test.csv",on_in=>sub{$_[1][2]=csv(in=>\$_[1][2])->[0]}))'
    [ [ 'NPLUS', 32, [ 0 ], '' ],
      [ 'NW', 41, [ 0 ], 1 ],
      [ 'NWER', 51, [ 0, 1, 2 ], 12 ]
      ]

    Enjoy, Have FUN! H.Merijn

      ++ That is a perfect rebuttal to my incorrect assertion, and very impressive.

      But, I have questions:

      • How many people in the world, besides you, could have picked out the appropriate 2 or 3 lines from your amazingly detailed and comprehensive 1700 lines of module documentation to construct that solution?
      • And when things go wrong (say, some of those weird MS Word double-quote characters had gotten mixed into the OP's data), how many people would be qualified to try to diagnose the problem?
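On the smart-quote scenario, one defensive option (not something proposed in the thread) is to normalise typographic double quotes to ASCII before the data reaches any parser, and to flag lines that still contain non-ASCII characters. A minimal sketch, assuming the lines have already been decoded (e.g. read through an :encoding(UTF-8) layer) and chomped; which characters to map is itself an assumption about what the input might contain, and the sub names are hypothetical.

```perl
use strict;
use warnings;

# Replace typographic ("smart") double quotes with the ASCII double quote:
# U+201C and U+201D (left/right double quotation marks) and
# U+201E and U+201F (low-9 and high-reversed-9 double quotation marks).
# Assumes $line is already decoded text, not raw bytes.
sub ascii_quotes {
    my ($line) = @_;
    ( my $clean = $line ) =~ tr/\x{201C}\x{201D}\x{201E}\x{201F}/""""/;
    return $clean;
}

# Flag lines containing anything outside printable ASCII, a cheap first
# step when diagnosing "weird" input. Assumes the line has been chomped.
sub has_non_ascii {
    my ($line) = @_;
    return $line =~ /[^\x20-\x7E]/ ? 1 : 0;
}

my $line = "NWER,51,\x{201C}0,1,2\x{201D},12";
print ascii_quotes($line), "\n";   # NWER,51,"0,1,2",12
```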

Re^3: splitting csv file and saving data
by Anonymous Monk on Nov 03, 2016 at 11:27 UTC
    What conditional fields? Text::CSV handles quoted fields, and, as haukex showed, the nesting can be handled by the programmer.