serene_monk has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks

I am working on a CSV file, which I view in Excel, so I am splitting each line on commas for formatting. The problem arises when the data itself contains a comma. Is there any way to overcome the effect of this comma?

Replies are listed 'Best First'.
Re: CSV SPLIT
by kcott (Archbishop) on Jun 21, 2013 at 05:11 UTC

    G'day serene_monk,

    Welcome to the monastery.

    "I am working on csv file , viewing as excel . so i am splitting it by comma for formatting . but the problem is when the data itself has comma in it . is there any way to overcome the effect of this comma"

    Yes, there certainly is. Take a look at Text::CSV.

    -- Ken

      Thank you for replying

      Hey, the thing is, my code works on formatting the structure of the CSV data, so I need to split it into words, and with the help of some specific words I have to do the formatting. I have each column of the CSV (as seen by Excel) in an array, via the split function. This works absolutely fine when there is no user input.

      I think this cpan only aims to convert csv to text directly.

        As kcott posted, there are almost no circumstances where you should use split to retrieve or store CSV data, because of problems like the one you've encountered. Text::CSV and its close relatives will happily get/put data between a CSV file and an array, and deal with separators inside the data properly by quoting them. Once the row is in the array you can process it any way you like.
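To make the difference concrete, here is a minimal sketch, assuming Text::CSV is installed from CPAN. The sample row is made up for illustration; it is not the original poster's data. It compares a plain split with Text::CSV's parse/fields, then writes the row back out with combine/string.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample row: the second field contains a comma, so it is quoted.
my $line = 'id42,"Smith, John",London';

# Naive approach: split breaks on every comma, including the one inside the quotes.
my @naive = split /,/, $line;

# Text::CSV approach: parse() honours the quoting, fields() returns the columns.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;
$csv->parse($line) or die "Parse failed: " . $csv->error_diag;
my @fields = $csv->fields;

printf "split gave %d fields, Text::CSV gave %d\n", scalar @naive, scalar @fields;

# Process the array any way you like, then turn it back into a CSV line.
# combine() re-quotes any field that contains the separator.
$fields[2] = uc $fields[2];
$csv->combine(@fields) or die "Combine failed";
my $out = $csv->string;
print "$out\n";
```

Because the comma inside the name survives both the read and the write, there is no need to swap commas for a placeholder character and back again.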

        If you post some example data and what you'd like to happen to it, perhaps we can give you a more detailed answer. Don't forget to use <code> tags to show your data in the question properly.

        If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)

        I concur with the advice space_monk has given you. We get these types of questions every day or two: search in Super Search for Text::CSV or, if you prefer, here's one I prepared earlier :-)

        Some additional information:

        • CPAN (uppercase) stands for Comprehensive Perl Archive Network. It's a searchable repository of Perl modules.
        • cpan (lowercase) is a command line utility for interacting with CPAN. You get this for free: it's installed when you install Perl. When you write "cpan" in your posts, this is what people will think you are referring to.
        • When you wish to refer to a CPAN module, you should do so by name. It's helpful to provide a link to the module in question, as I did in my original reply to you. There's Shortcuts for doing this.
        • Don't reply to yourself as you did with (Re^2) "... csv to text directly" (Re^3) "and i was thinking ..."; just edit your post as explained in "How do I change/delete my post?".
        • In order to get the best possible responses to your questions, follow the guidelines in "How do I post a question effectively?".

        -- Ken

        And I was thinking, is it possible to convert the comma to some special character while reading, and change it back to a comma while printing?

Re: CSV SPLIT
by locked_user sundialsvc4 (Abbot) on Jun 21, 2013 at 13:06 UTC

    Serene, Text::CSV is simply the first step: the one that will effortlessly and correctly (let you) deal with niggling details such as commas within your comma-separated values. It will correctly produce a Perl data structure corresponding to it.

    Baby steps first. Find examples of Text::CSV and Data::Dumper. Complete your program to the point where it successfully parses your CSV file and then dumps a correct data structure so that you can look at it. That is step one. Cross that bridge, and prove that you have crossed it. Next, deal with the separate issue of transforming that data structure into the output that you need.
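Step one could look something like the sketch below, assuming Text::CSV is installed. The read_csv helper is a hypothetical name, and the file to read is whatever path you pass on the command line. It reads the whole file with getline, which also copes with quoted fields that span several lines, and dumps the result with Data::Dumper.

```perl
use strict;
use warnings;
use Text::CSV;
use Data::Dumper;

# Hypothetical helper: parse every row of a CSV file into an array of array refs.
sub read_csv {
    my ($path) = @_;
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @rows;
    # getline() reads one CSV record at a time and returns an array ref of fields.
    while (my $row = $csv->getline($fh)) {
        push @rows, $row;
    }
    close $fh;
    return \@rows;
}

# Dump the structure so you can check it by eye before transforming anything.
if (@ARGV) {
    local $Data::Dumper::Indent = 1;
    print Dumper(read_csv($ARGV[0]));
}
```

Once the dump shows one array per row and one element per column, the formatting step can work on that structure without ever looking at raw commas.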

Re: CSV SPLIT
by Anonymous Monk on Sep 26, 2014 at 22:08 UTC

    Sadly, I had a case a couple of years ago where the data had commas, quotes and probably other characters (possibly embedded CR/LF in a file which split lines and had continued-on-the-next-line markers, and probably quoting inconsistencies and other weird stuff). As a result, the various Text::CSV variations did not work, and I do not think I figured out what the problem was. I do not remember if the code crashed or just returned the wrong result. I ended up writing a pure-Perl routine (mostly brute force, but it also used index and substr optimizations) to do the split, and the problem was solved (with a fair amount of pain). It also seems to work for CSV files from Excel worksheets (at least the ones I work with). I guess the lesson (for me at least) is that sometimes you have to grow your own solution.

    I may have tried other splitting routines that were suggested at various sites without any luck - I just cannot remember.

    Maybe someday I will go back to the code and retry the processing with the Text::CSV code to see what the problem is, so that (maybe) the Text::CSV code can be fixed. At that time I may include my code. I check the threads I respond to periodically (not daily, more like every couple of weeks), so if someone expresses some interest I may post the code I used before I retest.

      I may now have more information about the problem I experienced a couple of years ago. I now think that the problem I ran into was running out of memory under Cygwin (not positive) when processing a large number of files. I just ran a test under Linux to try to recreate the problem and ran out of memory when the code snippet below was inside a subroutine. If I moved the "my" line outside the subroutine, there was no memory issue. Windows does not appear to have the same problem (memory usage remains steady). It also looks like the Text::CSV code may not be handling the split of the data I am using quite right: it seems to have trouble with the combination of quoted fields and escape characters, and probably escaped " characters in particular (not sure, still testing). But my main goal here is to alert people of the possible memory leak issue - Cygwin (32 bit) and Linux (32 bit) may have a problem and there may be others. I have reported a bug on CPAN.

      my $CSV = Text::CSV_XS->new ({binary => 1, escape_char => "\\"}); # need binary and change escape character
      if ($CSV->parse($line_to_split))
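For readers who want to see the "my line outside the subroutine" arrangement spelled out, here is a minimal sketch, assuming Text::CSV_XS is installed. The split_line helper and the sample line are hypothetical, and the code only shows the structure: one parser object created once and reused for every line. It does not reproduce or confirm the reported memory growth.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# One parser object, created once at file scope and reused for every line.
# Same options as the snippet above: binary mode, backslash as escape character.
my $CSV = Text::CSV_XS->new({ binary => 1, escape_char => "\\" })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag;

# Hypothetical helper: returns the fields of one line, or an empty list on failure.
sub split_line {
    my ($line_to_split) = @_;
    if ($CSV->parse($line_to_split)) {
        return $CSV->fields;
    }
    warn "Could not parse line: " . $CSV->error_diag . "\n";
    return;
}

# Made-up sample line with backslash-escaped quotes inside a quoted field.
my @fields = split_line('1,"say \"hi\", then leave",3');
print join('|', @fields), "\n";
```

Checking the return value of parse and printing error_diag on failure also gives the kind of detail a bug report needs when a particular line does not split as expected.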

        but my main goal here is to alert people of the possible memory leak issue - Cygwin (32 bit) and Linux (32 bit) may have a problem and there may be others. I have reported a bug on CPAN.

        Bug #100024 for Text-CSV_XS: possible memory leak doesn't rise to the level of a bug report, not enough details .... and the details provided hint it's a problem in your code, not Text::CSV_XS .... also it's talking about a very old version of Text::CSV_XS from four or five years ago