in reply to CSV SPLIT

Sadly, a couple of years ago I had a case where the data contained commas, quotes, and probably other awkward characters (possibly embedded CR/LF that split a record across lines, continued-on-the-next-line markers, quoting inconsistencies, and other weird stuff). As a result, the various Text::CSV variations did not work, and I do not think I ever figured out what the problem was. I do not remember whether the code crashed or just returned the wrong result. I ended up writing a pure Perl routine (mostly brute force, but with some index and substr optimizations) to do the split, and that solved the problem (with a fair amount of pain). It also seems to handle CSV files exported from Excel worksheets (at least the ones I work with). I guess the lesson (for me at least) is that sometimes you have to grow your own solution.
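For anyone wondering what an index/substr splitter might look like, here is a minimal sketch. It is not the routine described above (that code has not been posted); the name split_csv_line is made up for illustration, and it assumes the whole record is already in one string, that fields are separated by commas, and that a quote inside a quoted field is written as "" (the Excel convention).

```perl
use strict;
use warnings;

# Hypothetical helper, not the poster's code: split one CSV record into
# fields, honouring double-quoted fields and doubled quotes ("") inside them.
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    my $pos = 0;
    my $len = length $line;

    while (1) {
        my $field = '';
        if ($pos < $len && substr($line, $pos, 1) eq '"') {
            # Quoted field: jump from quote to quote with index().
            $pos++;
            while (1) {
                my $q = index($line, '"', $pos);
                if ($q < 0) {
                    # Unterminated quote: take the rest of the record.
                    $field .= substr($line, $pos);
                    $pos = $len;
                    last;
                }
                $field .= substr($line, $pos, $q - $pos);
                if ($q + 1 < $len && substr($line, $q + 1, 1) eq '"') {
                    $field .= '"';      # "" stands for one literal quote
                    $pos = $q + 2;
                }
                else {
                    $pos = $q + 1;      # closing quote
                    last;
                }
            }
            # Ignore stray characters between the closing quote and the comma.
            my $c = index($line, ',', $pos);
            $pos = $c < 0 ? $len : $c;
        }
        else {
            # Unquoted field: everything up to the next comma.
            my $c = index($line, ',', $pos);
            $c = $len if $c < 0;
            $field = substr($line, $pos, $c - $pos);
            $pos = $c;
        }
        push @fields, $field;
        last if $pos >= $len;
        $pos++;                         # step over the comma
    }
    return @fields;
}

my @fields = split_csv_line('a,"b,c","he said ""hi""",d');
print join('|', @fields), "\n";
```

Because the scan is character based, an embedded CR/LF inside a quoted field is kept as part of that field. Joining physical lines into one record before calling the routine is left out of this sketch.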

I may have tried other splitting routines that were suggested at various sites without any luck - I just cannot remember.

Maybe someday I will go back to the code and retry the processing with Text::CSV to see what the problem is, so that (maybe) Text::CSV can be fixed. At that time I may include my code. I check the threads I respond to periodically (not daily, more like every couple of weeks), so if someone expresses interest I may post the code I used before I retest.

Re^2: CSV SPLIT
by Anonymous Monk on Nov 04, 2014 at 16:35 UTC

    I may now have more information about the problem I experienced a couple of years ago. I now think (though I am not positive) that I was running out of memory under Cygwin when processing a large number of files. I just ran a test under Linux to try to recreate the problem, and it ran out of memory when the code snippet below was inside a subroutine. If I moved the "my" line outside the subroutine, there was no memory issue. Windows does not appear to have the same problem (memory usage remains steady). It also looks like the Text::CSV code may not be handling the split of my data quite right: it seems to have trouble with the combination of quoted fields and escape characters, probably escaped " characters in particular (not sure, still testing). But my main goal here is to alert people of the possible memory leak issue - Cygwin (32 bit) and Linux (32 bit) may have a problem and there may be others. I have reported a bug on CPAN.

    my $CSV = Text::CSV_XS->new ({binary => 1, escape_char => "\\"}); # need binary and change escape character
    if ($CSV->parse($line_to_split))
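The "move the my line outside the subroutine" arrangement can be sketched as follows. This is only an illustration of the layout described above, assuming Text::CSV_XS is installed. The wrapper name split_line is hypothetical, and nothing here shows that the reported memory growth occurs, or is avoided, with any particular version of the module.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Build the parser once, at file scope, instead of inside the subroutine
# that is called for every line.
my $CSV = Text::CSV_XS->new({ binary => 1, escape_char => "\\" })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag;

# Hypothetical wrapper: returns the parsed fields, or an empty list on failure.
sub split_line {
    my ($line_to_split) = @_;
    if ($CSV->parse($line_to_split)) {
        return $CSV->fields;
    }
    warn "parse failed: " . $CSV->error_diag . "\n";
    return;
}

my @fields = split_line('a,"b,c",d');
print scalar(@fields), " fields\n";
```

Reusing one parser object also avoids rebuilding it for every line, which is a reasonable habit regardless of whether a leak is involved.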

      but my main goal here is to alert people of the possible memory leak issue - Cygwin (32 bit) and Linux (32 bit) may have a problem and there may be others. I have reported a bug on CPAN.

      Bug #100024 for Text-CSV_XS: possible memory leak doesn't rise to the level of a bug report, not enough details .... and the details provided hint it's a problem in your code, not Text::CSV_XS .... also it's talking about a very old version of Text::CSV_XS from four or five years ago