perrin has asked for the wisdom of the Perl Monks concerning the following question:

Here's an annoying one. The quirky but uber-fast Text::CSV_XS module can handle newlines embedded in fields if you turn on binary mode (binary => 1). MySQL's SELECT INTO OUTFILE and LOAD DATA INFILE can also handle embedded newlines. However, MySQL wants to prefix them with the escape character (a '\' in my case), and Text::CSV_XS chokes on that.

After fiddling with the options for both, I can't find a way to make them agree. It looks like my only option might be to s{\\n}{\n}g on the way out of MySQL and then reverse that on the way back in after processing with Text::CSV_XS. That is likely to hurt performance, since I am using the getline() and print() methods in Text::CSV_XS and I would have to switch to reading and printing the lines myself. (Reading the lines and dealing with embedded newlines myself? Ick!)
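To make the idea concrete, the round trip I have in mind would be something like this (a sketch on a made-up record; it assumes MySQL's escape character is '\' and ignores the other characters MySQL escapes, such as literal backslashes in the data):

```perl
use strict;
use warnings;

# A made-up record as MySQL might write it: the embedded newline is
# prefixed with the escape character, the terminating newline is not.
my $from_mysql = qq{"line one\\\nline two"\n};

# On the way out of MySQL: strip the escape from embedded newlines.
(my $for_csv = $from_mysql) =~ s/\\\n/\n/g;

# ... process with Text::CSV_XS here ...

# On the way back in: re-escape every newline except the terminator.
(my $for_mysql = $for_csv) =~ s/\n(?!\z)/\\\n/g;

print $for_mysql eq $from_mysql ? "round trip ok\n" : "round trip broken\n";
```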

If anyone has any suggestions, I would appreciate them. Here's a (reduced) sample of the code I'm using:

use Text::CSV_XS;
use IO::Handle;
use Carp qw(croak);

my $file = $ARGV[0];

my $parser = Text::CSV_XS->new({
    binary       => 1,
    always_quote => 1,
    escape_char  => '\\', #' help emacs
});

# open input and output temp files
open(my $in_fh, '<', $file) or croak("Unable to open input file $!");

my $line_num = 0;

# Testing eof() in the loop condition means we only check for the
# last line once per line.
while (not eof($in_fh)) {
    $line_num++;

    # parse the row and handle any errors
    my $in_values = $parser->getline($in_fh);
    croak("Unable to parse line $line_num of file $file: "
          . $parser->error_input())
        if !$in_values;
}

close $in_fh or croak("Error closing file $file: $!");

And the SQL looks like this:

SELECT blah blah blah
INTO OUTFILE ?
FIELDS TERMINATED BY ',' ESCAPED BY '\\' ENCLOSED BY '"'
LINES TERMINATED BY '\n'

Replies are listed 'Best First'.
Re: Text::CSV_XS and MySQL newline handling
by radiantmatrix (Parson) on Nov 17, 2005 at 19:03 UTC

    You don't have to do as much reading and writing yourself as you think, since Text::CSV_XS's getline returns an arrayref that you can manipulate.

    # parse the row and handle any errors
    my $in_values = $parser->getline($in_fh);
    croak("Unable to parse line $line_num of file $file: "
          . $parser->error_input())
        if !$in_values;

    # just add...
    for (@$in_values) { s:\\\\n:\n:gso }

    With the addition of one line, you've unescaped your newlines. And, of course, to put them back, you'd reverse the substitution right before your $csv->print. Ideal? No. But it works, and the performance hit is negligible.

    <-radiant.matrix->
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    "In any sufficiently large group of people, most are idiots" - Kaa's Law
      It's not that simple. Text::CSV_XS will not parse the output the way MySQL generates it. It can't read the line with the escaped newline in it. That means I have to read the lines myself, dealing with possible embedded newlines, and feed them to $parser->parse().

      The output part is also tricky because I would have to escape the newlines AFTER Text::CSV_XS generates the line. If I do it before, then it will escape my escape character when I call print() or combine().
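      For the record, here's roughly what the manual approach would look like. This is an untested sketch: it assumes MySQL writes an embedded newline as the escape character followed by a literal newline, and it ignores the other characters MySQL escapes (such as literal backslashes in the data), so it isn't a complete solution.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $parser = Text::CSV_XS->new({ binary => 1, always_quote => 1 });

# Read one logical record: keep appending physical lines while the
# trailing newline is escaped (preceded by an odd number of backslashes).
sub read_record {
    my ($fh) = @_;
    my $record = <$fh>;
    return undef unless defined $record;
    while ($record =~ /(?<!\\)(?:\\\\)*\\\n\z/ and not eof($fh)) {
        $record .= <$fh>;
    }
    return $record;
}

while (defined(my $record = read_record(\*STDIN))) {
    # On the way out of MySQL: strip the escape from embedded newlines
    # so Text::CSV_XS can parse the record.
    $record =~ s/\\\n/\n/g;
    $parser->parse($record)
        or die "Unable to parse record: ", $parser->error_input();
    my @fields = $parser->fields();

    # ... process @fields ...

    # On the way back in: combine first, then re-escape the embedded
    # newlines so LOAD DATA INFILE will accept them. string() has no
    # trailing newline, so only embedded ones get escaped here.
    $parser->combine(@fields) or die "Unable to combine record";
    (my $out = $parser->string()) =~ s/\n/\\\n/g;
    print $out, "\n";
}
```

      Even so, this trades getline()'s record handling for reading lines myself in Perl, which is exactly the performance hit I was worried about.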

Re: Text::CSV_XS and MySQL newline handling
by jZed (Prior) on Nov 17, 2005 at 21:12 UTC
    Ugly, but what if your SQL were "LINES TERMINATED BY '\\n'"? Then the extra escaping backslash would cancel out the extra backslash in the terminator. Two wrongs make a right?

    Update: I need coffee. Sorry, the terminating newlines are not the ones the escape applies to. Any chance a different escape character would work for you?

      I don't think that will help. Text::CSV_XS specifically wants some special characters escaped (e.g. the double-quote) but not newline. The problem is that the two don't agree on which characters should be escaped.