PRyanRay has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I want code that replaces newlines or carriage returns with the character \n. The following code will do this:

use strict;

$_=<<'_quote_';
hai xtest "aa xx aax" baix "xx" x "axa\"x\\" xa "x\\\ \\"x" ax xbai!x
_quote_

print "Original:\n", $_, "\n";

s/
    (
        (?:
            # at the beginning of the string match till inside the quotes
            ^(?&outside_quote) "
            # or continue from last match which always stops inside quotes
            | (?!^)\G
        )
        (?&inside_quote)    # eat things up till we find what we want
    )
    \r?\n                   # the thing we want to replace
    (
        (?&inside_quote)    # eat more possibly till end of quote
        # if going out of quote make sure the match stops inside them
        # or at the end of string
        (?:
            " (?&outside_quote) (?:"|\z)
        )?
    )

    (?(DEFINE)
        (?<outside_quote> [^"]*+ )                  # just eat everything till quoting starts
        (?<inside_quote> (?:[^"\\\r\n]++|\\.)*+ )   # handle escapes
    )
/$1\n$2/xg;

print "Replaced:\n", $_, "\n";

However, I want to be able to do this for a file that I read in (*.csv). For example, if I use the following to read the same file into $_, it does not work:

my $file="testdata.csv";
open(FILE, $file) or die "Can't open $file: $!\n";
select((select(FILE), $/ = undef)[0]);
$_=<FILE>;

Any ideas? And no, I am not able to use the Perl packages Spreadsheet::****

Here, testdata.csv is this business:

hai xtest "aa xx aax" baix "xx" x "axa\"x\\" xa "x\\\ \\"x" ax xbai!x
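
For comparison, here is a minimal sketch of a more conventional way to slurp a file into a scalar using only core Perl. It is not a diagnosis of why the snippet above fails. The helper name slurp_file and the throwaway sample data are made up for illustration, and the :raw layer is an assumption so that any \r\n pairs in the file reach the regex unchanged.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a whole file into one string (hypothetical helper name).
sub slurp_file {
    my ($path) = @_;
    # Three-argument open with a lexical handle; :raw leaves \r\n untouched.
    open(my $fh, '<:raw', $path) or die "Can't open $path: $!\n";
    local $/;    # record separator is undef only inside this sub
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Demonstration on a temporary file holding made-up data with a CRLF
# inside a quoted field. This is not the OP's testdata.csv.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} qq{a,"two\r\nlines",b\r\n};
close($tmp_fh);

my $text = slurp_file($tmp_name);
print length($text), " characters read\n";
```

Because $/ is localized, the rest of the program still reads line by line, and the result can be assigned to $_ before running the substitution.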

Replies are listed 'Best First'.
Re: Regular Expressions - replacing newlines and carriage returns inside quotes
by mbethke (Hermit) on Oct 31, 2012 at 15:45 UTC

    Text::CSV is out of the question, too? I tried to do some "quick n dirty" CSV manipulation much like this recently, but as usual things quickly got much more dirty than quick. I said bleep this, I'll use CPAN, and things were peachy. The main part of the resulting script looks like this:

    my $csv = Text::CSV->new({
        binary       => 1,
        sep_char     => "\t",
        always_quote => 1,
    }) or do {
        print STDERR "Cannot initialize CSV: ", Text::CSV->error_diag, "\n";
        exit 1;
    };

    LINE: while (my $row = $csv->getline(\*STDIN)) {
        s/\n/$replace/g foreach (@$row);
        unless ($csv->combine(@$row)) {
            print STDERR "Error converting record $. for output: ", $csv->error_input, "\n";
            next LINE;
        }
        print $csv->string, "\n";
    }
    $csv->eof or $csv->error_diag;

    I tried getting your code to work but couldn't, maybe for lack of CRs in my source, but I'd doubt it covers all the subtleties CSV allows regarding quoting and escaping, let alone those it doesn't allow but you'll find anyway.

    Edit: I think we got ourselves a consensus here :-D

Re: Regular Expressions - replacing newlines and carriage returns inside quotes
by 2teez (Vicar) on Oct 31, 2012 at 15:41 UTC
      Thanks everybody, it turns out I do have the Text::CSV_XS module so I will work with this. I am locked behind some gnarly security fences so I have to reinvent the wheel quite often. Guess not this time. Thanks!
Re: Regular Expressions - replacing newlines and carriage returns inside quotes
by bitingduck (Deacon) on Oct 31, 2012 at 15:42 UTC
    Try using Text::CSV to read the file in. If it's "well formed" CSV (if there is such a thing) then the CR and LF should be between quotes and be handled properly.
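
A rough sketch of that suggestion follows, assuming Text::CSV (or Text::CSV_XS behind it) is installed. The in-memory sample data and the variable names are invented for illustration. A real script would open testdata.csv instead, and would set sep_char if the file is not comma separated. Embedded line breaks are replaced here with the two literal characters \n, which is one reading of what the original question asks for.

```perl
use strict;
use warnings;
use Text::CSV;

# Made-up sample input with a CRLF inside a quoted field.
# Replace this in-memory handle with an open() on the real file.
my $input = qq{id,"first\r\nsecond",end\r\n2,"plain",x\r\n};
open(my $in, '<', \$input) or die "Can't open in-memory input: $!\n";

# binary => 1 lets quoted fields contain line breaks.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot initialize CSV: " . Text::CSV->error_diag . "\n";

my @lines;
while (my $row = $csv->getline($in)) {
    # Turn each CRLF, CR or LF inside a field into a backslash followed by n.
    s/\r\n|\r|\n/\\n/g for @$row;
    $csv->combine(@$row)
        or die "Cannot combine record: " . $csv->error_input . "\n";
    push @lines, $csv->string;
}
# Report a parse error if the loop stopped before end of input.
$csv->eof or $csv->error_diag;

print "$_\n" for @lines;
```

The parser decides what is inside quotes, so the substitution only needs to look at one field at a time and no hand-written quote tracking is required.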