Karim has asked for the wisdom of the Perl Monks concerning the following question:

Dear All,

I hope this is the right place to ask a newbie question! Apologies if I'm mistaken.

I am faced with a seemingly trivial problem: how can I remove newline characters in the middle of a column while keeping the newline (and potential carriage return) at the end of each record?

After many attempts at using Perl regexes, I am still stuck! Here is a sample file: 4 columns, 2 rows, delimiter ';'.

Input:

cofathec-korb.be;MX;mailbackup.ops.be;3600
cofathec-korb.be;MX;"mx1.energy-services.be
address1
address2;3600

Desired output:

cofathec-korb.be;MX;mailbackup.ops.be;3600
cofathec-korb.be;MX;"mx1.energy-services.be address1 address2;3600

Any help would be hugely appreciated.

Cheers
Karim

Replies are listed 'Best First'.
Re: delimited file with multiline columns.
by haukex (Archbishop) on Dec 20, 2016 at 11:24 UTC
Re: delimited file with multiline columns.
by kcott (Archbishop) on Dec 21, 2016 at 07:03 UTC

    G'day Karim,

    Welcome to the Monastery.

    Firstly, I concur with ++haukex' response: Text::CSV is the appropriate tool for this task.
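For what it's worth, a minimal sketch of the Text::CSV route might look like the following. It assumes the quoted field in the real file does have a closing quote (the posted sample lacks one, so one is added here) and that newlines inside a field should become single spaces. The name flatten_records is purely illustrative.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: read ';'-separated records from $in and write them to
# $out, replacing any line break inside a field with a single space.
sub flatten_records {
    my ($in, $out) = @_;

    my $csv = Text::CSV->new({
        binary    => 1,      # needed for newlines inside quoted fields
        sep_char  => ';',
        auto_diag => 1,      # report parse errors as they happen
    }) or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

    while (my $row = $csv->getline($in)) {
        s/\R/ /g for @$row;                      # embedded newline -> space
        $csv->combine(@$row) or die "combine failed";
        print {$out} $csv->string, "\n";         # one physical line per record
    }
    return;
}

# Sample data from the question, with a closing quote ADDED (assumption).
my $input = qq{cofathec-korb.be;MX;mailbackup.ops.be;3600\n}
          . qq{cofathec-korb.be;MX;"mx1.energy-services.be\naddress1\naddress2";3600\n};

open my $in, '<', \$input or die "open input: $!";
my $output = '';
open my $out, '>', \$output or die "open output: $!";
flatten_records($in, $out);
close $out;

print $output;
```

By default Text::CSV quotes any field that contains a space, so the flattened field is written back in quotes; the quote_space attribute can turn that off if bare fields are wanted.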

    "After many attempts at using perl regexes, I am still stuck!"

    In some other situation, where you simply want to convert a multi-line string to a single-line string, and preserve line-endings if they exist, you can use this regex:

    s/\R(?!\z)//gm

    Here's a quick command line test:

    $ perl -E 'my @x = ("a\nb\nc\n", "a\nb\nc", "abc\n", "abc"); for (0 .. $#x) { say "$_=|$x[$_]|"; $x[$_] =~ s/\R(?!\z)//gm; say "$_=|$x[$_]|" }'
    0=|a
    b
    c
    |
    0=|abc
    |
    1=|a
    b
    c|
    1=|abc|
    2=|abc
    |
    2=|abc
    |
    3=|abc|
    3=|abc|

    See also: "perlrebackslash: \R"; "perlre: Lookaround Assertions"; perlre.

    Caveat: \R was introduced in v5.10.0: you'll need at least that version of Perl.
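As a rough illustration, here is the same substitution wrapped in a small sub and applied to one record from the question, assumed to have already been read into a single string. It replaces each inner line break with a space rather than nothing, to match the desired output; the name to_single_line is made up for this sketch.

```perl
use strict;
use warnings;
use 5.010;    # \R needs Perl v5.10.0 or later

# Hypothetical helper: turn every line break into one space, except a line
# ending at the very end of the string, which is left untouched.
sub to_single_line {
    my ($text) = @_;
    $text =~ s/\R(?!\z)/ /g;
    return $text;
}

# The multi-line record from the question, as one string (unclosed quote kept).
my $record = qq{cofathec-korb.be;MX;"mx1.energy-services.be\naddress1\naddress2;3600\n};

print to_single_line($record);
# prints: cofathec-korb.be;MX;"mx1.energy-services.be address1 address2;3600
```

Because \R treats "\r\n" as a single unit, a trailing CRLF is preserved whole rather than being split into a stray carriage return.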

    — Ken

Re: delimited file with multiline columns.
by soonix (Chancellor) on Dec 26, 2016 at 08:17 UTC
    In your data, I see only an opening quote, but no closing quote. Is that really how the file looks, or is it just a "typed-in-manually-instead-of-copy-pasting" error?