mmittiga17 has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I am trying to figure out a way to fix a text file that comes in with DOS line feeds. For the most part the lines are OK. However, there are a few lines that start with ^M and end in ^M. Ending in ^M is OK.

    some text line here^M
    ^Msome more text line here

What I need is a way to say: when a line ends in ^M and the next line begins with ^M, join the two lines and move the ^M to the end of the line. Any suggestions or ideas will be greatly appreciated.

Thanks, MM

Replies are listed 'Best First'.
Re: Perl Regex to Fix line feed issue
by shmem (Chancellor) on May 09, 2008 at 20:51 UTC

    What kind of file is it, actually? Smells like a csv file with multiline fields, in which those lines are separated by ^M. In any case, that smells like an XY problem.

    There's no such thing as a DOS line feed - a line feed is "\n" or ASCII 10. Carriage return is "\r" or ^M or ASCII 13. DOS line endings are "\r\n" (or CRLF). First establish what your actual line ending is (possibly "\r\n"), read the lines with $/ set to that line ending (see perlvar), then convert any (multiple) "\r" occurrences according to the specs of the task (what are they?). A few sample lines would be helpful for further advice.
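
A minimal sketch of that approach, assuming the real record terminator is "\r\n" and that every other "\r" is stray and can simply be dropped (the actual spec is unknown, so that rule is a guess). The sub name fix_records and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read CRLF-terminated records from a filehandle,
# drop any stray carriage returns inside each record, and return the
# cleaned records (each still ending in "\r\n" if the input record did).
sub fix_records {
    my ($fh) = @_;
    local $/ = "\r\n";              # input record separator; see perlvar
    my @out;
    while (my $rec = <$fh>) {
        my $had_eol = chomp $rec;   # removes a trailing "\r\n" if present
        $rec =~ tr/\r//d;           # ASSUMPTION: every other CR is unwanted
        push @out, $rec . ($had_eol ? "\r\n" : '');
    }
    return @out;
}

# Made-up sample: the second record starts with a stray CR.
my $sample = "first line\r\n\rsecond line\r\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
binmode $fh;                        # keep "\r" untouched on all platforms
print for fix_records($fh);
close $fh;
```

Note that this only strips the stray CR; it does not join the record onto the previous one, since whether that is wanted depends on the spec.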

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Perl Regex to Fix line feed issue
by Your Mother (Archbishop) on May 09, 2008 at 20:21 UTC
    moo@cow[1]~>which rnfix
    rnfix: aliased to perl -pi.bk -e 's/\r\n?/\n/g'

    I find it very handy, hence the alias, but use with caution! It's fine for text files. It will break binary files. I added a .bk for the example. Take it out if you're sure you know what you're doing. (update, took out the /g, pretty sure that was just a stupid reflex; update, update: put it back, I need to lie down.)
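
To see what the substitution in that alias does, here is a small sketch that applies the same regex to a string in memory instead of editing a file in place. The sub name to_unix_eol and the sample text are illustrative only.

```perl
use strict;
use warnings;

# Same regex as the alias: a CR optionally followed by LF becomes one LF.
# That covers DOS ("\r\n") and bare-CR ("\r") endings; a bare "\n" is untouched.
sub to_unix_eol {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

my $mixed = "dos line\r\nmac line\runix line\n";
print to_unix_eol($mixed);
```

The /g matters for files with bare "\r" endings: under -p such a file arrives as one long record, and without /g only the first CR would be converted. Also note that a lone CR counts as a line ending here, so input like "text\r\n\rmore\r\n" comes out with an extra empty line rather than with the two halves joined.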

      Thanks to all for their replies. Nothing seems to work, so I am trying a different approach: if a line ends in ^M and the next one starts with ^M, join the lines, then remove the ^M^M from the middle of the line. Any thoughts?

        untested:

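        $/ = "\r\n";                  # read CRLF-terminated records
        my $l = '';                   # pending output line
        while (<>) {
            if (s/^\r//) {            # record starts with a stray CR: continuation
                chomp $l;             # drop the CRLF that split the two halves
                $l .= $_;
                $l =~ s/\r\r//g;      # remove any doubled CRs left mid-line
                next;
            }
            else {
                print $l;             # previous line is complete, emit it
                $l = $_;
            }
        }
        print $l;                     # flush the last pending line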

        although I don't know what the heck you are wanting to do with which files, and to what end.

        Could you show what you've tried, and perhaps some sample input?
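
Until a sample is posted, one guess can at least be made testable. The sketch below assumes a broken line is stored as a CRLF immediately followed by a stray CR, and joins the halves with a single substitution over the whole text. The sub name join_split_lines and the byte layout are assumptions, not something confirmed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper. ASSUMPTION: a broken line is stored as
#   "first half\r\n\rsecond half\r\n"
# i.e. a CRLF immediately followed by a stray CR.
sub join_split_lines {
    my ($text) = @_;
    # Remove CRLF+CR, but not when the CR begins another CRLF:
    # "\r\n\r\n" is a genuine blank line and must be left alone.
    $text =~ s/\r\n\r(?!\n)//g;
    return $text;
}

my $sample = "some text line here\r\n\rsome more text line here\r\nnormal line\r\n";
print join_split_lines($sample);
```

If the real bytes turn out to be different (for example "\r\r\n"), the pattern has to change accordingly, which is why the sample input matters.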

        --shmem

Re: Perl Regex to Fix line feed issue
by pc88mxer (Vicar) on May 09, 2008 at 19:08 UTC
    If the carriage returns are right next to each other something like this should work:
    use File::Slurp;
    my $text = read_file('filename');
    $text =~ s/\r\r/\r/g;
    print $text;
    Or you might use: $text =~ s/\r+/\r/g; if there could be multiple adjacent blank lines.
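
The difference between the two substitutions shows up as soon as three or more carriage returns sit next to each other. A small sketch, with made-up sub names and sample data:

```perl
use strict;
use warnings;

# First regex: replace each non-overlapping pair of CRs with one CR.
sub collapse_pairs {
    my ($t) = @_;
    $t =~ s/\r\r/\r/g;
    return $t;
}

# Second regex: replace any run of one or more CRs with one CR.
sub collapse_runs {
    my ($t) = @_;
    $t =~ s/\r+/\r/g;
    return $t;
}

my $sample = "a\r\r\rb";   # three adjacent CRs
printf "pairs: %d CR(s) left\n", collapse_pairs($sample) =~ tr/\r//;
printf "runs:  %d CR(s) left\n", collapse_runs($sample)  =~ tr/\r//;
```

With three CRs in a row, the pair version still leaves two behind, because /g continues scanning after the replacement and finds only a single CR remaining; the run version always leaves exactly one.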

    Knowing exactly the structure of the file would help. What does od -c filename print out near those blank lines?
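
If od is not at hand, something similar can be sketched in Perl itself by turning the line-ending bytes into visible escapes. The sub name show_eol is hypothetical and the sample string is only a guess at what the file might contain.

```perl
use strict;
use warnings;

# Hypothetical helper: make CR, LF and TAB visible, in the spirit of od -c.
sub show_eol {
    my ($text) = @_;
    my %name = ("\r" => '\r', "\n" => '\n', "\t" => '\t');
    $text =~ s/([\r\n\t])/$name{$1}/g;
    return $text;
}

# Guessed sample; for a real file, read it after binmode on the handle
# so that no I/O layer translates the line endings first.
print show_eol("some text line here\r\n\rsome more text\r\n"), "\n";
```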

Re: Perl Regex to Fix line feed issue
by planetscape (Chancellor) on May 10, 2008 at 19:40 UTC