Clean line ends

by Scarborough (Hermit)
on Jan 19, 2007 at 09:59 UTC ( [id://595426] : perlquestion )

Scarborough has asked for the wisdom of the Perl Monks concerning the following question:

I have a problem with text files being passed between systems. I have a text file from a COBOL program which is transferred to a legacy datastore. The COBOL guy is not willing to look at it, but the line endings are broken, so the datastore rejects the file. Looking in an editor which displays line-end characters, I found this:
SOME DATA\cr\lf \lf MORE DATA\cr\lf \lf Yet more data\cr\lf
I need to replace the \lf characters with \cr\lf on the empty lines. Any ideas?

Replies are listed 'Best First'.
Re: Clean line ends
by davorg (Chancellor) on Jan 19, 2007 at 10:14 UTC

    Look for lines that start with a line feed (ctrl-J), i.e. the empty lines, and convert that line feed to a carriage return (ctrl-M) followed by a line feed.

    The following is a command line script to do it. It edits your file in place and renames the original version with a .bak extension.

    perl -i.bak -pe 's/^\cJ/\cM\cJ/' your_file_name_here

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: Clean line ends
by shmem (Chancellor) on Jan 19, 2007 at 10:24 UTC
    Replace all LFs not preceded by CR with CR+LF:
    perl -pi.bak -e 's/(?<!\r)\n/\r\n/' cobolfile

    See perlre.


    Update: as davorg pointed out, it is better to use control chars for newline conversions. Note also that the look-behind assertion above replaces bare LFs anywhere in the file, not only on blank lines.

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

      I'm sure that works.

      But I find I sometimes get confused about how Perl interprets \n (depending on the platform and the I/O layer it can end up as \012 or as \015\012 - see perlport for the details). In order to minimise my confusion I find it helpful to always refer to the control characters (\cJ and \cM) or the numeric character codes (\012 and \015) when I'm converting newline characters.


      "The first rule of Perl club is you do not talk about Perl club."
      -- Chip Salzenberg

        I also found David Cross's explanation of differing line-end characters very helpful. See his book "Data Munging with Perl", pages 88-89. The topic is very well explained, and it was an eye-opener for me when I was confused about newlines on different platforms. Sorry to point you to a book, but I think David's book is excellent and his coverage of this particular topic is helpful.