rduke15 has asked for the wisdom of the Perl Monks concerning the following question:

I needed to convert a DOS file to UNIX line endings, on Windows with Activestate Perl 5.8.7. And all of a sudden, something we've all been doing for years without even thinking about it seems broken.

perl -i.bak -pe "s/\r//g" file

doesn't work anymore, nor do any of the usual or unusual variants of it.

What's up? I suspect the new perlio stuff to be the source of this problem, but I don't know how to solve this.

Have tried plenty of variants, but none worked. If I try to get rid of the \x0D, it stays in the file. If I remove \x0A, both \x0D and \x0A are removed.

Other (ever sillier) variants I tried:

perl -i.bak -pe "s/\x0D\x0A/\x0A/g" file

perl -i.bak -pe "BEGIN{binmode STDOUT}; s/\x0D\x0A/\x0A/g" file

Even this stupid line (aggravated by Win32's stupid shell) doesn't work:

perl -i.bak -ne "BEGIN{binmode STDOUT}; chomp; print qq{$_\x0A};" file

Please, enlighten me. What happened to my loved Perl? Is doing simple things becoming difficult? Do I have to go back to good old Perl 5.4?

2005-10-13 Retitled by g0n, as per Monastery guidelines
Original title: 'Perl line endings: something broken in 5.8?'

Re: Perl line endings: something broken in ActiveState Perl 5.8?
by jmcnamara (Monsignor) on Oct 13, 2005 at 08:20 UTC

    It may help you to know that this doesn't work with previous versions of ActivePerl either. So I guess that you are mis-remembering the previous behaviour. ;-)

    The problem is that there is no \r in the line that perl sees since it is stripped off by the Windows IO libraries. Therefore you can't remove it. And when the line is written back out the \n is replaced with \r\n (again by the IO libraries) so the \r is still in the output file.

    You can convince yourself of this with the following one-liner which won't print out any of the lines in file.

    perl -ne "print if /\r/" file.txt

    The following will work when directed to another file:

    perl -pe "BEGIN{binmode *STDOUT}" file.txt > file2.txt

    This won't work with -i however, probably due to where the binmode occurs in relation to other internal actions on the input and back-up file.
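A possible workaround for the -i case is to skip -i altogether and open both handles explicitly with the :raw layer, so that no CRLF translation happens in either direction. The following is only a minimal sketch, assuming Perl 5.8 or later (three-argument open with layers); the name dos2unix_file is hypothetical, and it has not been checked against the ActivePerl builds discussed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): convert CRLF to LF in place.
# Both handles use the :raw layer, so the bytes pass through untranslated.
sub dos2unix_file {
    my ($path) = @_;

    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $data = do { local $/; <$in> };    # slurp the whole file
    close $in;
    $data = '' unless defined $data;

    # Keep an untouched copy, roughly what -i.bak would have done.
    open my $bak, '>:raw', "$path.bak" or die "Cannot write $path.bak: $!";
    print {$bak} $data;
    close $bak or die "Cannot close $path.bak: $!";

    # Explicit octal codes: CR LF becomes LF.
    $data =~ s/\015\012/\012/g;

    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} $data;
    close $out or die "Cannot close $path: $!";
    return;
}

# Convert every file named on the command line (does nothing without arguments).
dos2unix_file($_) for @ARGV;
```

Saved as, say, d2u.pl (a made-up name), it would be run as: perl d2u.pl file. Slurping keeps the sketch short but is a poor fit for very large files.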

    --
    John.

      Amazing! Yes, you are right. Searching on my hard drive, I also found ActivePerl 5.003_07, the non-Activestate GS Perl 5.004_02 , and even the DOS port 5.003_93. None of them seem to work, even binmode'ing everything.

      perl -i.bak -ne "BEGIN{binmode STDIN; binmode STDOUT}; s/\x0D//;"

      It is true that I mostly used this in Linux, but I'm still surprised that I would never have noticed after all these years. Has this never been reported as a bug?
        You can get Perl to see the \r on input by setting the PERLIO environment variable to :raw.

        set PERLIO=:raw
        perl -ne "print if /\r/" file.txt
        [displays lines]

        This also allows you to write to text files without getting \r inserted. But it still doesn't solve the -i problem. Which kinda looks like a bug.
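The same effect can be had for a single handle, without touching the environment, by asking for the :raw layer in open. As a rough illustration (a sketch only, assuming Perl 5.8 or later; count_line_endings is a made-up name), this counts the kinds of line ending a file really contains:

```perl
use strict;
use warnings;

# Hypothetical helper: count CRLF, lone LF and lone CR in a file,
# reading through the :raw layer so the \r bytes are visible.
sub count_line_endings {
    my ($path) = @_;

    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $data = do { local $/; <$fh> };
    close $fh;
    $data = '' unless defined $data;

    my %count = (crlf => 0, lf => 0, cr => 0);
    # CRLF is listed first in the alternation so it is never split in two.
    while ($data =~ /(\015\012|\015|\012)/g) {
        my $eol = $1;
        if    ($eol eq "\015\012") { $count{crlf}++ }
        elsif ($eol eq "\012")     { $count{lf}++ }
        else                       { $count{cr}++ }
    }
    return \%count;
}

for my $file (@ARGV) {
    my $c = count_line_endings($file);
    print "$file: CRLF=$c->{crlf} LF=$c->{lf} CR=$c->{cr}\n";
}
```

A non-zero CRLF count after a supposed conversion would suggest the \r was put back on output rather than never removed.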
Re: Perl line endings: something broken in ActiveState Perl 5.8?
by Delusional (Beadle) on Oct 13, 2005 at 09:35 UTC
    For Windows, there are two small freeware utilities called unix2dos.exe and dos2unix.exe. Using them you can convert a file to either format. These programs are relatively easy to find using your favourite search engine. I only offer this in the event you continue having problems getting the conversions right.
      There is also a CPAN module I wrote, Text::FixEOL, that was specifically designed to fix EOL problems.
Re: Perl line endings: something broken in ActiveState Perl 5.8?
by Skeeve (Parson) on Oct 13, 2005 at 07:17 UTC
    Quote from perldoc perlipc:

    Internet Line Terminators

    The Internet line terminator is "\015\012". Under ASCII variants of Unix, that could usually be written as "\r\n", but under other systems, "\r\n" might at times be "\015\015\012", "\012\012\015", or something completely different. The standards specify writing "\015\012" to be conformant (be strict in what you provide), but they also recommend accepting a lone "\012" on input (but be lenient in what you require). We haven't always been very good about that in the code in this man-page, but unless you're on a Mac, you'll probably be ok.

    Okay... Your question is not IPC-related, but after reading that, I usually remove line endings with

    s/[\015\012]+//g;
    So in order to replace them with UNIX-style line endings (a single \012), this should do:
    s/[\015\012]+/\012/g;
    Attention! If you have more than one consecutive line ending in your string, they will be reduced to 1...
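If blank lines need to survive, the collapsing can be avoided by matching one line ending at a time instead of a run of them. A small sketch (to_unix_eol is a hypothetical name, and this only works on data that actually still contains its \015 bytes, i.e. read in binary mode):

```perl
use strict;
use warnings;

# Hypothetical helper: turn each CRLF, lone CR or lone LF into a single LF.
# CRLF comes first in the alternation so a pair counts as one line ending.
sub to_unix_eol {
    my ($text) = @_;
    $text =~ s/\015\012|\015|\012/\012/g;
    return $text;
}

# Two consecutive CRLF pairs stay two line endings.
my $fixed = to_unix_eol("a\015\012\015\012b");
print $fixed eq "a\012\012b" ? "blank line kept\n" : "blank line lost\n";
```

Treating a lone \015 as a line ending is a design choice here; drop that alternative if stray carriage returns should be left alone.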

    Update: The problem was not data related but really (Activestate-)Perl related. Sorry for wasting your time ;-)


    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
      this should do:

      Yes it should, but no it doesn't! That's why I came here.

        Ever tried a hexdump of that data? This usually helps me isolate the problem if it's data-related.
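For anyone without a hexdump tool on Windows, a few lines of Perl can stand in for one. This is just a sketch, assuming Perl 5.8 or later for the :raw layer; hex_bytes and read_raw_head are made-up names, and the 64-byte limit is arbitrary:

```perl
use strict;
use warnings;

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $bytes;
}

# Hypothetical helper: read up to $limit bytes through the :raw layer,
# so any 0d bytes show up as they are stored on disk.
sub read_raw_head {
    my ($path, $limit) = @_;
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $buf = '';
    read($fh, $buf, $limit);
    close $fh;
    return $buf;
}

# Dump the first 64 bytes of each file named on the command line.
print "$_: ", hex_bytes(read_raw_head($_, 64)), "\n" for @ARGV;
```

In the output, a DOS line ending appears as 0d 0a and a UNIX one as a bare 0a.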

        Update: Delete this node. Just noticed after posting that the problem was not data related but really (Activestate-)Perl related. Sorry for wasting your time ;-)

