in reply to Scrubbing XML

On some Unix systems you could pass the file through the dos2unix utility, e.g.
dos2unix < foo.xml > fooOK.xml
If that is missing, or if it still doesn't work, I'd try a version of the regexp with the byte values hardcoded, e.g.:
for my $hardcoded ( chr(13) . chr(10), chr(10) . chr(13) ) { s/$hardcoded//g; }

One world, one people

Re^2: Scrubbing XML
by the.duck (Novice) on Apr 18, 2011 at 16:12 UTC

    Well, the problem with dos2unix, or anything that just removes the carriage return, is that I'm left with extra line feeds. When I see a <CR>, I need to also remove the accompanying <LF>, without removing all the other <LF>s.

      I anticipated that, hence the hardcoded regexp idea, but I just remembered something else: you might also need to set $/ = undef() alongside the hardcoded regexp, so that the whole file is read as a single string and the CR and LF cannot be split across a line break.

      Update: and if using perl -ne, that would have to be done in a BEGIN{ } block

      One world, one people