in reply to Re: Clean data - where field contains a CRLF
in thread Clean data - where field contains a CRLF

Minor nitpick, Grampa:
# s/\r//g;
# chomp;
# expressed better (less platform dependent) as:
s/[\r\n]+//g;
# or, to be compulsive, use the numerics:
s/[\x0a\x0d]+//g;

According to the perl docs I've seen, chomp "removes any trailing string that corresponds to the current value of $/".

If Perl has $/ set to "\r\n", removing the "\r" before chomping could cause the chomp to do nothing at all. (But I'm not a Windows user, so I could be wrong about that.)
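
To make the concern concrete, here is a minimal sketch. It assumes $/ has been set to "\r\n" explicitly (Perl's default is "\n"), and the helper name strip_then_chomp and the sample string are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: strip every "\r", then chomp using the given
# input record separator. Returns the resulting text and the number of
# characters that chomp removed.
sub strip_then_chomp {
    my ($text, $separator) = @_;
    local $/ = $separator;        # affects chomp only inside this sub
    $text =~ s/\r//g;             # "field one\r\n" becomes "field one\n"
    my $removed = chomp($text);   # chomp returns the count of removed characters
    return ($text, $removed);
}

# With $/ = "\r\n" the remaining "\n" no longer matches, so chomp removes nothing.
my ($crlf_text, $crlf_removed) = strip_then_chomp("field one\r\n", "\r\n");
print "separator CRLF: chomp removed $crlf_removed character(s)\n";

# With the default "\n" the trailing newline is removed as expected.
my ($lf_text, $lf_removed) = strip_then_chomp("field one\r\n", "\n");
print "separator LF:   chomp removed $lf_removed character(s)\n";
```

The character class s/[\r\n]+//g avoids the question entirely, because it does not depend on the value of $/.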

Also, depending on the data and the task, it might make more sense to replace every [\r\n]+ with a space, rather than an empty string, esp. if consecutive lines will be concatenated into a single string.
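
A short sketch of the difference, using a made-up address field and a hypothetical helper named replace_line_breaks:

```perl
use strict;
use warnings;

# Hypothetical helper: replace each run of CR/LF characters in a field
# with the given replacement string and return the modified copy.
sub replace_line_breaks {
    my ($text, $replacement) = @_;
    (my $copy = $text) =~ s/[\r\n]+/$replacement/g;
    return $copy;
}

# Made-up sample field containing embedded CRLF pairs.
my $field = "123 Main St\r\nApt 4\r\nSpringfield";

my $glued  = replace_line_breaks($field, '');    # "123 Main StApt 4Springfield"
my $spaced = replace_line_breaks($field, ' ');   # "123 Main St Apt 4 Springfield"

print "$glued\n$spaced\n";
```

Note that a line break at the very end of a field would turn into a trailing space with the second form, so a final s/\s+$// may be wanted, depending on the data.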

Re^3: Clean data - where field contains a CRLF
by GrandFather (Saint) on Aug 21, 2006 at 02:37 UTC

    Possibly a Mac issue, but not a Windows issue. Perl's IO processing will already have converted CRLF to \n under Windows. The code I posted was tested using Windows.

    However I agree that your regex solution is likely to be better. I'd avoid the "numeric" version though. That makes it more, rather than less, sensitive to OS and character sets.

    Perl converts native line ends to \n (which may or may not be an actual newline character) and sets $/ to \n by default. So it doesn't matter what the native OS line-end convention is, and it doesn't matter what character encoding is used: \n processing using non-binary mode I/O should be portable with Perl.
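
The translation can be observed on any platform by requesting the :crlf PerlIO layer explicitly, which is the layer Perl applies to text-mode handles on Windows. This is only a sketch: the temporary file, its contents, and the helper name read_lines are invented for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read all lines through the given PerlIO layer
# and chomp them using the default $/ of "\n".
sub read_lines {
    my ($path, $layer) = @_;
    open my $in, "<$layer", $path or die "Cannot open $path: $!";
    my @lines = <$in>;
    close $in;
    chomp @lines;
    return @lines;
}

# Write CRLF-terminated records in binary mode so the bytes are exact.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "alpha\r\nbeta\r\n";
close $out or die "Cannot close $path: $!";

my @translated = read_lines($path, ':crlf');   # CRLF arrives as "\n"; chomp leaves "alpha", "beta"
my @untouched  = read_lines($path, ':raw');    # no translation; chomp leaves "alpha\r", "beta\r"

printf "crlf layer: %d chars in first line\n", length $translated[0];
printf "raw layer:  %d chars in first line\n", length $untouched[0];
```

With the translating layer a plain chomp is enough; the stray \r only shows up when the data is read in binary mode or on a platform whose text mode does not translate CRLF.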


    DWIM is Perl's answer to Gödel