Re: line endings in remote documents

Re: Re: line endings in remote documents by belg4mit (Prior) on Dec 20, 2001 at 20:54 UTC
chomp will remove $/, which will be set to \n\r by default on windoze. For cross-platformness, try something like `split /(?:\r\n|\n|\r)/`. That should cover all the conventions I know of (Unix \n, Microsoft \r\n and Macintosh \r). UPDATE: Fixed MS line endings. *sigh* Je suis tired pantalons. `-- perl -pe "s/\b;([st])/'\1/mg"`
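A minimal sketch of that split, assuming the whole document is already in one scalar (the sample string is made up). It spells the terminators as octal escapes so the meaning doesn't depend on what \r and \n are on the running perl, and it shows why CRLF has to come first in the alternation.

```perl
use strict;
use warnings;

# Hypothetical sample mixing all three conventions: LF, CRLF and CR.
my $text = "unix\012dos\015\012mac\015last";

# CRLF is tried first, so a CRLF pair is consumed as one terminator.
my @lines = split /(?:\015\012|\012|\015)/, $text;
print join('|', @lines), "\n";    # prints "unix|dos|mac|last"

# If CR is tried before CRLF, the pair matches as two terminators
# and leaves an empty field between them.
my @bad = split /(?:\012|\015|\015\012)/, $text;
print scalar(@bad), "\n";         # prints 5
```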
Re: Re: Re: line endings in remote documents by Juerd (Abbot) on Dec 20, 2001 at 21:17 UTC
The other way around :)

| System | Name | Octal | Hex | Perl escape |
|---|---|---|---|---|
| Mac | CR | `\015` | `\x0D` | `\r` |
| DOS | CRLF | `\015\012` | `\x0D\x0A` | `\r\n` |
| *Nix | LF | `\012` | `\x0A` | `\n` |

(This assumes \r is chr(13) and \n is chr(10), which isn't always true.) The regex to substitute them all would be `s/\cM\cJ|\cM|\cJ/$foo/g`, which can be simplified to `s/\cM\cJ?|\cJ/$foo/g`. But if you don't need to substitute, removing can be done a lot faster by just using `tr/\cM\cJ//d`: the /d modifier makes tr/// delete every character in the search list that has no counterpart in the replacement list, and since the replacement list is empty here, every CR and LF is deleted. `2;0 juerd@ouranos:~$ perl -e'undef christmas' Segmentation fault 2;139 juerd@ouranos:~$`
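A short sketch of both approaches side by side. The sample data is invented, and `$foo` is just a stand-in replacement string, as in the post; any string works there.

```perl
use strict;
use warnings;

my $data = "one\015\012two\015three\012four";
my $foo  = '|';    # stand-in replacement; any string works

# Substitute: the optional \cJ lets a CRLF pair count as one match.
(my $normalized = $data) =~ s/\cM\cJ?|\cJ/$foo/g;

# Delete: tr///d drops every CR and LF, since the replacement list is empty.
(my $stripped = $data) =~ tr/\cM\cJ//d;

print "$normalized\n";    # prints "one|two|three|four"
print "$stripped\n";      # prints "onetwothreefour"
```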
Re: Re: Re: Re: line endings in remote documents by belg4mit (Prior) on Dec 20, 2001 at 21:19 UTC
Except he's using the end of line characters as a means of splitting the input.
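To illustrate the point with invented data: deleting the terminators fuses the records together, so when the goal is splitting, one option is to reuse Juerd's simplified pattern as the `split` delimiter instead.

```perl
use strict;
use warnings;

my $input = "alpha\015\012beta\015gamma\012";

# tr///d removes the boundaries along with the characters.
(my $fused = $input) =~ tr/\cM\cJ//d;

# The same alternation works as a split delimiter; the trailing
# empty field after the final LF is discarded by split.
my @records = split /\cM\cJ?|\cJ/, $input;

print "$fused\n";               # prints "alphabetagamma"
print scalar(@records), "\n";   # prints 3
```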
Re: Re: Re: Re: Re: line endings in remote documents by Juerd (Abbot) on Dec 20, 2001 at 21:50 UTC
(tye)Re: line endings in remote documents by tye (Sage) on Dec 20, 2001 at 23:36 UTC
No, no, no!! $/ will be "\n" by default on Windows, just like it is (nearly?) everywhere else! - tye (but my friends call me "Tye")
Re: (tye)Re: line endings in remote documents by Juerd (Abbot) on Dec 20, 2001 at 23:45 UTC
Yes, the default for `$/` is `\n` on all platforms. The wicked thing is that \n isn't the same character on all platforms. If you need to be sure which character you'll get, use `chr()`, `\x` followed by hex digits, `\` followed by octal digits, or `\c` followed by a letter. If you don't know whether your data came from the same platform as your script, the best you can do is use a regex like `\r\n?|\n` to match every known newline. I quote from perlop:

> All systems use the virtual "\n" to represent a line terminator, called a "newline". There is no such thing as an unvarying, physical newline character. It is only an illusion that the operating system, device drivers, C libraries, and Perl all conspire to preserve. Not all systems read "\r" as ASCII CR and "\n" as ASCII LF. For example, on a Mac, these are reversed, and on systems without line terminator, printing "\n" may emit no actual data. In general, use "\n" when you mean a "newline" for your system, but use the literal ASCII when you need an exact character. For example, most networking protocols expect and prefer a CR+LF ("\015\012" or "\cM\cJ") for line terminators, and although they often accept just "\012", they seldom tolerate just "\015". If you get in the habit of using "\n" for networking, you may be burned some day.
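Following the perlop advice, a sketch of building a network protocol line with an explicit CR+LF. The `HELO client.example` text is a hypothetical example line; `$CRLF` comes from the core Socket module's `:crlf` export tag.

```perl
use strict;
use warnings;
use Socket qw(:crlf);    # exports $CR, $LF and $CRLF ("\015", "\012", "\015\012")

# Hypothetical protocol line, terminated with an explicit CR+LF
# so the bytes sent don't depend on what "\n" means locally.
my $request = "HELO client.example" . $CRLF;

# The same bytes spelled out in octal, as perlop suggests.
my $same = "HELO client.example\015\012";

print $request eq $same ? "identical\n" : "different\n";    # prints "identical"
printf "%vd\n", substr($request, -2);                        # prints "13.10"
```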
(tye)Re3: line endings in remote documents by tye (Sage) on Dec 21, 2001 at 00:03 UTC
Re: (tye)Re: line endings in remote documents by belg4mit (Prior) on Dec 21, 2001 at 03:21 UTC
Okay, so it's set to \n, but it behaves as though it's \n\r. After all, chomp doesn't leave the carriage return (you'd get no visible output!)
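One way to see what is happening, as a sketch assuming a perl where "\n" is LF (as on Unix and Windows): on Windows, perl's default `:crlf` I/O layer turns CRLF into "\n" while reading, so chomp only ever has to remove "\n". The snippet writes CRLF data to a scratch file (File::Temp is used only for the demonstration) and reads it back once with the layer set to `:raw` and once to `:crlf`.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write two CRLF-terminated lines as raw bytes to a scratch file.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} "first\015\012second\015\012";
close $out or die "close: $!";

# Read the first line through the given I/O layer, then chomp it.
sub first_line_chomped {
    my ($layer) = @_;
    open my $in, "<$layer", $path or die "open $path: $!";
    my $line = <$in>;
    close $in;
    chomp $line;    # removes $/, which is "\n" (LF here)
    return $line;
}

my $raw  = first_line_chomped(':raw');     # CR survives the chomp
my $crlf = first_line_chomped(':crlf');    # layer already turned CRLF into "\n"

# Print the remaining characters as ordinals so the CR (13) is visible.
printf "raw:  %vd\n", $raw;     # prints "raw:  102.105.114.115.116.13"
printf "crlf: %vd\n", $crlf;    # prints "crlf: 102.105.114.115.116"
```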
Re: Re: Re: line endings in remote documents by premchai21 (Curate) on Dec 20, 2001 at 21:17 UTC
Or, if you're sure no \ns or \rs will appear in the middle of lines, you can do it even more simply: `split /[\n\r]+/, ...`
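A quick sketch of that shortcut on invented data, assuming \r is CR and \n is LF. One side effect worth knowing: because a run of CR/LF characters counts as a single separator, blank lines disappear, unlike with a pattern that matches one terminator at a time.

```perl
use strict;
use warnings;

my $text = "para one\r\n\r\npara two\n\n\nend";

# Any run of CR/LF characters is one separator, so blank lines vanish.
my @lines = split /[\n\r]+/, $text;
print scalar(@lines), "\n";    # prints 3

# Matching exactly one terminator at a time keeps the empty lines.
my @exact = split /\r\n|\n|\r/, $text;
print scalar(@exact), "\n";    # prints 6
```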