question about "split" function

rimvydazas has asked for the wisdom of the Perl Monks concerning the following question:

Re: question about "split" function by roboticus (Chancellor) on Dec 11, 2007 at 21:35 UTC
If you don't want to be hassled by line endings (Mac / Win / Unix), you might try: `split /[\r\n]+/, $x;` That handles any run of carriage returns and newlines, no matter which order they appear in. Do note, however, that this won't work if you want to preserve information about blank lines: it happily gobbles up sequential end-of-line sequences. ...roboticus
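To make the blank-line caveat concrete, here is a minimal sketch. The sample string in `$x` is invented for illustration, and the alternation pattern is just one possible way to keep empty lines, not something proposed in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data: mixed line endings plus one blank line.
my $x = "alpha\r\nbeta\n\ngamma\rdelta";

# Character class with +: a run of terminators counts as one separator,
# so the blank line between "beta" and "gamma" disappears.
my @collapsed = split /[\r\n]+/, $x;

# Alternation matching exactly one terminator at a time (CRLF tried first):
# the blank line survives as an empty string.
my @preserved = split /\r\n|\r|\n/, $x;

print scalar(@collapsed), " fields: ", join('|', @collapsed), "\n";
print scalar(@preserved), " fields: ", join('|', @preserved), "\n";
```

Keep in mind that `split` drops trailing empty fields by default, so blank lines at the very end of the data are still lost unless a negative limit is passed as the third argument.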
Re: question about "split" function by kyle (Abbot) on Dec 11, 2007 at 21:46 UTC
You probably want to use "`\015\012`" for CR/LF since the values of `\r` and `\n` vary from system to system (`\f` is a form feed). `my @lines = split /\015\012/, $x;`
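A small sketch of what the strict CR/LF split does, assuming the data really is CRLF-delimited. The contents of `$x` are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical record: CRLF-terminated lines, one of which contains a bare LF.
my $x = "first\015\012second has a\012bare LF\015\012third";

# Octal escapes name the exact bytes: 015 is CR (13), 012 is LF (10).
my @lines = split /\015\012/, $x;

printf "%d lines\n", scalar @lines;
print "bare LF kept inside line 2\n" if $lines[1] =~ /\012/;
```

Only the exact two-byte sequence acts as a separator here, so a stray LF or CR inside a line stays part of that line.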
Re^2: question about "split" function (magic numbers) by tye (Sage) on Dec 11, 2007 at 23:06 UTC
> You probably want to use "\015\012" for CR/LF since the values of \r and \n vary from system to system
Bleh. I prefer not to let the mistakes of a single platform sway me away from previous good practices in writing portable code. The only platform where "\r" and "\n" caused any concern was old Macs. There are a lot more non-ASCII platforms where hard-coding magic numbers would cause problems. Granted, there was a period when running into what is now an "old" Mac Perl was much, much more likely than running into a non-ASCII Perl (at least for most people).
It is a bit sad, however, that this visibility caused so many people to just overlook the long-standing realization that hard-coded magic numbers are a bad practice, to the point of enshrining the encouragement of such bad practices in perlport (and some people are still convinced that these hard-coded magic numbers are actually a "good practice"). But I've never run into an old Mac Perl, and I strongly object to hard-coding such magic numbers, so I find "\r" and "\n" much preferable.
When writing code for public consumption, I might provide an abstraction that specifically detects old Macs. But the complexity of such an abstraction just isn't justified for me most of the time. - tye
Re^3: question about "split" function (magic numbers) by kyle (Abbot) on Dec 12, 2007 at 07:32 UTC
Interesting. The phrase "the values of `\r` and `\n` vary from system to system" came right out of the Camel book. I almost put it in quotes with attribution. It's in a discussion of using `\015\012` especially for socket programming because you'll need to recognize those hard values coming across the network regardless of what your local concept of "newline" is. In this case, the OP refers to data coming from a "phone switch", which I'm guessing is also not likely to change its delimiters if the local concept of "newline" changes from the time the interface code is written to the time it's used. Thanks for pointing out where the magic number rule came from. Given how old Macs are disappearing, I might not have recommended what I did, but I think this case is special enough that the exception might not be all bad.
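For the "fixed delimiter coming off the wire" case, one possible sketch is to set the input record separator to the exact bytes and read record by record. The in-memory filehandle stands in for a socket, and the `CALL ...` records are invented; the OP's actual phone-switch format is not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical stand-in for data arriving from the switch: CRLF-terminated records.
my $data = "CALL 1001 2002\015\012CALL 1003 2004\015\012";

# An in-memory filehandle plays the role of the socket in this sketch.
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
binmode $fh;    # on a real file or socket, this keeps I/O layers from translating line endings

my @records;
{
    # Pin the record separator to the wire format, independent of the local "\n".
    local $/ = "\015\012";
    while (my $record = <$fh>) {
        chomp $record;    # chomp removes whatever $/ currently holds
        push @records, $record;
    }
}
close $fh;

print "$_\n" for @records;
```

Localizing `$/` inside a block keeps the change from leaking into other code that reads line-oriented input.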
Re^4: question about "split" function (Perl 6) by tye (Sage) on Dec 12, 2007 at 08:43 UTC
Re: question about "split" function by moritz (Cardinal) on Dec 11, 2007 at 21:20 UTC
Line feed is `\l`, not `\f`. Update: It's not...
Re^2: question about "split" function by roboticus (Chancellor) on Dec 11, 2007 at 21:39 UTC
moritz: I always thought `\l` lowercased, and `\n` was newline... ...roboticus
Re^2: question about "split" function by kyle (Abbot) on Dec 11, 2007 at 21:47 UTC
On my system, `"\l"` isn't anything.
`printf "\\l = %d\n", ord "\l";`
`printf "\\f = %d\n", ord "\f";`
`print "\\l: [\l]\n";`
`__END__`
`\l = 0`
`\f = 12`
`\l: []`
Re^3: question about "split" function by roboticus (Chancellor) on Dec 11, 2007 at 21:53 UTC
kyle: Try this:
`$ perl -e 'print "\lTook \LNow Is The Time\E For All Good Men\n";'`
`took now is the time For All Good Men`
...roboticus
Update: I finally found that reference table. It's in perlop. Quick recap:
The following escape sequences are available in constructs that interpolate and in transliterations.
    \t    tab (HT, TAB)
    \n    newline (NL)
    \r    return (CR)
    \f    form feed (FF)
    <<<snip snip snip>>>
The following escape sequences are available in constructs that interpolate but not in transliterations.
    \l    lowercase next char
    \u    uppercase next char
    \L    lowercase till \E
    \U    uppercase till \E
    \E    end case modification
    \Q    quote non-word characters till \E
If "use locale" is in effect, the case map used by "\l", "\L", "\u" and "\U" is taken from the current locale.
    <<<snip snip snip>>>
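As a quick way to check the table above, here is a short sketch that uses each case-modifying escape and compares `\l` and `\Q` with the built-in functions `lcfirst` and `quotemeta`. The variable names and sample strings are only for illustration.

```perl
use strict;
use warnings;

my $word = 'Took';

my $lower_first = "\l$word";              # lowercase next char only
my $upper_first = "\uperl";               # uppercase next char only
my $lower_run   = "\LNow Is\E The Time";  # lowercase until \E
my $upper_run   = "\Unow is\E the time";  # uppercase until \E
my $quoted      = "\Qa.b\E";              # backslash-quote non-word characters

print "$lower_first\n$upper_first\n$lower_run\n$upper_run\n$quoted\n";

# The function forms do the same jobs as the \l and \Q escapes.
print "lcfirst matches \\l\n"   if lcfirst($word)   eq $lower_first;
print "quotemeta matches \\Q\n" if quotemeta('a.b') eq $quoted;
```

None of these escapes produces a line feed, which is why `ord "\l"` on an otherwise empty string reports 0.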