rimvydazas has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys, I'm trying to split a string that contains carriage returns and line feeds. When I use split(/\r\f/, $x) or split(/<CR><LF>/, $x), it doesn't seem to work. I get the strings from a phone switch. When I print the string, it looks like this:
121107 1547 00316 43kahldsf02 801 2211 808<CR><LF>15:47 12/11<CR><LF>15:47 12/11<CR><LF>121107 1547 00150 4adf008 801 6808 <CR><LF>
However, when I use this string and split(/<CR><LF>/, $x), it works just fine, but it doesn't work when I listen for the same string coming from the switch. So I assume I need to use some other split pattern, since <CR><LF> isn't recognized as the carriage return and line feed coming from the switch. Does anyone have any suggestions? Thanks
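One way to find out what the switch really sends is to make the invisible bytes visible before choosing a split pattern. The sketch below assumes the data contains real CR and LF bytes rather than the literal text "<CR><LF>"; the helper name show_bytes is hypothetical and not part of the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): render every character outside
# printable ASCII as a \xNN escape so that line endings become visible.
sub show_bytes {
    my ($s) = @_;
    $s =~ s/([^\x20-\x7E])/sprintf '\\x%02X', ord $1/ge;
    return $s;
}

# Sample record ending in a real carriage return + line feed (an assumption
# about what the switch sends).
my $sample = "15:47 12/11\015\012";
print show_bytes($sample), "\n";   # prints 15:47 12/11\x0D\x0A
```

If the output shows \x0D\x0A, the delimiter is a real CR LF pair; if it shows the characters <CR><LF> unchanged, the markers are literal text.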

Replies are listed 'Best First'.
Re: question about "split" function
by roboticus (Chancellor) on Dec 11, 2007 at 21:35 UTC
    If you don't want to be hassled by line endings (Mac / Win / Unix) you might try:

    split /[\r\n]+/, $x;
    That handles any run of carriage returns and newlines, no matter which order they appear in. Do note, however, that this won't work if you want to preserve information about blank lines. It'll happily gobble up sequential end-of-line sequences...

    ...roboticus
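The blank-line caveat above can be seen in a small sketch. The sample string and the alternative pattern /\r\n|\n|\r/ are assumptions added for illustration, not something proposed in the thread. Note that split also drops trailing empty fields by default.

```perl
use strict;
use warnings;

# Two records separated by an empty line, using CR LF line endings.
my $text = "first\r\n\r\nsecond\r\n";

# Character class with +: a whole run of CR/LF characters is one delimiter,
# so the empty line disappears.
my @collapsed = split /[\r\n]+/, $text;

# One delimiter per line ending (CR LF, lone LF, or lone CR),
# so the empty line survives as an empty field.
my @preserved = split /\r\n|\n|\r/, $text;

print scalar(@collapsed), "\n";   # 2 fields: "first", "second"
print scalar(@preserved), "\n";   # 3 fields: "first", "", "second"
```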

Re: question about "split" function
by kyle (Abbot) on Dec 11, 2007 at 21:46 UTC

    You probably want to use "\015\012" for CR/LF since the values of \r and \n vary from system to system (\f is a form feed).

    my @lines = split /\015\012/, $x;
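As a rough check of the pattern above, here is a sketch that runs it on part of the sample from the question. It assumes the <CR><LF> markers in the printed output stand for real bytes 015 and 012 (octal).

```perl
use strict;
use warnings;

# Records taken from the question, with real CR LF bytes (octal 015 012)
# in place of the <CR><LF> markers shown when the string was printed.
my $x = "121107 1547 00316 43kahldsf02 801 2211 808\015\012"
      . "15:47 12/11\015\012"
      . "15:47 12/11\015\012";

my @lines = split /\015\012/, $x;

print scalar(@lines), " lines\n";   # 3 lines
print "[$_]\n" for @lines;
```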
      You probably want to use "\015\012" for CR/LF since the values of \r and \n vary from system to system

      Bleh. I prefer to not let the mistakes of a single platform sway me away from previous good practices in writing portable code. The only platform where "\r" and "\n" caused any concern was old Macs. There are a lot more non-ASCII platforms where hard-coding magic numbers would cause problems. Granted, there was a period where running into what is now an "old" Mac Perl was much, much more likely than running into a non-ASCII Perl (at least for most people). It is a bit sad, however, that this visibility caused so many people to just overlook the long-standing realization that hard-coded magic numbers are a bad practice, including enshrining the encouragement of such bad practices in perlport (and some people who are still convinced that these hard-coded magic numbers are actually a "good practice").

      But I've never run into an old Mac Perl and I strongly object to hard-coding such magic numbers so I find "\r" and "\n" much preferable. When writing code for public consumption, I might provide an abstraction that specifically detects old Macs. But the complexity of such an abstraction just isn't justified for me most of the time.

      - tye        

        Interesting. The phrase "the values of \r and \n vary from system to system" came right out of the Camel book. I almost put it in quotes with attribution. It's in a discussion of using \015\012 especially for socket programming because you'll need to recognize those hard values coming across the network regardless of what your local concept of "newline" is.

        In this case, the OP refers to data coming from a "phone switch", which I'm guessing is also not likely to change its delimiters if the local concept of "newline" changes from the time the interface code is written to the time it's used.

        Thanks for pointing out where the magic number rule came from. Given how old Macs are disappearing, I might not have recommended what I did, but I think this case is special enough that the exception might not be all bad.
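When the data arrives on a filehandle or socket rather than in one string, the same fixed delimiter can be given to Perl as the input record separator. This is only a sketch: the in-memory filehandle stands in for the switch connection, and the sample records are assumptions for illustration.

```perl
use strict;
use warnings;

# Stand-in for the switch connection: an in-memory filehandle holding
# two CR LF terminated records (assumed data, for illustration only).
my $data = "15:47 12/11\015\012121107 1547 00150\015\012";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
binmode $fh;

my @records;
{
    local $/ = "\015\012";       # input record separator: CR LF
    while (my $rec = <$fh>) {
        chomp $rec;              # chomp removes the current value of $/
        push @records, $rec;
    }
}
close $fh;

print "[$_]\n" for @records;
```

Localizing $/ inside a block keeps the change from affecting reads elsewhere in the program.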

Re: question about "split" function
by moritz (Cardinal) on Dec 11, 2007 at 21:20 UTC
    Line feed is \l, not \f.

    Update: It's not...

      moritz:

      I always thought \l lowercased, and \n was newline....

      ...roboticus

      On my system, "\l" isn't anything.

      printf "\\l = %d\n", ord "\l";
      printf "\\f = %d\n", ord "\f";
      print "\\l: [\l]\n";
      __END__
      \l = 0
      \f = 12
      \l: []

        kyle:

        Try this:

        $ perl -e 'print "\lTook \LNow Is The Time\E For All Good Men\n";'
        took now is the time For All Good Men

        ...roboticus

        Update: I finally found that reference table. It's in perlop. Quick recap:

        The following escape sequences are available in constructs that interpolate and in transliterations.

            \t          tab             (HT, TAB)
            \n          newline         (NL)
            \r          return          (CR)
            \f          form feed       (FF)
            <<<snip snip snip>>>

        The following escape sequences are available in constructs that interpolate but not in transliterations.

            \l          lowercase next char
            \u          uppercase next char
            \L          lowercase till \E
            \U          uppercase till \E
            \E          end case modification
            \Q          quote non-word characters till \E

        If "use locale" is in effect, the case map used by "\l", "\L", "\u" and "\U" is taken from the current locale.
        <<<snip snip snip>>>
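A short sketch of how a few of the escapes in that table behave. The sample strings are made up for illustration, and the numeric values assume an ASCII-based platform.

```perl
use strict;
use warnings;

# Character escapes: code points on an ASCII-based platform (assumption).
my @codes = (ord "\r", ord "\n", ord "\f");
print "@codes\n";                 # 13 10 12, so \f is not a line feed

# Case-modifying escapes act on the text that follows them.
my $name  = 'perl monks';
my $first = "\u$name";            # uppercase the next character only
my $all   = "\U$name\E!";         # uppercase everything up to \E
my $quote = "\Qa.b\E";            # backslash-escape non-word characters

print "$first\n";                 # Perl monks
print "$all\n";                   # PERL MONKS!
print "$quote\n";                 # a\.b
```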