Argel has asked for the wisdom of the Perl Monks concerning the following question:

I'm curious as to why the following does not work. I gather it has something to do with some underlying magic of regular expressions (or perhaps in split)? It just seems odd that I have to do an m/$regex/m to split on LFs and CRs. I would have thought switching to using octal or hex codes would override some of that default behavior.
# Doesn't DWIM
$data =~ s/\012/\015/;
$data =~ s/\015+/\015/;
@records = split /\015/, $data;
My other question: is there a way to split without having to resort to the map+chomp afterwards (while leaving the rest of the data intact)?
# Works, but the map+chomp seems ugly
@records = map {chomp $_; $_} split /^/xms, $data;
Note: I realize only the 'm' option is necessary. The 'xs' are as per Perl Best Practices.

Thank you oh great wise ones!!

-- Argel

Re: Special behavior for LF and CR in RegExs?
by Aristotle (Chancellor) on Jan 05, 2006 at 00:55 UTC

    The “doesn’t DWIM” snip seems to be missing /g modifiers. Posting accident, or is that so in your code as well?
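
A minimal sketch of what the missing /g changes, assuming sample data with CR/LF endings (the $data value and the variable names $once, $all, @broken and @fixed are made up for illustration). Without /g each substitution touches only its first match, so later LFs survive and end up inside the split fields.

```perl
use strict;
use warnings;

# Hypothetical sample data with CR/LF line endings.
my $data = "blah\r\nfoo\r\nstuff\r\n";

# Without /g: only the first LF and the first run of CRs are rewritten.
(my $once = $data) =~ s/\012/\015/;
$once =~ s/\015+/\015/;
my @broken = split /\015/, $once;

# With /g: every LF becomes a CR, then every run of CRs collapses to one.
(my $all = $data) =~ s/\012/\015/g;
$all =~ s/\015+/\015/g;
my @fixed = split /\015/, $all;

# Count the fields that still carry a stray LF in the non-/g version.
my $stray = grep { /\012/ } @broken;
print "fields with a stray LF without /g: $stray\n";
print join(":", @fixed), "\n";    # prints "blah:foo:stuff"
```

The second half is the same pipeline as the original snippet with /g added to both substitutions.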

    Anyway, if split /^/m works, it seems that split /\n/ also should. Does it not?

    You can minimise that code quite a bit, btw, by simply saying chomp( @records = split /^/xms, $data );
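
As a rough illustration of that shorter form, assuming LF-terminated sample data and the default $/ (the $data value is invented for the example): chomp accepts a list, so it can wrap the assignment directly and strip the line ending from every element in place.

```perl
use strict;
use warnings;

# Hypothetical sample input with LF line endings.
my $data = "one\ntwo\nthree\n";

# split /^/m keeps each record's trailing newline; chomp on the
# list assignment then removes $/ from every element of @records.
chomp( my @records = split /^/m, $data );

print join("|", @records), "\n";    # prints "one|two|three"
```

Note that chomp removes whatever $/ currently holds, so this relies on $/ still being "\n".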

    Makeshifts last the longest.

      Good catch on the missing 'g'!! You are right, that did work.

      I have seen splitting on a \n work and also seen it not work. I'm using a perl 5.8.0 that I compiled myself on Solaris 8, so perhaps there is a bug buried away in there?

      Looks like davidrw's $/ suggestion also works. Given the above \n problem I think I will use that instead.

      Thanks for all the help!!

      -- Argel

        Well, $/ is the input record separator, which defaults to \n. In strings and patterns, \n is the logical newline: always a single character inside Perl, which the I/O layer translates to and from the platform’s own line ending (such as CR/LF on DOS) when a file is read or written in text mode.

        Basically, using \n will always work so long as the data you’re processing comes from the same platform that you’re running on. If not, you’ll need to convert end-of-line markers. There’s no way to avoid this.

        So outside specific scenarios, you should use \n or $/ and let Perl handle the specifics. That will also yield the most portable scripts.
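
For the case where the data does come from another platform, one possible approach is to split on an alternation that covers all three common line endings. This is only a sketch: it assumes an ASCII platform where CR is \015 and LF is \012, and the sample $data is invented.

```perl
use strict;
use warnings;

# Hypothetical input mixing DOS (CR/LF), old Mac (CR) and Unix (LF) endings.
my $data = "alpha\015\012beta\015gamma\012delta";

# CR/LF is listed first so that a pair counts as one line ending, not two.
my @records = split /\015\012|\015|\012/, $data;

print join("|", @records), "\n";    # prints "alpha|beta|gamma|delta"
```

Because the separator itself is consumed by split, no chomp is needed afterwards. Unlike collapsing runs with \015+, this keeps empty lines in the middle of the data as empty records.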

        Makeshifts last the longest.

Re: Special behavior for LF and CR in RegExs?
by davidrw (Prior) on Jan 05, 2006 at 00:52 UTC
    what about just this?
    my @records = split($/, $data);
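
A small sketch of how this behaves, assuming the default $/ of "\n" and made-up sample data. The first argument of split is treated as a pattern even when it is a plain string, so the example runs $/ through quotemeta (the $sep variable is hypothetical) in case the separator has been changed to something containing regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical sample input; note the empty line in the middle.
my $data = "one\ntwo\n\nthree\n";

# Escape the separator so it is matched literally, then split on it.
my $sep     = quotemeta $/;
my @records = split /$sep/, $data;

# The empty middle record is kept; the trailing empty field is dropped.
print scalar(@records), " records\n";    # prints "4 records"
```

Since split drops trailing empty fields by default, the final newline does not produce an extra empty record.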
    Update: Your original code will work if you add the /g modifier to the substitutions.
    perl -le '$_="blah\r\nfoo\r\nstuff\r\n"; s/\012/\015/g; s/\015+/\015/g; print join ":", split(/\015/,$_)'