in reply to Re: line ending troubles
in thread line ending troubles

Thank you very much for your excellent answer. My mistake was that I did not know that capturing parentheses in split's pattern have this effect.

But now another question about alternatives in regexps. In my tests I have seen that the order of the alternatives matters. Is it really always true that the first alternative is tried first, then the second, the third, and so on?

And one more question about slurp. I've seen that when I'm running Perl on Windows, the crlf layer is active by default. Of course I can pop this layer with binmode or :raw. But if I don't do that, will slurp use the crlf layer if no layer is specified?

Greetings Dirk

Re^3: line ending troubles
by kennethk (Abbot) on Dec 22, 2009 at 23:01 UTC

    Regarding matching behaviors with alternation, see Matching this or that in perlretut. Short answer, yes.
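    A minimal illustration (my own snippet, not from perlretut): the engine tries the alternatives left to right at each position and takes the first one that lets the overall match succeed, even when a later alternative would match more text.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $s = "foobar";

    # The first alternative that succeeds wins: "foo" matches even though
    # "foobar" would also match at the same position.
    print "$1\n" if $s =~ /(foo|foobar)/;   # prints "foo"
    print "$1\n" if $s =~ /(foobar|foo)/;   # prints "foobar"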

    slurp, as implemented in Perl6::Slurp v0.03 (which I'm using for reference), calls a 3-argument open with mode '<' if no layer information is passed. This means it behaves like a normal file open on your OS, which, as you've observed, includes the crlf layer by default under Windows. See Defaults and how to override them in PerlIO.
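    If you want to check which layers a plain open gets on your platform, the core PerlIO::get_layers function will tell you. A small sketch, reusing the le_big.txt name from your test:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Perl6::Slurp;

    # No layer given: slurp behaves like a plain 3-argument open,
    # so on Windows the crlf layer is in effect.
    my $text = slurp('le_big.txt');

    # Inspect the default layer stack of an ordinary open:
    open my $fh, '<', 'le_big.txt' or die "Failed file open: $!";
    print join(' ', PerlIO::get_layers($fh)), "\n";   # e.g. "unix crlf" on Windows
    close $fh;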

    If you haven't reviewed it yet, you should read about Newlines in perlport.

      Thanks for your answers. But now I have trouble with slurp again. I wanted to find out whether slurp in list context reads a big file all at once, i.e. whether I can use slurp on big files at all. The following code does not create a really big file (about 4 MB), yet the slurping takes more than 2 minutes.

      #!/usr/bin/perl
      use strict;
      use warnings;

      use Perl6::Slurp;

      # write file with different line endings
      # \r = 0x0D
      # \n = 0x0A
      my $win_line  = "Windows\r\n";
      my $unix_line = "Unix\n";
      my $mac_line  = "Mac\r";

      open(my $fh, ">", "le_big.txt") or die "Failed file open: $!";
      binmode($fh);
      for (1 .. 100000) {
          print $fh $win_line;
          print $fh $unix_line;
          print $fh $mac_line;
          print $fh $win_line;
          print $fh $mac_line;
          print $fh $win_line;
          print $fh $unix_line;
      }
      close($fh);

      # read file with slurp
      # PerlIO layer 'crlf' does the conversion \r\n --> \n,
      # i.e. the input record separator only has to handle the line endings \n and \r
      # Win32: crlf layer is active by default, so it is not necessary
      #        to add this layer explicitly
      # Unix and other OSes: crlf layer is NOT active by default,
      #        so it is necessary to add this layer
      for my $line (slurp("<:crlf", "le_big.txt", {irs => qr/\n|\r/, chomp => 1})) {
          print $line . "\n";
      }

      # NOTE:
      # It would also be possible to write a regexp which works
      # whether the crlf layer is active or not:
      #     irs => qr/\r\n|\n|\r/
      # crlf layer active:     possible line endings are \n OR \r
      # crlf layer NOT active: possible line endings are \r\n OR \n OR \r
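      Note that the order of the alternatives matters in that portable regexp: \r\n must come before \r. If \r were tried first, it would consume only half of a Windows ending, and the leftover \n would then produce an empty field. A small self-contained check (hypothetical sample string, no file needed):

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $mixed = "Windows\r\nUnix\nMac\rEnd";

      # \r\n listed first: a full CRLF is consumed as one separator.
      my @good = split /\r\n|\n|\r/, $mixed;   # ("Windows", "Unix", "Mac", "End")

      # \r listed first: it eats the \r of CRLF, and the leftover \n
      # then produces an empty field between "Windows" and "Unix".
      my @bad  = split /\r|\n|\r\n/, $mixed;   # ("Windows", "", "Unix", "Mac", "End")

      print scalar(@good), " vs ", scalar(@bad), "\n";   # prints "4 vs 5"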

      Am I doing something wrong, or why does the slurping take so much time?

      This code was run with Perl 5.10 on Ubuntu Linux.

      Greetings,

      Dirk

        I haven't gone into great detail, but it appears the module incurs high overhead. Specifically, I ran the following benchmarks:

        #!/usr/bin/perl
        use strict;
        use warnings;

        use Perl6::Slurp;
        use Benchmark qw(cmpthese :hireswallclock);

        # write file with different line endings
        # \r = 0x0D
        # \n = 0x0A
        my $win_line  = "Windows\r\n";
        my $unix_line = "Unix\n";
        my $mac_line  = "Mac\r";

        my @strings = ();
        for (1 .. 1000) {
            push @strings, $win_line;
            push @strings, $unix_line;
            push @strings, $mac_line;
            push @strings, $win_line;
            push @strings, $mac_line;
            push @strings, $win_line;
            push @strings, $unix_line;
        }

        open(my $fh, ">", "le_big.txt") or die "Failed file open: $!";
        binmode($fh);
        print $fh $_ foreach @strings;
        close($fh);

        cmpthese(1000, {
            'naive'       => \&naive,
            'original'    => \&original,
            'local split' => \&local_split,
            'crlf split'  => \&crlf_split,
        });

        # Original code
        sub original {
            my @results = slurp("<:crlf", "le_big.txt", {irs => qr/\n|\r/, chomp => 1});
            return @results;
        }

        # Just use slurp and crlf to read the file
        sub crlf_split {
            my @initial_results = slurp("<:crlf", "le_big.txt");
            my @results = map split(/\r/), @initial_results;
            return @results;
        }

        # Just use slurp to read the file
        sub local_split {
            my @initial_results = slurp("<", "le_big.txt");
            my @results = map split(/\n|\r\n?/), @initial_results;
            return @results;
        }

        # Naive local implementation
        sub naive {
            open(my $fh, "<", "le_big.txt") or die "Failed file open: $!";
            local $/;
            my $slurp = <$fh>;
            close $fh;
            my @results = split /\n|\r\n?/, $slurp;
            return @results;
        }

        With the following results:

        time perl fluff.pl
                      Rate    original local split  crlf split       naive
        original    28.4/s          --        -32%        -43%        -80%
        local split 41.8/s         47%          --        -16%        -70%
        crlf split  49.6/s         75%         19%          --        -65%
        naive        141/s        398%        238%        185%          --

        real    1m26.512s
        user    1m26.450s
        sys     0m0.060s

        Run under perl v5.8.8 built for x86_64-linux-gnu-thread-multi, on an Ubuntu box. Note how much faster the quick-and-dirty slurp-and-split approach I wrote is. The moral, I think, is that you should only use this module if you have good reason. Note as well that I'm pretty sure the half-way solutions will drop empty lines from the result.
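        That last caveat is easy to demonstrate: split discards trailing empty fields by default, and a blank line is nothing but its terminator, so splitting it yields an empty list. A quick sketch:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # A blank line consists only of its terminator, so split returns nothing:
        my @fields = split /\n|\r\n?/, "\r\n";
        print scalar(@fields), "\n";                  # prints "0"

        # Trailing empty fields are dropped unless the limit is negative:
        my @default = split /\n/, "a\n\nb\n\n";       # ("a", "", "b")
        my @kept    = split /\n/, "a\n\nb\n\n", -1;   # ("a", "", "b", "", "")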