in reply to Re: line ending troubles
in thread line ending troubles

Thank you very much for your excellent answer. My mistake was that I did not know that capturing parentheses in split's pattern have this effect.

But now another question about alternatives in regexps. In my tests I have seen that the order of the alternatives matters. Is it really always true that the first alternative is tried first, then the second, the third, and so on?

And one more question about slurp. I've seen that when I'm running Perl on Windows, the crlf layer is active by default. Of course I can pop this layer with binmode or :raw. But if I don't do that, will slurp use the crlf layer if no layer is specified?

Greetings Dirk

Re^3: line ending troubles
by kennethk (Abbot) on Dec 22, 2009 at 23:01 UTC

    Regarding matching behaviors with alternation, see Matching this or that in perlretut. Short answer, yes.
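    A minimal illustration (my own snippet, not from perlretut): the engine tries the alternatives left to right at each position and takes the first one that lets the overall match succeed, even when a later alternative would match more text.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $s = "foobar";

    # The first alternative that succeeds wins: "foo" matches even though
    # "foobar" would also match at the same position.
    print "$1\n" if $s =~ /(foo|foobar)/;   # prints "foo"
    print "$1\n" if $s =~ /(foobar|foo)/;   # prints "foobar"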

    slurp, as implemented in Perl6::Slurp v0.03 (which I'm using for reference), calls a 3-argument open with mode '<' if no layer information is passed. This means it behaves like a normal file open on your OS, which, as you've observed, includes the crlf layer by default under Windows. See Defaults and how to override them in PerlIO.
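    If you want to check which layers a plain open gets on your platform, the core PerlIO::get_layers function will tell you. A small sketch, reusing the le_big.txt name from your test:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use Perl6::Slurp;

    # No layer given: slurp behaves like a plain 3-argument open,
    # so on Windows the crlf layer is in effect.
    my $text = slurp('le_big.txt');

    # Inspect the default layer stack of an ordinary open:
    open my $fh, '<', 'le_big.txt' or die "Failed file open: $!";
    print join(' ', PerlIO::get_layers($fh)), "\n";   # e.g. "unix crlf" on Windows
    close $fh;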

    If you haven't reviewed it yet, you should read about Newlines in perlport.

      Thanks for your answers. But now I have trouble with slurp again. I wanted to find out whether slurp in list context reads a big file all at once, i.e. whether I can use slurp on big files at all. The following code does not create a really big file (about 4 MB), yet the slurping takes more than 2 minutes.

      #!/usr/bin/perl
      use strict;
      use warnings;

      use Perl6::Slurp;

      # write file with different line endings
      # \r = 0x0D
      # \n = 0x0A
      my $win_line  = "Windows\r\n";
      my $unix_line = "Unix\n";
      my $mac_line  = "Mac\r";

      open(my $fh, ">", "le_big.txt") or die "Failed file open: $!";
      binmode($fh);
      for (1 .. 100000) {
          print $fh $win_line;
          print $fh $unix_line;
          print $fh $mac_line;
          print $fh $win_line;
          print $fh $mac_line;
          print $fh $win_line;
          print $fh $unix_line;
      }
      close($fh);

      # read file with slurp
      # PerlIO layer 'crlf' does the conversion \r\n --> \n,
      # i.e. the input record separator only has to handle the line endings \n and \r
      # Win32: crlf layer is active by default, so it is not necessary
      #        to add this layer explicitly
      # Unix and other OSes: crlf layer is NOT active by default,
      #        so it is necessary to add this layer
      for my $line (slurp("<:crlf", "le_big.txt", {irs => qr/\n|\r/, chomp => 1})) {
          print $line . "\n";
      }

      # NOTE:
      # It would also be possible to write a regexp which works
      # whether the crlf layer is active or not:
      #     irs => qr/\r\n|\n|\r/
      # crlf layer active:     possible line endings are \n OR \r
      # crlf layer NOT active: possible line endings are \r\n OR \n OR \r
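      Note that the order of the alternatives matters in that portable regexp: \r\n must come before \r. If \r were tried first, it would consume only half of a Windows ending, and the leftover \n would then produce an empty field. A small self-contained check (hypothetical sample string, no file needed):

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $mixed = "Windows\r\nUnix\nMac\rEnd";

      # \r\n listed first: a full CRLF is consumed as one separator.
      my @good = split /\r\n|\n|\r/, $mixed;   # ("Windows", "Unix", "Mac", "End")

      # \r listed first: it eats the \r of CRLF, and the leftover \n
      # then produces an empty field between "Windows" and "Unix".
      my @bad  = split /\r|\n|\r\n/, $mixed;   # ("Windows", "", "Unix", "Mac", "End")

      print scalar(@good), " vs ", scalar(@bad), "\n";   # prints "4 vs 5"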

      Am I doing something wrong, or why does the slurping take so much time?

      This code was run with Perl 5.10 on Ubuntu Linux.

      Greetings,

      Dirk

        I haven't gone into great detail, but it appears the module incurs high overhead. Specifically, I ran the following benchmarks:

        #!/usr/bin/perl
        use strict;
        use warnings;

        use Perl6::Slurp;
        use Benchmark qw(cmpthese :hireswallclock);

        # write file with different line endings
        # \r = 0x0D
        # \n = 0x0A
        my $win_line  = "Windows\r\n";
        my $unix_line = "Unix\n";
        my $mac_line  = "Mac\r";

        my @strings = ();
        for (1 .. 1000) {
            push @strings, $win_line;
            push @strings, $unix_line;
            push @strings, $mac_line;
            push @strings, $win_line;
            push @strings, $mac_line;
            push @strings, $win_line;
            push @strings, $unix_line;
        }

        open(my $fh, ">", "le_big.txt") or die "Failed file open: $!";
        binmode($fh);
        print $fh $_ foreach @strings;
        close($fh);

        cmpthese(1000, {
            'naive'       => \&naive,
            'original'    => \&original,
            'local split' => \&local_split,
            'crlf split'  => \&crlf_split,
        });

        # Original code
        sub original {
            my @results = slurp("<:crlf", "le_big.txt", {irs => qr/\n|\r/, chomp => 1});
            return @results;
        }

        # Just use slurp and crlf to read the file
        sub crlf_split {
            my @initial_results = slurp("<:crlf", "le_big.txt");
            my @results = map split(/\r/), @initial_results;
            return @results;
        }

        # Just use slurp to read the file
        sub local_split {
            my @initial_results = slurp("<", "le_big.txt");
            my @results = map split(/\n|\r\n?/), @initial_results;
            return @results;
        }

        # Naive local implementation
        sub naive {
            open(my $fh, "<", "le_big.txt") or die "Failed file open: $!";
            local $/;
            my $slurp = <$fh>;
            close $fh;
            my @results = split /\n|\r\n?/, $slurp;
            return @results;
        }

        With the following results:

        time perl fluff.pl
                      Rate    original local split  crlf split       naive
        original    28.4/s          --        -32%        -43%        -80%
        local split 41.8/s         47%          --        -16%        -70%
        crlf split  49.6/s         75%         19%          --        -65%
        naive        141/s        398%        238%        185%          --

        real    1m26.512s
        user    1m26.450s
        sys     0m0.060s

        Run under perl v5.8.8 built for x86_64-linux-gnu-thread-multi, on an Ubuntu box. Note how much faster the quick-and-dirty slurp-and-split approach I wrote is. The moral, I think, is that you should only use this module if you have good reason. Note as well that I'm pretty sure the half-way solutions will drop empty lines from the result.
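        That last caveat is easy to demonstrate: split discards trailing empty fields by default, and a blank line is nothing but its terminator, so splitting it yields an empty list. A quick sketch:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # A blank line consists only of its terminator, so split returns nothing:
        my @fields = split /\n|\r\n?/, "\r\n";
        print scalar(@fields), "\n";                  # prints "0"

        # Trailing empty fields are dropped unless the limit is negative:
        my @default = split /\n/, "a\n\nb\n\n";       # ("a", "", "b")
        my @kept    = split /\n/, "a\n\nb\n\n", -1;   # ("a", "", "b", "", "")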