in reply to Re^3: line ending troubles
in thread line ending troubles

Thanks for your answers. But now I am having trouble with slurp again. I wanted to find out whether slurp, used in list context, reads a big file all at once, i.e. whether slurp is safe to use on big files. The following code does not create a particularly big file (4 MB), but slurping it takes more than 2 minutes.

#!/usr/bin/perl
use strict;
use warnings;
use Perl6::Slurp;

# write a file with different line endings
# \r = 0x0D
# \n = 0x0A
my $win_line  = "Windows\r\n";
my $unix_line = "Unix\n";
my $mac_line  = "Mac\r";

open(my $fh, ">", "le_big.txt") or die "Failed file open: $!";
binmode($fh);
for (1 .. 100000) {
    print $fh $win_line;
    print $fh $unix_line;
    print $fh $mac_line;
    print $fh $win_line;
    print $fh $mac_line;
    print $fh $win_line;
    print $fh $unix_line;
}
close($fh);

# read the file with slurp
# The PerlIO layer 'crlf' does the conversion \r\n --> \n,
# i.e. the input record separator only has to handle the line
# endings \n and \r.
# Win32: the crlf layer is active by default, so it is not
#        necessary to add it explicitly.
# Unix and other OSes: the crlf layer is NOT active by default,
#        so it has to be added explicitly.
for my $line (slurp("<:crlf", "le_big.txt", {irs => qr/\n|\r/, chomp => 1})) {
    print $line . "\n";
}

# NOTE:
# It would also be possible to write a regexp that works whether
# the crlf layer is active or not:
#     irs => qr/\r\n|\n|\r/
# crlf layer active:     possible line endings are \n or \r
# crlf layer NOT active: possible line endings are \r\n, \n or \r
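To illustrate the note at the end of the script, here is a minimal sketch of the layer-independent variant it suggests, run against the same le_big.txt. The combined pattern has to try \r\n before the single-character alternatives: with qr/\n|\r/ and no :crlf layer, a Windows \r\n would be treated as two separators and produce a spurious empty line.

#!/usr/bin/perl
use strict;
use warnings;
use Perl6::Slurp;

# Layer-independent reading: no :crlf layer, so irs itself must
# treat the two-character \r\n as a single separator. \r\n is
# listed first because alternation tries alternatives in order.
for my $line (slurp("<", "le_big.txt", {irs => qr/\r\n|\n|\r/, chomp => 1})) {
    print $line . "\n";
}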

Am I doing something wrong, or why does the slurping take so much time?

This code was run with Perl 5.10 on Ubuntu Linux.

Greetings,

Dirk

Re^5: line ending troubles
by kennethk (Abbot) on Dec 28, 2009 at 22:18 UTC
    I haven't gone into great detail, but it appears the module incurs high overhead. Specifically, I ran the following benchmarks:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Perl6::Slurp;
    use Benchmark qw(cmpthese :hireswallclock);

    # write a file with different line endings
    # \r = 0x0D
    # \n = 0x0A
    my $win_line  = "Windows\r\n";
    my $unix_line = "Unix\n";
    my $mac_line  = "Mac\r";

    my @strings = ();
    for (1 .. 1000) {
        push @strings, $win_line;
        push @strings, $unix_line;
        push @strings, $mac_line;
        push @strings, $win_line;
        push @strings, $mac_line;
        push @strings, $win_line;
        push @strings, $unix_line;
    }

    open(my $fh, ">", "le_big.txt") or die "Failed file open: $!";
    binmode($fh);
    print $fh $_ foreach @strings;
    close($fh);

    cmpthese(1000, {
        'naive'       => \&naive,
        'original'    => \&original,
        'local split' => \&local_split,
        'crlf split'  => \&crlf_split,
    });

    # Original code
    sub original {
        my @results = slurp("<:crlf", "le_big.txt", {irs => qr/\n|\r/, chomp => 1});
        return @results;
    }

    # Just use slurp and crlf to read the file
    sub crlf_split {
        my @initial_results = slurp("<:crlf", "le_big.txt");
        my @results = map split(/\r/), @initial_results;
        return @results;
    }

    # Just use slurp to read the file
    sub local_split {
        my @initial_results = slurp("<", "le_big.txt");
        my @results = map split(/\n|\r\n?/), @initial_results;
        return @results;
    }

    # Naive local implementation
    sub naive {
        open(my $fh, "<", "le_big.txt") or die "Failed file open: $!";
        local $/;
        my $slurp = <$fh>;
        close $fh;
        my @results = split /\n|\r\n?/, $slurp;
        return @results;
    }

    With the following results:

    time perl fluff.pl
                   Rate original local split crlf split naive
    original     28.4/s       --        -32%       -43%  -80%
    local split  41.8/s      47%          --       -16%  -70%
    crlf split   49.6/s      75%         19%         --  -65%
    naive         141/s     398%        238%       185%    --

    real    1m26.512s
    user    1m26.450s
    sys     0m0.060s

    Run under perl v5.8.8 built for x86_64-linux-gnu-thread-multi on an Ubuntu box. Note how much faster the quick-and-dirty slurp-and-split approach I wrote is. The moral, I think, is that you should only use this module if you have good reason. Note as well that I'm pretty sure the half-way solutions will drop empty lines from the result, since split discards trailing empty fields.
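    If empty lines matter, here is a sketch of how the naive version could keep them (my adjustment, not benchmarked above): match \r\n as a single separator, pass a limit of -1 so split keeps trailing empty fields, then drop the one artificial field that appears after the final line ending.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Naive slurp-and-split, adjusted to preserve empty lines:
    # \r\n is matched before \n or \r so a Windows ending counts as
    # one separator, and the -1 limit stops split from discarding
    # trailing empty fields. The field after the final line ending
    # is an artifact and gets popped.
    open(my $fh, "<", "le_big.txt") or die "Failed file open: $!";
    my $slurp = do { local $/; <$fh> };
    close $fh;
    my @results = split /\r\n|\n|\r/, $slurp, -1;
    pop @results if @results and $results[-1] eq '';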