in reply to Re^3: line ending troubles
in thread line ending troubles

Thanks for your answers. But now I am having trouble with slurp again. I wanted to find out whether slurp, used in list context, reads a big file all at once, i.e. whether slurp is safe to use on big files. The following code does not create a particularly big file (4 MB), but slurping it takes more than 2 minutes.

#!/usr/bin/perl
use strict;
use warnings;
use Perl6::Slurp;

# write a file with different line endings
# \r = 0x0D
# \n = 0x0A
my $win_line  = "Windows\r\n";
my $unix_line = "Unix\n";
my $mac_line  = "Mac\r";

open(my $fh, ">", "le_big.txt") or die "Failed file open: $!";
binmode($fh);
for (1 .. 100000) {
    print $fh $win_line;
    print $fh $unix_line;
    print $fh $mac_line;
    print $fh $win_line;
    print $fh $mac_line;
    print $fh $win_line;
    print $fh $unix_line;
}
close($fh);

# read the file with slurp
# The PerlIO layer 'crlf' does the conversion \r\n --> \n,
# i.e. the input record separator only has to handle the line
# endings \n and \r.
# Win32: the crlf layer is active by default, so it is not
#        necessary to add it explicitly.
# Unix and other OSes: the crlf layer is NOT active by default,
#        so it has to be added explicitly.
for my $line (slurp("<:crlf", "le_big.txt", {irs => qr/\n|\r/, chomp => 1})) {
    print $line . "\n";
}

# NOTE:
# It would also be possible to write a regexp that works whether
# the crlf layer is active or not:
#     irs => qr/\r\n|\n|\r/
# crlf layer active:     possible line endings are \n or \r
# crlf layer NOT active: possible line endings are \r\n, \n or \r
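To illustrate the note at the end of the script, here is a minimal sketch of the layer-independent variant it suggests, run against the same le_big.txt. The combined pattern has to try \r\n before the single-character alternatives: with qr/\n|\r/ and no :crlf layer, a Windows \r\n would be treated as two separators and produce a spurious empty line.

#!/usr/bin/perl
use strict;
use warnings;
use Perl6::Slurp;

# Layer-independent reading: no :crlf layer, so irs itself must
# treat the two-character \r\n as a single separator. \r\n is
# listed first because alternation tries alternatives in order.
for my $line (slurp("<", "le_big.txt", {irs => qr/\r\n|\n|\r/, chomp => 1})) {
    print $line . "\n";
}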

Am I doing something wrong, or why does the slurping take so much time?

This code was run with Perl 5.10 on Ubuntu Linux.

Greetings,

Dirk

Re^5: line ending troubles
by kennethk (Abbot) on Dec 28, 2009 at 22:18 UTC
    I haven't gone into great detail, but it appears the module incurs high overhead. Specifically, I ran the following benchmarks:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Perl6::Slurp;
    use Benchmark qw(cmpthese :hireswallclock);

    # write a file with different line endings
    # \r = 0x0D
    # \n = 0x0A
    my $win_line  = "Windows\r\n";
    my $unix_line = "Unix\n";
    my $mac_line  = "Mac\r";

    my @strings = ();
    for (1 .. 1000) {
        push @strings, $win_line;
        push @strings, $unix_line;
        push @strings, $mac_line;
        push @strings, $win_line;
        push @strings, $mac_line;
        push @strings, $win_line;
        push @strings, $unix_line;
    }

    open(my $fh, ">", "le_big.txt") or die "Failed file open: $!";
    binmode($fh);
    print $fh $_ foreach @strings;
    close($fh);

    cmpthese(1000, {
        'naive'       => \&naive,
        'original'    => \&original,
        'local split' => \&local_split,
        'crlf split'  => \&crlf_split,
    });

    # Original code
    sub original {
        my @results = slurp("<:crlf", "le_big.txt", {irs => qr/\n|\r/, chomp => 1});
        return @results;
    }

    # Just use slurp and crlf to read the file
    sub crlf_split {
        my @initial_results = slurp("<:crlf", "le_big.txt");
        my @results = map split(/\r/), @initial_results;
        return @results;
    }

    # Just use slurp to read the file
    sub local_split {
        my @initial_results = slurp("<", "le_big.txt");
        my @results = map split(/\n|\r\n?/), @initial_results;
        return @results;
    }

    # Naive local implementation
    sub naive {
        open(my $fh, "<", "le_big.txt") or die "Failed file open: $!";
        local $/;
        my $slurp = <$fh>;
        close $fh;
        my @results = split /\n|\r\n?/, $slurp;
        return @results;
    }

    With the following results:

    time perl fluff.pl
                   Rate original local split crlf split naive
    original     28.4/s       --        -32%       -43%  -80%
    local split  41.8/s      47%          --       -16%  -70%
    crlf split   49.6/s      75%         19%         --  -65%
    naive         141/s     398%        238%       185%    --

    real    1m26.512s
    user    1m26.450s
    sys     0m0.060s

    Run under perl v5.8.8 built for x86_64-linux-gnu-thread-multi on an Ubuntu box. Note how much faster the quick-and-dirty slurp-and-split approach I wrote is. The moral, I think, is that you should only use this module if you have good reason. Note as well that I'm pretty sure the half-way solutions will drop empty lines from the result, since split discards trailing empty fields.
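    If empty lines matter, here is a sketch of how the naive version could keep them (my adjustment, not benchmarked above): match \r\n as a single separator, pass a limit of -1 so split keeps trailing empty fields, then drop the one artificial field that appears after the final line ending.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Naive slurp-and-split, adjusted to preserve empty lines:
    # \r\n is matched before \n or \r so a Windows ending counts as
    # one separator, and the -1 limit stops split from discarding
    # trailing empty fields. The field after the final line ending
    # is an artifact and gets popped.
    open(my $fh, "<", "le_big.txt") or die "Failed file open: $!";
    my $slurp = do { local $/; <$fh> };
    close $fh;
    my @results = split /\r\n|\n|\r/, $slurp, -1;
    pop @results if @results and $results[-1] eq '';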