in reply to line ending troubles

The issue is with your regular expression. I'm not feeling particularly like delving into the Perl6::Slurp source right now, but changing your regex from capturing to non-capturing yields what I assume to be your intended output:

#!/usr/bin/perl
use strict;
use warnings;
use Perl6::Slurp;

# generate test file "le.txt"
my $win_line  = "Windows\r\n";
my $unix_line = "Unix\n";
my $mac_line  = "Mac\r";

open(my $fh, ">", "le.txt") or die "Failed file open: $!";
binmode($fh);
print $fh $win_line;
print $fh $unix_line;
print $fh $mac_line;
close($fh);

# read file with slurp
#my @lines = slurp("le.txt", {chomp => 1, irs => qr/(\r\n)|(\n)|(\r)/});
my @lines = slurp("le.txt", {chomp => 1, irs => qr/\r\n|\n|\r/});

for my $line (@lines) {
    print $line . "\n";
}

I assume this means that the module makes use of the capture buffers internally, which your captures overwrite. Also notice I added an or die clause to your open statement, since that's usually a Good Thing(TM). For your pure regular expression solution, is there a reason you didn't just split on a regular expression, a la:

for my $line (split /\r\n?|\n/, $file_content) {
    print $line . "\n";
}

Update: I looked at my inbox, and decided I did feel like delving. Your issue is that capturing parentheses mean 'include my delimiter in the result set' (see split, or run the code split /(k)/, 'onektwokthree'). Since Perl6::Slurp uses split to process the results, it inserts your delimiters into the result stream. In fact, because Perl6::Slurp already uses delimiter capturing in the result (I don't see why, but see line 106), you end up with a real mess in the resulting split. The module then drops every other element of the array, which drops some of your results. The initial split results are:

@line = ("Windows", "\n", "", "\n", "", "Unix", "\n", "", "\n", "", "Mac", "\r", "", "\r", "");
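To see the effect in isolation, here is a minimal sketch that uses only the core split function, not Perl6::Slurp itself. The nested pattern is an assumption: it is meant to mimic a capturing irs wrapped in one further capture group, and the variable names are purely illustrative.

```perl
use strict;
use warnings;

my $text = "onektwokthree";

# Non-capturing delimiter: only the fields come back.
my @plain = split /k/, $text;

# Capturing delimiter: every matched delimiter is returned as well.
my @captured = split /(k)/, $text;

# Nested captures (assumed to mimic a capturing irs inside another
# capture group): one extra element per group, undef where a group
# did not take part in the match.
my @nested = split /((\r\n)|(\n))/, "a\r\nb";

print scalar(@plain),    " fields: ", join("|", @plain),    "\n";
print scalar(@captured), " fields: ", join("|", @captured), "\n";
print scalar(@nested),   " elements from the nested pattern\n";
```

Any code that assumes a strict field, delimiter, field, delimiter layout will be thrown off by the extra elements that each additional group contributes.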

The module was written by Damian Conway, who is much smarter than I am. Anyone know why he'd use parens in the split and then manually drop alternating terms? He used the delimiter capture to control the chomp behavior.

Re^2: line ending troubles
by Dirk80 (Pilgrim) on Dec 22, 2009 at 22:08 UTC

    Thank you very much for your excellent answer. My mistake was that I did not know that parentheses in a split pattern have this effect.

    But now another question, about alternation in regexps. In my tests I have seen that the order of the alternatives is important. Is it really always true that the first alternative is tried first, then the second, the third, and so on?

    And one more question about slurp. I've seen that when I'm running perl on Windows the crlf layer is active by default. Of course I can pop this layer with binmode or :raw. But if I don't do that, will slurp use the crlf layer when no layer is specified?

    Greetings Dirk

      Regarding matching behaviors with alternation, see Matching this or that in perlretut. Short answer, yes.
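A small sketch of why the order matters for line endings in particular, using core split only. The two patterns differ just in where \r\n appears in the alternation; the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "Windows\r\nUnix";

# "\r" is listed first, so it wins at the CR; the LF that follows is then
# matched as a second, separate delimiter, leaving an empty field between.
my @short_first = split /\r|\n|\r\n/, $text;

# "\r\n" is listed first, so the CRLF pair is consumed as one delimiter.
my @long_first = split /\r\n|\n|\r/, $text;

print scalar(@short_first), " fields with \\r first\n";     # 3
print scalar(@long_first),  " fields with \\r\\n first\n";  # 2
```

Putting the longest alternative first is what keeps a Windows line ending from being read as two separate delimiters.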

      slurp as implemented in Perl6::Slurp v0.03 (what I'm using for reference), calls a 3-argument open with mode = '<' if no layer information is passed. This means it will behave like a normal file open on your OS, which as you've observed includes the crlf-layer by default under Windows. See Defaults and how to override them in PerlIO.
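The effect of the layer can be checked with a plain open, independent of Perl6::Slurp. This is a minimal sketch: read_with is a hypothetical helper introduced here for illustration, and the temporary file stands in for your test file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a CRLF-terminated line without any translation.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "Windows\r\n";
close($out) or die "Failed file close: $!";

# Hypothetical helper: read the whole file back using the given open mode.
sub read_with {
    my ($mode) = @_;
    open(my $in, $mode, $path) or die "Failed file open: $!";
    local $/;    # slurp mode: read everything in one go
    my $content = <$in>;
    close($in);
    return $content;
}

my $raw  = read_with("<:raw");     # bytes as stored: CR LF kept
my $crlf = read_with("<:crlf");    # CR LF translated to LF on read

printf "raw: %d bytes, crlf: %d bytes\n", length($raw), length($crlf);
```

Naming the layer explicitly in the mode, as here, makes the behavior the same on every OS instead of depending on the platform default.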

      If you haven't reviewed it yet, you should read about Newlines in perlport.

        Thanks for your answers. But now I have trouble again with slurp. I wanted to find out when I'm using slurp in a list context whether a big file is read at once or not. I just wanted to know if I can use slurp when I'm reading a big file. The following code is not really creating a big file (4 MB). But the slurping takes more than 2 minutes.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Perl6::Slurp;

        # write file with different line endings
        # \r = 0x0D
        # \n = 0x0A
        my $win_line  = "Windows\r\n";
        my $unix_line = "Unix\n";
        my $mac_line  = "Mac\r";

        open(my $fh, ">", "le_big.txt") or die "Failed file open: $!";
        binmode($fh);
        for (1 .. 100000) {
            print $fh $win_line;
            print $fh $unix_line;
            print $fh $mac_line;
            print $fh $win_line;
            print $fh $mac_line;
            print $fh $win_line;
            print $fh $unix_line;
        }
        close($fh);

        # read file with slurp
        # PerlIO-layer 'crlf' is doing the conversion \r\n --> \n
        # i.e. the input record separator only has to handle the line endings \n and \r
        # Win32: crlf-layer is activated as default, so it is not necessary
        #        to explicitly add this layer
        # Unix and other OS: crlf-layer is NOT activated as default
        #        necessary to add this layer
        for my $line (slurp("<:crlf", "le_big.txt", {irs => qr/\n|\r/, chomp => 1})) {
            print $line . "\n";
        }

        # NOTE:
        # It would also be possible to write a regexp which is working
        # if the crlf-layer is active or not:
        #   irs => qr/\r\n|\n|\r/
        # crlf-layer active:     possible line endings are \n OR \r
        # crlf-layer NOT active: possible line endings are \r\n OR \n OR \r

        Am I doing something wrong or why does the slurping take so much time?

        This code was run with Perl 5.10 on Ubuntu Linux.

        Greetings,

        Dirk