Re: line ending troubles

The issue is with your regular expression. I don't particularly feel like delving into the Perl6::Slurp source right now, but changing your regex from capturing to non-capturing yields what I assume to be your intended output:

#!/usr/bin/perl
use strict;
use warnings;

use Perl6::Slurp;

# generate test file "le.txt"
my $win_line = "Windows\r\n";
my $unix_line = "Unix\n";
my $mac_line = "Mac\r";

open(my $fh, ">", "le.txt") or die "Failed file open: $!";
binmode($fh);

print $fh $win_line;
print $fh $unix_line;
print $fh $mac_line;

close($fh);


# read file with slurp

#my @lines = slurp("le.txt", {chomp => 1, irs => qr/(\r\n)|(\n)|(\r)/});
my @lines = slurp("le.txt", {chomp => 1, irs => qr/\r\n|\n|\r/});
for my $line (@lines)
{
    print  $line . "\n";
}

I assume this means that the module makes use of the capture buffers internally, which your captures overwrite. Also notice I added an or die clause to your open statement, since that's usually a Good Thing(TM). For your pure regular expression solution, is there a reason you didn't just split on a regular expression, a la:

for my $line (split /\r\n?|\n/, $file_content)
{
    print $line . "\n";
}
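For completeness, here is a minimal self-contained sketch of the split approach. The `$file_content` string is a stand-in I made up for the example; in real use it would hold the whole file, read in binary mode so that no layer translates the line endings first.

```perl
use strict;
use warnings;

# Stand-in for the slurped file: one line of each line-ending style.
my $file_content = "Windows\r\nUnix\nMac\r";

# \r\n? matches CRLF or a lone CR; the second alternative matches a lone LF.
# No capturing parentheses, so the delimiters are discarded.
my @lines = split /\r\n?|\n/, $file_content;

for my $line (@lines)
{
    print $line . "\n";
}
```

One thing to keep in mind: split drops trailing empty fields unless you pass a negative limit, so blank lines at the very end of the file disappear with this approach.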

Update: I looked at my inbox, and decided I did feel like delving. Your issue is that capturing parentheses in a split pattern mean 'include my delimiter in the result set' (see split, or run the code split /(k)/, 'onektwokthree'). Since Perl6::Slurp uses split to process the file contents, it inserts your delimiters into the result stream. In fact, because Perl6::Slurp already wraps the separator in capturing parentheses itself (I don't see why, but see line 106), you end up with a real mess in the resulting split. The module then drops every other element of the array, which drops some of your results. The initial split results are:

@line = ("Windows", "\n", "", "\n", "", "Unix", "\n", "", "\n", "", "Mac", "\r", "", "\r", "");

~~The module was written by Damian Conway, who is much smarter than I am. Anyone know why he'd use parens in the split and then manually drop alternating terms?~~ He used the delimiter capture to control the chomp behavior.
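The split behavior described in the update can be seen in isolation with the sketch below. The second half only illustrates the "capture the delimiter, then keep every other element" idea with a single capture group; it is my own illustration, not code taken from Perl6::Slurp.

```perl
use strict;
use warnings;

# Without captures the delimiter is discarded; with captures it is returned.
my @plain    = split /k/,   'onektwokthree';
my @captured = split /(k)/, 'onektwokthree';
print join('|', @plain),    "\n";    # one|two|three
print join('|', @captured), "\n";    # one|k|two|k|three

# One capture group around the whole separator: fields and delimiters
# alternate, so the even indices hold the lines themselves.
my $content     = "Windows\r\nUnix\nMac\r";
my @with_delims = split /(\r\n|\n|\r)/, $content;
my @lines       = @with_delims[ grep { $_ % 2 == 0 } 0 .. $#with_delims ];
print join('|', @lines), "\n";       # Windows|Unix|Mac
```

With three separate capture groups instead of one, every match contributes three extra elements rather than one, so the "every other element" bookkeeping no longer lines up.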


Re^2: line ending troubles by Dirk80 (Pilgrim) on Dec 22, 2009 at 22:08 UTC
Thank you very much for your excellent answer. My mistake was that I did not know that parentheses in a split pattern have this effect.

Now another question, about alternation in regexps. In my tests I have seen that the order of the alternatives is important. Is it really always true that the first alternative is tried first, then the second, the third, and so on?

And one more question about slurp. I've seen that when I run Perl on Windows, the crlf layer is active by default. Of course I can pop this layer with binmode or :raw. But if I don't do that, will slurp use the crlf layer if no layer is specified?

Greetings,
Dirk
Re^3: line ending troubles by kennethk (Abbot) on Dec 22, 2009 at 23:01 UTC
Regarding matching behavior with alternation: short answer, yes. See Matching this or that in perlretut.

`slurp` as implemented in Perl6::Slurp v0.03 (what I'm using for reference) calls a 3-argument open with mode '<' if no layer information is passed. This means it will behave like a normal file open on your OS, which, as you've observed, includes the crlf layer by default under Windows. See Defaults and how to override them in PerlIO. If you haven't reviewed it yet, you should also read about Newlines in perlport.
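To make the alternation answer concrete, here is a small sketch showing why the order matters for line endings. The test strings are made up for the example.

```perl
use strict;
use warnings;

my $text = "a\r\nb";

# At each position the alternatives are tried left to right, and the
# first one that allows a match wins.
my @crlf_first = $text =~ /(\r\n|\n|\r)/g;    # one match: "\r\n"
my @cr_first   = $text =~ /(\r|\n|\r\n)/g;    # two matches: "\r", then "\n"

printf "crlf first: %d match(es)\n", scalar @crlf_first;
printf "cr first:   %d match(es)\n", scalar @cr_first;

# The same effect in split: the wrong order produces a phantom empty line.
my @good = split /\r\n|\n|\r/, "Windows\r\nUnix";    # ("Windows", "Unix")
my @bad  = split /\r|\n|\r\n/, "Windows\r\nUnix";    # ("Windows", "", "Unix")
```

With \r listed first, the \r\n alternative can never match, because \r alone always succeeds at the same position.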
Re^4: line ending troubles by Dirk80 (Pilgrim) on Dec 25, 2009 at 00:31 UTC
Thanks for your answers. But now I have trouble with slurp again. I wanted to find out whether slurp in list context reads a big file all at once, i.e. whether I can use slurp for reading big files. The following code does not really create a big file (4 MB), but the slurping takes more than 2 minutes.

#!/usr/bin/perl
use strict;
use warnings;

use Perl6::Slurp;

# write file with different line endings
# \r = 0x0D
# \n = 0x0A
my $win_line = "Windows\r\n";
my $unix_line = "Unix\n";
my $mac_line = "Mac\r";

open(my $fh, ">", "le_big.txt") or die "Failed file open: $!";
binmode($fh);

for (1 .. 100000)
{
    print $fh $win_line;
    print $fh $unix_line;
    print $fh $mac_line;
    print $fh $win_line;
    print $fh $mac_line;
    print $fh $win_line;
    print $fh $unix_line;
}

close($fh);

# read file with slurp
# PerlIO layer 'crlf' does the conversion \r\n --> \n,
# i.e. the input record separator only has to handle the line endings \n and \r
# Win32: crlf layer is active by default, so it is not necessary
#        to add this layer explicitly
# Unix and other OS: crlf layer is NOT active by default, so it is
#        necessary to add this layer
for my $line (slurp("<:crlf", "le_big.txt", {irs => qr/\n|\r/, chomp => 1}))
{
    print $line . "\n";
}

# NOTE:
# It would also be possible to write a regexp that works
# whether the crlf layer is active or not:
#     irs => qr/\r\n|\n|\r/
# crlf layer active: possible line endings are \n OR \r
# crlf layer NOT active: possible line endings are \r\n OR \n OR \r

Am I doing something wrong, or why does the slurping take so much time? This code was run with Perl 5.10 on Ubuntu Linux.

Greetings,
Dirk
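The layer behavior that the comments in the code above rely on can be checked without Perl6::Slurp. The following is a minimal sketch using plain open; the temporary file and the read_all helper are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the three line-ending styles as raw bytes (no layer translation).
my ($out, $name) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "Windows\r\n", "Unix\n", "Mac\r";
close($out) or die "Failed file close: $!";

# Hypothetical helper: read the whole file through the given open mode.
sub read_all
{
    my ($mode) = @_;
    open(my $in, $mode, $name) or die "Failed file open: $!";
    local $/;    # undef record separator: read everything at once
    my $content = <$in>;
    close($in);
    return $content;
}

my $raw  = read_all("<:raw");     # bytes exactly as written
my $crlf = read_all("<:crlf");    # \r\n pairs collapsed to \n

printf "raw:  %d characters\n", length $raw;
printf "crlf: %d characters\n", length $crlf;
```

Only the \r\n pair is translated by the crlf layer; the lone \r of the Mac line comes through unchanged, which is why the record separator still has to handle both \n and \r.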
Re^5: line ending troubles by kennethk (Abbot) on Dec 28, 2009 at 22:18 UTC