in reply to line ending troubles

Yet another way would be to write a custom PerlIO layer (similar in spirit to :crlf, but for reading only), e.g. using PerlIO::via.

In its simplest form it could look something like:

package PerlIO::via::AnyCRLF;   # save as PerlIO/via/AnyCRLF.pm

sub PUSHED {
    my ($class) = @_;
    my $dummy;
    return bless \$dummy, $class;
}

sub FILL {
    my ($self, $fh) = @_;
    my $len = read $fh, my $buf, 4096;
    if (defined $buf) {
        $buf =~ s/\r\n/\n/g;
        $buf =~ s/\r/\n/g;
    }
    return $len > 0 ? $buf : undef;
}

1;

Sample usage:

#!/usr/bin/perl

use PerlIO::via::AnyCRLF;

open my $f, "<:via(AnyCRLF)", "le.txt" or die $!;
print while <$f>;

Handling the corner case (when \r\n gets split such that \r is in one buffer read, and \n in the next) is left as an exercise for the reader ;) A quick fix could be to delegate the \r\n to \n translation to the regular :crlf layer (i.e. "<:crlf:via(AnyCRLF)"), and only do the \r to \n translation in this layer...
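
The split-buffer corner case can also be handled inside the layer itself. Below is a minimal sketch, not a tested production module. It assumes the layer object is a hash with a prev_cr flag (a field name invented here) that remembers whether the previous chunk ended in \r. The demo file name le_demo.txt is made up as well.

```perl
use strict;
use warnings;

{
    package PerlIO::via::AnyCRLF;

    sub PUSHED {
        my ($class) = @_;
        # prev_cr: true if the previous chunk ended in "\r" (hypothetical field name)
        return bless { prev_cr => 0 }, $class;
    }

    sub FILL {
        my ($self, $fh) = @_;
        while (1) {
            my $len = read $fh, my $buf, 4096;
            return undef unless $len;    # EOF (0) or read error (undef)

            # The previous chunk ended in "\r", which was already emitted as "\n";
            # a leading "\n" here is the second half of that "\r\n" pair.
            $buf =~ s/\A\n// if $self->{prev_cr};
            $self->{prev_cr} = $buf =~ /\r\z/ ? 1 : 0;

            $buf =~ s/\r\n?/\n/g;        # "\r\n" and lone "\r" both become "\n"
            return $buf if length $buf;  # otherwise read on instead of returning ""
        }
    }
}

# Demo: place a "\r\n" pair across the 4096-byte read boundary.
my $file = 'le_demo.txt';
open my $out, '>:raw', $file or die $!;
print {$out} ('a' x 4095) . "\r\nb\rc\n";
close $out;

open my $in, '<:via(AnyCRLF)', $file or die $!;
my $text = do { local $/; <$in> };
close $in;
unlink $file;
```

A trailing \r is translated right away rather than held back, so FILL never has to return an empty string for a chunk that consists only of the held-back character.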

Re^2: line ending troubles
by Dirk80 (Pilgrim) on Dec 22, 2009 at 22:23 UTC

    Thank you for the hint about the IO layers. Very interesting. Because I had never used object-oriented programming in Perl and knew nothing about layers, I first had to do some reading to understand it.

    Now I understand your code completely, and I tried it in my environment, where it works. Then I tried to implement your suggestion of avoiding the corner case by using the crlf layer.

    If I understand you right the solution is as follows:

    package PerlIO::via::AnyCRLF;   # save as PerlIO/via/AnyCRLF.pm

    sub PUSHED {
        my ($class) = @_;
        my $dummy;
        return bless \$dummy, $class;
    }

    sub FILL {
        my ($self, $fh) = @_;
        my $len = read $fh, my $buf, 4096;
        if (defined $buf) {
            $buf =~ s/\r/\n/g;
        }
        return $len > 0 ? $buf : undef;
    }

    1;

    #!/usr/bin/perl

    use strict;
    use warnings;

    use PerlIO::via::AnyCRLF;

    open my $f, "<:crlf:via(AnyCRLF)", "le.txt" or die $!;
    print while <$f>;

    Greetings,

    Dirk

      If I understand you right the solution is as follows: ...

      Exactly.

      Maybe it's worth pointing out that when you have multiple layers, the order in which they are applied (which does matter here) is from left to right when reading, and from right to left when writing (which you aren't doing in this case, but good to know anyway :)

                       ----- reading ---->
      external side   ":crlf:via(AnyCRLF)"
      (file)
                       <---- writing -----

        Thank you again for your answer. The IO layers are a great construct. And it is good to know about them.

        But now I want to add the following behaviour to the AnyCRLF module:

        If the crlf layer is not on the stack, I want the AnyCRLF module to put it there automatically, so that the module always works regardless of whether the user specified the crlf layer in the open call.

        But I have no idea how to achieve this. I tried to override the OPEN method as follows:

        sub OPEN {
            my ($self, $path, $mode, $fh) = @_;
            print "Path: " . $path . "\n";
            print "Mode: " . $mode . "\n";
            print "FH: " . $fh . "\n";
            open $fh, "<:crlf", $path;
        }

        My idea was to open the file with the crlf layer and thereby put that layer on the stack, but it does not work. For one thing, only the path ("le.txt") arrives in the OPEN function; the mode and the fh are undefined.

        I would be very interested to know how to make the AnyCRLF module put the crlf layer on the stack automatically if it is not already there.

        Thank you very much

        Dirk
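
One way to get that effect without touching OPEN might be a small convenience function in the module that always opens the file with both layers. This is only a sketch: open_any_crlf is a name made up here, not part of PerlIO::via, and it does not inspect the layer stack at all. It simply takes the choice of layers out of the caller's hands. The demo file name le_demo.txt is also an assumption.

```perl
use strict;
use warnings;

{
    package PerlIO::via::AnyCRLF;

    sub PUSHED {
        my ($class) = @_;
        my $dummy;
        return bless \$dummy, $class;
    }

    sub FILL {
        my ($self, $fh) = @_;
        my $len = read $fh, my $buf, 4096;
        return undef unless $len;    # EOF (0) or read error (undef)
        $buf =~ s/\r/\n/g;           # "\r\n" was already handled by :crlf below
        return $buf;
    }

    # Hypothetical helper: open $path for reading with :crlf beneath this layer.
    # Returns the file handle, or nothing on failure (open leaves the error in $!).
    sub open_any_crlf {
        my ($path) = @_;
        open my $fh, '<:crlf:via(AnyCRLF)', $path or return;
        return $fh;
    }
}

# Usage: write a file with mixed line endings, then read it back.
my $file = 'le_demo.txt';
open my $out, '>:raw', $file or die $!;
print {$out} "one\r\ntwo\rthree\n";
close $out;

my $f = PerlIO::via::AnyCRLF::open_any_crlf($file) or die $!;
my @lines = <$f>;
close $f;
unlink $file;
```

Callers who still write their own open with "<:via(AnyCRLF)" alone are not helped by this, so it is a workaround rather than the automatic behaviour asked for.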