thor has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monks,

My ongoing attempt at creating a PerlIO::via interface to an old module that I had lying around is actually getting somewhere. However, I've run up against a bit of a stumbling block.

The format that I'm trying to read is not newline-delimited; it is a 2-byte exclusive rdw format. That is, every record is prefixed by two bytes that say how long that record is. I've written a subroutine that deals with this. When called from the old OO interface, it does what I expect: it returns one record. However, when I write a thin wrapper around it and call it as FILL (see PerlIO::via for details), it seems to get hung up looking for a newline character, which may or may not be there. This causes the program to read in multiple records at once, which is not what I want. Is there any way to tell perl that "yes...even though there is no newline character at the end, you have the entire record"?

thor

Feel the white light, the light within
Be your own disciple, fan the sparks of will
For all of us waiting, your kingdom will come

Replies are listed 'Best First'.
Re: PerlIO::via FILL subroutine question
by Limbic~Region (Chancellor) on Oct 04, 2004 at 19:22 UTC
    thor,
    I am not sure where you are running into the newline problem. I am betting it is with <>/readline. That is because they define a record by looking at $/ (see perldoc perlvar). PerlIO::via allows you to define your own buffering methods, but readline is still going to return records from the buffer by looking at $/. I don't think you are going to be able to use PerlIO::via to change that behavior. Notice the comment:
        package rdw;
        use strict;
        use warnings;

        sub PUSHED { bless \*PUSHED,$_[0] }

        sub FILL {
            my ($length, $record);
            read $_[1], $length, 1;
            return undef if eof $_[1];
            read $_[1], $record, $length;
            # uncomment next line to see it is actually working
            # print "$record\n";
            return $record;
        }

        sub WRITE { return undef }

        package main;

        open( my $in, '<:via(rdw)', 'foo.txt' ) or die $!;
        while ( <$in> ) {
            print "$_\n";
        }
        __END__
        # foo.txt - output is aababc
        1a2ab3abc
    If I guessed correctly, there are other ways to do it - speak up.
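    To make the point about $/ concrete, here is a minimal sketch that uses an in-memory filehandle instead of a PerlIO::via layer. The helper name read_all_with and the sample data are made up for illustration. With the default $/ of "\n", readline never finds a separator and hands back the whole buffer as one "line". Setting $/ to a reference to an integer gives fixed-size reads, which still knows nothing about the length prefixes.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: read every "line" from an in-memory copy of $data
    # using the given input record separator.
    sub read_all_with {
        my ($data, $separator) = @_;
        local $/ = $separator;
        open( my $fh, '<', \$data ) or die "Unable to open in-memory handle: $!";
        my @records = <$fh>;
        close $fh;
        return @records;
    }

    # Toy format from the example above: one ASCII digit of length, then the record.
    my $data = '1a2ab3abc';

    # No newline in the data, so readline returns everything in one piece.
    my @default = read_all_with( $data, "\n" );

    # Fixed 3-byte reads: the length prefixes are treated as ordinary data.
    my @fixed = read_all_with( $data, \3 );

    print scalar(@default), " record(s) with the default separator\n";
    print scalar(@fixed),   " record(s) with 3-byte fixed reads\n";
    ```

    Neither setting matches variable-length records, which is why the record boundaries have to be handled somewhere other than readline's $/ logic.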

    Cheers - L~R

      Yeah, that's the gist of it (the rdw's are packed shorts, but that's a detail that doesn't matter). So, is there some way to implement readline so that I can provide perl with what my notion of a line is? At first, that's what I thought that FILL did, but I guess I was mistaken...

      thor


        thor,
        You can look at subclassing IO::File, but that is going to end up giving you a different OO interface. The other option is to tie the filehandle. I was disappointed that Tie::FileHandle::Base didn't do more in the way of giving you default methods to inherit. Here is a very rough proof of concept.
            # rdw.pm
            package rdw;
            use strict;
            use warnings;
            use Carp;

            sub TIEHANDLE {
                my ($class, $file) = @_;
                # keep the handle in its own lexical instead of re-declaring $file
                open( my $fh, '<', $file ) or croak "Unable to open $file : $!";
                return bless \$fh, $class;
            }

            sub READLINE {
                my $self = shift;
                my ($length, $record);
                read $$self, $length, 1;
                return undef if eof $$self;
                read $$self, $record, $length;
                return $record;
            }

            42;

            # and a script that uses it
            #!/usr/bin/perl
            use strict;
            use warnings;
            use rdw;

            # parentheses make "or die" apply to tie rather than to 'foo.rdw'
            tie( *fh, 'rdw', 'foo.rdw' ) or die "Unable to tie : $!";
            while ( <fh> ) {
                print "$_\n";
            }
            __END__
            # foo.rdw - outputs "a\nab\nabc\nabcd\n" as desired
            1a2ab3abc4abcd
        See perldoc perltie for more information
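        The proof of concept above uses a single ASCII digit as the length. As a rough sketch of how the same tie approach might look with the 2-byte packed lengths thor describes, the following assumes the prefix is an unsigned 16-bit big-endian integer (the byte order is a guess; swap 'n' for 'v' if it is little-endian) and that "exclusive" means the length counts only the record body. The package name RdwHandle is hypothetical, and the test data is built in memory rather than read from a real file.

        ```perl
        use strict;
        use warnings;

        package RdwHandle;    # hypothetical name, not an existing module

        sub TIEHANDLE {
            my ($class, $source) = @_;    # file name, or reference to a scalar
            open( my $fh, '<:raw', $source ) or die "Unable to open input: $!";
            return bless { fh => $fh }, $class;
        }

        # Read one length-prefixed record; returns undef at end of input.
        sub _next_record {
            my ($self) = @_;
            my $got = read( $self->{fh}, my $prefix, 2 );
            return unless $got;           # end of input (or read error)
            die "Truncated length prefix\n" if $got != 2;

            my $length = unpack 'n', $prefix;    # assumed big-endian
            my $record = '';
            if ($length) {
                $got = read( $self->{fh}, $record, $length );
                die "Truncated record\n" unless defined $got && $got == $length;
            }
            return $record;
        }

        # Scalar context: one record. List context: all remaining records.
        sub READLINE {
            my ($self) = @_;
            return $self->_next_record unless wantarray;
            my @records;
            while ( defined( my $record = $self->_next_record ) ) {
                push @records, $record;
            }
            return @records;
        }

        sub CLOSE { close $_[0]{fh} }

        package main;

        # Build sample data: each record prefixed by its 2-byte length.
        my $data = join '', map { pack 'n/a*', $_ } qw(a ab abc abcd);

        tie *RDW, 'RdwHandle', \$data;
        while ( defined( my $record = <RDW> ) ) {
            print "$record\n";
        }
        close RDW;
        ```

        The explicit defined() test in the loop is there so that a zero-length record does not end the loop early.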

        Cheers - L~R