All,
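After reading "reading 100 line at a one time", I thought that it would be fun to provide a tied filehandle solution. It is an unfinished proof of concept currently, but the basic idea is to allow you to choose how many lines each read returns, how many bytes are pulled from the file at a time, and what terminates a record (the terminator is treated as a regex). Here is the module in its current state: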

package Tie::File::Custom;
use strict;
use warnings;
use Carp;

sub TIEHANDLE {
    my ($class, $file) = @_;
    my $self = {};
    bless $self, $class;
    open ($self->{FH}, '<', $file) or croak "Error opening file : $!";
    $self->$_() for qw(LINES BUFF NL);
    return $self;
}

sub READLINE {
    my $self = shift;
    my $fh = $self->{FH};
    my $lines;

    # We already have data in our window
    if ( $self->{WINDOW} ) {
        $lines = $self->{WINDOW} =~ s/($self->{NL})/$1/gs;
        if ( $lines >= $self->{LINES} ) {
            (my $data, $self->{WINDOW}) = $self->SPLIT();
            return $data;
        }
        # The window isn't a complete line but we are out of file
        return delete $self->{WINDOW} if eof $fh;
    }

    # We don't have anything and we can't get anything
    return undef if eof $fh && ! $self->{WINDOW};

    # We need to get more into our window
    local $/ = \$self->{BUFF};
    while ( ! eof $fh ) {
        $self->{WINDOW} .= <$fh>;
        $lines = $self->{WINDOW} =~ s/($self->{NL})/$1/gs;
        last if $lines >= $self->{LINES}
    }

    # We need to check if we got here because we have enough lines or ran out of file
    if ( $lines >= $self->{LINES} ) {
        (my $data, $self->{WINDOW}) = $self->SPLIT();
        return $data;
    }
    return delete $self->{WINDOW};
}

# We really should be more efficient about this
# We should also have lots of tests to fix edge cases
sub SPLIT {
    my $self = shift;
    my @part = split /($self->{NL})/, $self->{WINDOW};
    my $data = join '', @part[ 0 .. ( $self->{LINES} * 2 ) - 1 ];
    my $rest = join '', @part[ $self->{LINES} * 2 .. $#part ];
    return ($data, $rest);
}

sub LINES { $_[0]->{LINES} = $_[1] || $_[0]->{LINES} || 1;    $_[0]->{LINES} }
sub BUFF  { $_[0]->{BUFF}  = $_[1] || $_[0]->{BUFF}  || 8192; $_[0]->{BUFF} }
sub NL    { $_[0]->{NL}    = $_[1] || $_[0]->{NL}    || "\n"; $_[0]->{NL} }
sub CHOMP { $_[1] =~ s/$_[0]->{NL}$// }

"This Statement is false";

Here are a couple of example scripts that might use it:

#!/usr/bin/perl
use strict;
use warnings;
use Tie::File::Custom;

tie *fh, 'Tie::File::Custom', 'file.txt' or die "Unable to tie : $!";
(tied *fh)->LINES( 2 );   # Return 2 lines at a time
(tied *fh)->BUFF( 100 );  # Read 100 bytes at a time since we expect short lines
while ( <fh> ) {
    print $_, "-\n";
}

#!/usr/bin/perl
use strict;
use warnings;
use Tie::File::Custom;

tie *fh, 'Tie::File::Custom', 'num.dat' or die "Unable to tie : $!";
(tied *fh)->NL( "\n\\d+\n" );  # Lines with numbers by themselves are record terminators
while ( <fh> ) {
    (tied *fh)->CHOMP( $_ );
    print "$_\n";
}

Of course, the original intent was to improve performance and it is unlikely this tied implementation does that. On the other hand, this kind of flexibility might be desirable for some.
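
On the efficiency point, the module's own comment flags SPLIT as the obvious place to start. One possible approach, sketched here under assumptions and not taken from the module, is to walk the window with a /g match and cut at the position just past the Nth terminator, which avoids the split/join round trip. The name split_window and its argument order are hypothetical; the terminator is treated as a regex, as the NL setting is in the module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for SPLIT: cut $window just after its $n-th
# terminator. $nl is interpolated as a regex.
sub split_window {
    my ($window, $nl, $n) = @_;
    my $seen = 0;
    while ( $window =~ /$nl/g ) {
        next if ++$seen < $n;
        my $cut = pos($window);    # offset just past the $n-th terminator
        return ( substr($window, 0, $cut), substr($window, $cut) );
    }
    return;                        # fewer than $n terminators in the window
}

my ($data, $rest) = split_window("a\nb\nc\nd", "\n", 2);
print "data=[$data] rest=[$rest]\n";
```

Because it returns an empty list when the window holds too few terminators, the caller can use the same call both to test for enough lines and to do the cut.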

What are your thoughts on the matter? Personally I don't like the (tied *fh)->method() syntax at all. In general, do people want to see well implemented (which this really isn't) tied filehandle solutions that allow them to do what they want? PerlIO::via is a great module for doing custom IO if you don't need to have readline/<> ignore $/ (see perldoc perlvar) and do something else. If there is a need for a tied filehandle module that isn't this one, what do people want to see?

Cheers - L~R

Re: RFC: Tying Filehandles
by jdporter (Paladin) on Mar 02, 2005 at 23:29 UTC

    Very interesting idea. I decided to have a hack at it myself (code below). However, I stressed different aspects of the problem, so it's not exactly apples and apples, but a benchmark comparison would still be interesting. I predict that you incur significant overhead by having to deal with the regex record delimiter and the "read-ahead" buffer.

    package Tie::Handle::MultiRec;
    use Carp;

    sub TIEHANDLE {
        my $pkg = shift;
        my $fn  = shift;
        my $irs = shift || "\n";
        my $n   = shift || 1;
        my $fh  = *main::STDIN; # default
        # Only open (and only croak) when a filename was supplied;
        # "$fn and open ... or croak" would croak on a false filename.
        if ( $fn ) {
            open my $in, "<", $fn or croak "read $fn - $!\n";
            $fh = $in;
        }
        bless {
            readline => sub {
                local $/ = $irs;
                my $s;
                for ( 1 .. $n ) {
                    eof($fh) and return $s;
                    $s .= <$fh>;
                }
                $s
            },
            n => sub { @_ and $n = shift; $n },
            print => sub {
                for (@_) {
                    /^N\w*=(\d+)/mi and $n = $1;
                    /^I\w*="(.*?)"/msi and $irs = $1;
                }
            },
        }, $pkg;
    }

    sub READLINE { $_[0]{'readline'}->() }
    # Pass through only the arguments actually given, so N() reads without resetting
    sub N { my $self = shift; $self->{'n'}->(@_) }
    sub PRINT {
        my $self = shift;
        $self->{'print'}->(@_);
    }

    Some features of this code:
    • It will read from stdin if you supply a false filename.
    • The methods are implemented as closures inside the new sub; this keeps me from having to continually assign between members of the object and lexical vars.
    • You can set parameters of the object after initialization by calling the print method of the handle, as well as the usual methods of the tied object. I figured, this handle can't really do output, so use print as an out-of-band channel.
    Example:
    my $ctl = tie *FH, 'Tie::Handle::MultiRec', $0, "\n", 3;
    my $n = 1;
    while (<FH>) {
        print "Record $n:\n$_\n\n";
        $ctl->N(3+$n);        # get more lines per read (the conventional way)
        print FH qq(irs="");  # switch to paragraph mode (using OOB)
        $n++;
    }

Re: RFC: Tying Filehandles
by Zaxo (Archbishop) on Mar 03, 2005 at 06:32 UTC

    Another possibility is to override local *CORE::GLOBAL::readline.
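
    A rough sketch of what such an override could look like, under stated assumptions: the variable $LINES_PER_READ is hypothetical, the override is installed in a BEGIN block so that code compiled afterwards sees it, and it affects every readline compiled after that point, program-wide. The usage calls readline($fh) explicitly instead of relying on the while (<$fh>) shorthand.

```perl
use strict;
use warnings;

our $LINES_PER_READ;    # hypothetical knob: lines returned per scalar read

BEGIN {
    $LINES_PER_READ = 1;
    no warnings 'once';
    *CORE::GLOBAL::readline = sub {
        my $fh = shift;
        return CORE::readline($fh) if wantarray;    # list context unchanged
        my $chunk;
        for ( 1 .. $LINES_PER_READ ) {
            my $line = CORE::readline($fh);
            last unless defined $line;
            $chunk .= $line;
        }
        return $chunk;    # undef once the handle is exhausted
    };
}

my $data = "a\nb\nc\n";
open my $in, '<', \$data or die "Cannot open in-memory file: $!";
$LINES_PER_READ = 2;
while ( defined( my $rec = readline($in) ) ) {
    print "[$rec]";
}
```

    Unlike a tie, this changes reads on every handle, which is worth weighing before reaching for it.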

    I'm not sure that the tied handle interface is the way to manage specialized data handling at the user's end. It seems more suited to hooking up a very different data structure to an I/O-centric application. Something like fetching DBI statement handle results as if they were lines from a CSV file.

    It's no secret that I've slowly become a believer in tie. As I said somewhere else, tied interfaces are slower because they are doing something. Presumably, it should be something worth doing - something that would otherwise have to be done explicitly, in multiple places.

    As a matter of taste, I'm not sure that should encompass new conventions which modify standard perl semantics. We're accustomed to the diamond op returning a single $/-delimited line in scalar context, and the list of all lines in array context. To produce a tied handle interface that alters those semantics seems backwards to me.
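
    For reference, the convention documented in perltie is that READLINE returns the next line in scalar context and all remaining lines in list context. A minimal pass-through sketch that keeps those semantics follows; the package name Tie::Handle::PassThru is hypothetical, and an in-memory file stands in for a real one.

```perl
package Tie::Handle::PassThru;    # hypothetical name
use strict;
use warnings;
use Carp;

sub TIEHANDLE {
    my ($class, $file) = @_;
    open my $fh, '<', $file or croak "Error opening $file: $!";
    return bless { fh => $fh }, $class;
}

# Mirror the built-in behaviour: one line in scalar context,
# everything that is left in list context.
sub READLINE {
    my $self = shift;
    my $fh   = $self->{fh};
    return wantarray ? <$fh> : scalar <$fh>;
}

package main;

my $text = "one\ntwo\nthree\n";
tie *IN, 'Tie::Handle::PassThru', \$text or die "Unable to tie: $!";
my $first = <IN>;    # scalar context
my @rest  = <IN>;    # list context
print "first: $first", "rest: @rest";
```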

    In Tie::Constrained, I found that tie provided a wedge into assignment that is unavailable with any other Perl construct. I notice that there is a similar opportunity in tied handles. A "file" handle can be produced which fails with a preset system error, on a particular kind of operation. That could be valuable for testing. Imagine being able to test for correct error handling when an operation on STDERR returns EPERM or ENOSPC.
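
    A minimal sketch of that testing idea, with the class name Tie::Handle::FailWith invented for illustration: every write sets $! to a preset errno and returns false, so the calling code's error path can be exercised. A throwaway glob is used here; tying *STDERR should work the same way.

```perl
package Tie::Handle::FailWith;    # hypothetical name
use strict;
use warnings;

sub TIEHANDLE {
    my ($class, $errno) = @_;
    return bless { errno => $errno }, $class;
}

# Every write "fails": set $! to the preset errno and return false.
sub PRINT  { my $self = shift; $! = $self->{errno}; return 0 }
sub PRINTF { my $self = shift; $! = $self->{errno}; return 0 }

package main;
use Errno qw(ENOSPC);

tie *OUT, 'Tie::Handle::FailWith', ENOSPC;

my $ok  = print OUT "important data\n";
my $err = $! + 0;    # capture the numeric errno straight away
warn "print failed as planned: $!\n" unless $ok;
```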

    It's a well-known wart on Perl that global print is not overridable (that's why Fatal doesn't work for print). With a tied interface, we have the PRINT method. That is another unique wedge into an otherwise immutable bit of Perl.
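
    As an illustration of that wedge, here is a sketch of a tied handle whose PRINT forwards to a real handle and dies when the underlying print fails, roughly what Fatal cannot do for print. The class name Tie::Handle::FatalPrint is hypothetical. To wrap STDOUT itself, the real handle would first need to be duplicated (for example with open my $real, '>&', \*STDOUT) so that PRINT does not call itself.

```perl
package Tie::Handle::FatalPrint;    # hypothetical name
use strict;
use warnings;
use Carp;

sub TIEHANDLE {
    my ($class, $fh) = @_;
    return bless { fh => $fh }, $class;
}

# Forward to the real handle, and die instead of returning false.
sub PRINT {
    my $self = shift;
    no warnings 'closed';    # the croak below reports the failure
    print { $self->{fh} } @_ or croak "print failed: $!";
    return 1;
}

package main;

my $buffer = '';
open my $real, '>', \$buffer or die "Cannot open in-memory file: $!";
tie *LOG, 'Tie::Handle::FatalPrint', $real;

print LOG "written\n";    # succeeds and lands in $buffer
close $real;

# The real handle is now closed, so the next print croaks.
my $died = !eval { print LOG "lost\n"; 1 };
print "second print died\n" if $died;
```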

    Whether I like this particular tied handle class or not, this is a very interesting topic. I'm glad you brought it up.

    After Compline,
    Zaxo

Re: RFC: Tying Filehandles
by Mugatu (Monk) on Mar 02, 2005 at 19:24 UTC
    tie returns the tied object, so you can make your ugly syntax a bit less ugly:
    my $ctrl = tie *fh, 'Tie::File::Custom', 'num.dat' or die "Unable to tie : $!";
    $ctrl->LINES( 2 );   # Return 2 lines at a time
    $ctrl->BUFF( 100 );  # Read 100 bytes at a time since we expect short lines

      Mugatu,
      Right. I offer both approaches in Tie::Hash::Sorted as well as passing the arguments in the constructor. As you can see, I don't much care for the alternative either. Thanks though!

      Cheers - L~R
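
      For comparison, a sketch of the third option mentioned above, passing the settings in the constructor so no method call on the tied object is needed. The class name Tie::Handle::NLines and the lines option are hypothetical, and this version only handles a plain "\n" terminator.

```perl
package Tie::Handle::NLines;    # hypothetical name
use strict;
use warnings;
use Carp;

sub TIEHANDLE {
    my ($class, $file, %opt) = @_;
    open my $fh, '<', $file or croak "Error opening $file: $!";
    return bless { fh => $fh, lines => $opt{lines} || 1 }, $class;
}

# Return up to {lines} lines per read, glued into one string.
sub READLINE {
    my $self = shift;
    my $fh   = $self->{fh};
    my $chunk;
    for ( 1 .. $self->{lines} ) {
        my $line = <$fh>;
        last unless defined $line;
        $chunk .= $line;
    }
    return $chunk;    # undef once the file is exhausted
}

package main;

my $text = "a\nb\nc\nd\ne\n";
tie *FH, 'Tie::Handle::NLines', \$text, lines => 2 or die "Unable to tie: $!";
my @records;
while ( defined( my $rec = <FH> ) ) {
    push @records, $rec;
}
print scalar(@records), " records\n";
```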