http://qs1969.pair.com?node_id=1175122

Pickwick has asked for the wisdom of the Perl Monks concerning the following question:

I have an app which accesses a PostgreSQL database and needs to read large binary data out of it, depending on what processing is required. This might be hundreds of MB or even several GB of data. Please, no discussion about using the file system instead or the like; that's just how things are right now.

That data is simply files of various types, e.g. it might be a Zip container or some other kind of archive. Some of the needed processing is listing the contents of the Zip, maybe even extracting some members for further processing, maybe hashing the stored data... In the end the data is read multiple times, but written only once to store it.

All of the Perl libs I use are able to work with file handles: some with IO::Handle, others with IO::String or IO::Scalar, and some only with low-level file handles. So what I've done is create a subclass of IO::Handle and IO::Seekable which acts as a wrapper around the corresponding DBD::Pg methods. In the constructor I create a connection to the database, open the provided LOID for reading and store the handle provided by Postgres in the instance. My own handle object is then passed on to whoever is able to work with such a file handle and can directly read and seek within the blob provided by Postgres.

The problem is libs which use low level file handles or low level file handle operations on IO::Handle. Digest::MD5 seems to be one, Archive::Zip another one. Digest::MD5 croaks and tells me that no handle has been provided, Archive::Zip on the other hand tries to create a new, own handle from mine, calls IO::Handle::fdopen and fails in my case.

sub fdopen {
    @_ == 3 or croak 'usage: $io->fdopen(FD, MODE)';
    my ($io, $fd, $mode) = @_;
    local(*GLOB);

    if (ref($fd) && "".$fd =~ /GLOB\(/o) {
        # It's a glob reference; Alias it as we cannot get name of anon GLOBs
        my $n = qualify(*GLOB);
        *GLOB = *{*$fd};
        $fd = $n;
    } elsif ($fd =~ m#^\d+$#) {
        # It's an FD number; prefix with "=".
        $fd = "=$fd";
    }

    open($io, _open_mode_string($mode) . '&' . $fd)
        ? $io : undef;
}

I guess the problem is the low-level copy of the handle, which takes my instance out of the game.

So, is it even possible in my case to provide some IO::Handle which successfully can be used wherever a low level file handle is expected?

I mean, I don't have a real file handle. I only have an object whose method calls are wrapped to their corresponding Postgres methods, for which a database handle is needed and such. All of that data needs to be stored somewhere, the wrapping needs to be done etc.

I tried to do what others are doing, like IO::String, which additionally uses tie for example. But in the end that use case is different, because Perl is able to create a real low level file handle to some internal memory on its own. Something which is not supported at all in my case. I need to keep my instance around, because only that knows of the handle to the database etc.
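For illustration, here is a minimal sketch of the tie approach mentioned above. It is only a sketch under stated assumptions: `MemBackend` and `TiedForwarder` are hypothetical names, and `MemBackend` reads from an in-memory string as a stand-in for the database-backed object. The tied glob makes the builtin `read`, `seek` and `tell` dispatch to the object, but `fileno` still has nothing to return, so anything that wants to dup a real descriptor has nothing to work with.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the database-backed object. It only needs
# read/seek/tell methods; here it reads from an in-memory string.
package MemBackend;

sub new {
    my ($class, $data) = @_;
    return bless { data => $data, pos => 0 }, $class;
}

sub read {
    my ($self, undef, $len) = @_;
    my $chunk = substr($self->{data}, $self->{pos}, $len);
    $self->{pos} += length $chunk;
    $_[1] = $chunk;                 # fill the caller's buffer in place
    return length $chunk;
}

sub seek { my ($self, $off) = @_; $self->{pos} = $off; return 1 }
sub tell { return $_[0]{pos} }

# Hypothetical tie class: forwards builtin handle operations to the backend.
package TiedForwarder;

sub TIEHANDLE {
    my ($class, $backend) = @_;
    return bless { backend => $backend }, $class;
}

sub READ {
    my $self = shift;
    # $_[0] is the caller's buffer (aliased), $_[1] the requested length.
    return $self->{backend}->read($_[0], $_[1]);
}

sub SEEK   { my ($self, $off, $whence) = @_; return $self->{backend}->seek($off, $whence) }
sub TELL   { return $_[0]{backend}->tell }
sub CLOSE  { return 1 }
sub FILENO { return undef }         # there is no OS-level descriptor behind this handle

package main;

use Symbol ();

my $fh = Symbol::gensym();
tie *$fh, 'TiedForwarder', MemBackend->new('hello world');

read($fh, my $buf, 5);              # dispatched to TiedForwarder::READ
seek($fh, 6, 0);                    # dispatched to TiedForwarder::SEEK
read($fh, my $rest, 5);
my $pos = tell($fh);                # dispatched to TiedForwarder::TELL

print "$buf $rest at $pos\n";
print defined fileno($fh) ? "has fd\n" : "no fd\n";
```

Pure-Perl code that calls the builtins on `$fh` goes through the tie; code that asks for `fileno` or tries to dup the handle still finds no descriptor.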

Using my handle like an IO::Handle by calling methods such as read works as expected, but I would like to take it a bit further and be more compatible with whoever doesn't expect to work on IO::Handle objects. Much like IO::String or File::Temp can be used as low-level file handles.

package ReadingHandle;

use strict;
use warnings;
use 5.10.1;

use base 'IO::Handle', 'IO::Seekable';

use Carp ();

sub new {
    my $invocant = shift || Carp::croak('No invocant given.');
    my $db       = shift || Carp::croak('No database connection given.');
    my $loid     = shift // Carp::croak('No LOID given.');

    my $dbHandle = $db->_getHandle();
    my $self     = $invocant->SUPER::new();

    *$self->{'dbHandle'} = $dbHandle;
    *$self->{'loid'}     = $loid;

    my $loidFd = $dbHandle->pg_lo_open($loid, $dbHandle->{pg_INV_READ});
    *$self->{'loidFd'} = $loidFd;

    if (!defined($loidFd)) {
        Carp::croak("The provided LOID couldn't be opened.");
    }

    return $self;
}

sub DESTROY {
    my $self = shift || Carp::croak('The method needs to be called with an instance.');

    $self->close();
}

sub _getDbHandle {
    my $self = shift || Carp::croak('The method needs to be called with an instance.');

    return *$self->{'dbHandle'};
}

sub _getLoid {
    my $self = shift || Carp::croak('The method needs to be called with an instance.');

    return *$self->{'loid'};
}

sub _getLoidFd {
    my $self = shift || Carp::croak('The method needs to be called with an instance.');

    return *$self->{'loidFd'};
}

sub binmode {
    my $self = shift || Carp::croak('The method needs to be called with an instance.');

    return 1;
}

sub close {
    my $self     = shift || Carp::croak('The method needs to be called with an instance.');
    my $dbHandle = $self->_getDbHandle();
    my $loidFd   = $self->_getLoidFd();

    return $dbHandle->pg_lo_close($loidFd);
}

sub opened {
    my $self   = shift || Carp::croak('The method needs to be called with an instance.');
    my $loidFd = $self->_getLoidFd();

    return defined($loidFd) ? 1 : 0;
}

sub read {
    my $self   = shift || Carp::croak('The method needs to be called with an instance.');
    my $buffer = \shift // Carp::croak('No buffer given.');
    my $length = shift // Carp::croak('No amount of bytes to read given.');
    my $offset = shift || 0;

    if ($offset > 0) {
        Carp::croak('Using an offset is not supported.');
    }

    my $dbHandle = $self->_getDbHandle();
    my $loidFd   = $self->_getLoidFd();

    return $dbHandle->pg_lo_read($loidFd, $buffer, $length);
}

sub seek {
    my $self   = shift || Carp::croak('The method needs to be called with an instance.');
    my $offset = shift // Carp::croak('No offset given.');
    my $whence = shift // Carp::croak('No whence given.');

    if ($offset < 0) {
        Carp::croak('Using a negative offset is not supported.');
    }
    if ($whence != 0) {
        Carp::croak('Using a whence other than 0 is not supported.');
    }

    my $dbHandle = $self->_getDbHandle();
    my $loidFd   = $self->_getLoidFd();
    my $retVal   = $dbHandle->pg_lo_lseek($loidFd, $offset, $whence);
    $retVal = defined($retVal) ? 1 : 0;

    return $retVal;
}

sub tell {
    my $self     = shift || Carp::croak('The method needs to be called with an instance.');
    my $dbHandle = $self->_getDbHandle();
    my $loidFd   = $self->_getLoidFd();
    my $retVal   = $dbHandle->pg_lo_lseek($loidFd);
    $retVal = defined($retVal) ? $retVal : -1;

    return $retVal;
}

1;

Replies are listed 'Best First'.
Re: How to subclass IO::Handle to properly get a low level file handle without having a file or memory?
by dave_the_m (Monsignor) on Nov 02, 2016 at 13:26 UTC

      I doubt that would work for most XS code (or even external libraries that XS thunks to), which seems likely to be part of some of the scenarios mentioned.

      - tye