jdv has asked for the wisdom of the Perl Monks concerning the following question:

I'm observing an odd interaction of my module with Digest::MD5 that I haven't been able to figure out.

I'm implementing a read/write interface to a gzip variant using tied filehandles, with full seek/read/readline/tell support. I'm in the testing phase, and nearly all of the tests I'm throwing at it pass. Each test performs exactly the same combination of seek, read, <>, etc. on both my tied filehandle object (with the compressed file loaded) and a regular Perl filehandle opened on the uncompressed version, then compares the output.

The one exception at this point is when I try to provide my tied filehandle to Digest::MD5's addfile() method. This doesn't work:

my $fh = B2B::BGZF::Reader->new_filehandle( $fn_bgzf );
my $hex = Digest::MD5->new()->addfile($fh)->hexdigest;
print $hex, "\n"; # prints d41d8cd98f00b204e9800998ecf8427e

The test returns almost immediately and it appears the hash returned is that of an empty string, so clearly the file is not actually being read. However, this works as expected:

my $fh = B2B::BGZF::Reader->new_filehandle( $fn_bgzf );
my $d = Digest::MD5->new();
my $buf = '';
$d->add($buf) while ( read $fh, $buf, 4096 );
my $hex = $d->hexdigest;
print $hex, "\n"; # prints the expected sum
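As a quick sanity check on the digest printed by the failing addfile() call, a short sketch like the following compares it against the MD5 of empty input. It uses only Digest::MD5 itself; the variable names are illustrative.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# The digest reported by the failing addfile() call in the post.
my $observed_sum = 'd41d8cd98f00b204e9800998ecf8427e';

# MD5 of an empty string, and of a digest object that was never fed any data.
my $empty_sum    = md5_hex('');
my $no_input_sum = Digest::MD5->new->hexdigest;

print "md5 of ''         : $empty_sum\n";
print "md5 with no input : $no_input_sum\n";
print "matches observed  : ", ($empty_sum eq $observed_sum ? "yes" : "no"), "\n";
```

If the sums match, addfile() returned without adding a single byte, rather than reading corrupted data.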

It also works fine if I run the original code but substitute the pure-Perl module (although painfully slowly):

my $fh = B2B::BGZF::Reader->new_filehandle( $fn_bgzf );
my $hex = Digest::Perl::MD5->new()->addfile($fh)->hexdigest;
print $hex, "\n"; # prints the expected sum

I additionally tried it with Digest::SHA, and that works fine too:

my $fh = B2B::BGZF::Reader->new_filehandle( $fn_bgzf );
my $hex = Digest::SHA->new(1)->addfile($fh)->hexdigest;
print $hex, "\n"; # prints the expected sum

Basically, I have only been able to observe the issue when using my module with the XS implementation of Digest::MD5. Debugging is difficult because I'm not sure what code is actually being called (apparently not the addfile() method of Digest::base or any other actual perl code I can find on my system). I have no problem just using the explicit read()/add() form with Digest::MD5, but if this is an indication of a subtle bug in my code I'd like to work it out - I'm just not sure how to do so.

Any help with understanding what Digest::MD5::addfile() is actually calling under the hood or what might be going on here would be greatly appreciated.
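One way to see what kind of code a method resolves to is to ask the core B module whether the code reference is an XSUB. The sketch below is only an illustration: is_xsub() and perl_level_sub() are hypothetical names, and the result for each module depends on the versions installed, so no particular output is claimed here.

```perl
use strict;
use warnings;
use B ();
use Digest::MD5 ();
use Digest::SHA ();

# Hypothetical helper: true if the code reference is an XSUB (compiled C),
# false if it has an ordinary Perl body.
sub is_xsub {
    my ($code) = @_;
    return B::svref_2object($code)->XSUB ? 1 : 0;
}

# A plain Perl sub, included as a known reference point.
sub perl_level_sub { return 42 }

# can() follows inheritance, so it finds addfile() wherever it is defined.
my %candidates = (
    'Digest::MD5->addfile' => Digest::MD5->can('addfile'),
    'Digest::SHA->addfile' => Digest::SHA->can('addfile'),
    'main::perl_level_sub' => \&perl_level_sub,
);

for my $name (sort keys %candidates) {
    my $code = $candidates{$name};
    my $kind = !defined $code ? 'not found'
             : is_xsub($code) ? 'XS'
             :                  'Perl';
    printf "%-22s %s\n", $name, $kind;
}
```

A method reported as XS has no Perl body to step through in the debugger, which would explain why no Perl source for it can be found on the system.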

Replies are listed 'Best First'.
Re: Digest::MD5 addfile() w/ tied filehandle
by jdv (Sexton) on Sep 01, 2015 at 21:18 UTC

    Following the suggestion of BrowserUk, I tried to write a bare-minimum example of this behavior. The code is below. The TieTest class is basically meant to remember an internal filehandle and then pass that along with the rest of the arguments through to the core Perl functions.

    There are two interesting/questionable observations. First, with line 19 (the open FH_TIED line marked "try uncommenting this") commented out as it is, Digest::MD5 croaks that it was not passed a filehandle. Maybe I'm doing something wrong here? The only way I have found to get around this is to open the filehandle on something before tying it.

    If I do that (by uncommenting line 19 and feeding the script two different filenames) the pure-Perl implementation acts on the last file opened, while the XS implementation is clearly reading from the first file opened on line 19 and not talking to the tied class (I added a debugging statement to the READ sub just to be sure). This does seem like a bug to me (although saying that out loud pretty much guarantees that I'll find an error in my own code within the next five minutes). Does anyone have any further thoughts on this? Otherwise I'll submit it as a possible bug as suggested.

    Here is the test case:

    #!/usr/bin/perl

    package TieTest;

    sub TIEHANDLE { return bless {}, shift }
    sub OPEN { open my $fh, $_[1], $_[2]; $_[0]->{fh} = $fh; }
    sub READ { warn "...tied read\n"; read $_[0]->{fh}, $_[1], $_[2], $_[3]; }
    sub SEEK { seek $_[0]->{fh}, $_[1], $_[2]; }

    package main;

    use strict;
    use warnings;

    use Digest::MD5;
    use Digest::Perl::MD5;

    open FH_PLAIN, '<', $ARGV[0];
    #open FH_TIED, '<', $ARGV[1]; # try uncommenting this
    tie(*FH_TIED, 'TieTest');
    open FH_TIED, '<', $ARGV[0];

    my %handles = (PLAIN => \*FH_PLAIN, TIED => \*FH_TIED);

    for my $title (qw/PLAIN TIED/) {
        my $fh = $handles{$title};
        for my $class (qw/Digest::Perl::MD5 Digest::MD5/) {
            my $sum = $class->new->addfile($fh)->hexdigest;
            printf "%-7s%-19s%-32s\n", $title, $class, $sum;
            seek $fh, 0, 0;
        }
    }
Re: Digest::MD5 addfile() w/ tied filehandle
by u65 (Chaplain) on Aug 30, 2015 at 13:26 UTC

    Can you provide a complete Perl program, please, shortened as necessary but including pertinent chunks?

Re: Digest::MD5 addfile() w/ tied filehandle
by jdv (Sexton) on Sep 01, 2015 at 16:55 UTC

    Digging into this further, I found that Digest::SHA actually uses straight Perl to do the reading. When I tried Digest::SHA1, which uses PerlIO_read in XS like Digest::MD5, it failed. Perhaps the original question should have been: Can I use tied filehandles with XS modules that use PerlIO_read, and if so, how?
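    The distinction can be sketched with a self-contained example: Perl's read() builtin dispatches to a tied handle's READ method, so a helper that checks tied() and falls back to a read()/add() loop sidesteps the XS path. StringHandle and add_any_handle() are hypothetical names used only for this illustration; StringHandle stands in for the real reader class and simply serves bytes from a string.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Temp qw(tempfile);

# Hypothetical stand-in for a tied reader class: serves bytes from a string.
package StringHandle;

sub TIEHANDLE {
    my ($class, $data) = @_;
    return bless { data => $data, pos => 0 }, $class;
}

sub READ {
    my ($self, undef, $len, $offset) = @_;
    $offset //= 0;
    my $chunk = substr $self->{data}, $self->{pos}, $len;
    $self->{pos} += length $chunk;
    $_[1] = '' unless defined $_[1];
    substr($_[1], $offset) = $chunk;    # fill the caller's buffer via the @_ alias
    return length $chunk;               # 0 signals end of data
}

package main;

# Hypothetical helper: use addfile() for ordinary handles, but read tied
# handles with Perl's read() so that the tied READ method is actually called.
sub add_any_handle {
    my ($digest, $fh, $blocksize) = @_;
    return $digest->addfile($fh) unless tied *{$fh};
    $blocksize ||= 4096;
    my $buf = '';
    while (1) {
        my $n = read $fh, $buf, $blocksize;
        die "read failed: $!" unless defined $n;
        last if $n == 0;
        $digest->add($buf);
    }
    return $digest;
}

my $payload = join '', map { "line $_\n" } 1 .. 1000;

# An ordinary file with the same content, for comparison.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} $payload;
close $tmp or die "close failed: $!";
open my $plain, '<:raw', $tmpname or die "open failed: $!";

tie *TIED, 'StringHandle', $payload;

my $expected_sum = md5_hex($payload);
my $plain_sum    = add_any_handle(Digest::MD5->new, $plain)->hexdigest;
my $tied_sum     = add_any_handle(Digest::MD5->new, \*TIED, 64)->hexdigest;

printf "%-9s %s\n", 'expected', $expected_sum;
printf "%-9s %s\n", 'plain',    $plain_sum;
printf "%-9s %s\n", 'tied',     $tied_sum;
```

    This keeps the fast XS path for real files and only pays for the Perl-level loop when the handle is tied.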

    I found this post: http://stackoverflow.com/questions/13624061/how-do-i-use-tied-filehandles-from-perl-xs-code in which it seems they had to modify the XS. Am I out of luck here? Should I just put a warning in the POD about using it with XS-based modules? Or is there another way to do what I want (a class that can act like a filehandle) without using tied filehandles and which would be compatible with PerlIO_read?
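    One alternative that may be worth trying is a PerlIO layer written with the core PerlIO::via module. A layer sits inside the real PerlIO stack instead of replacing the handle, so reads issued from XS through PerlIO_read should pass through it. The sketch below is a toy under stated assumptions: PerlIO::via::UpperDemo is a hypothetical layer that only upper-cases what it reads, standing in for real decompression logic, and nothing here has been tested against the BGZF reader.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Temp qw(tempfile);
use PerlIO::via;

# Hypothetical layer, for illustration only: upper-cases data as it is read.
package PerlIO::via::UpperDemo;

# Called when the layer is pushed onto a handle; returns the layer object.
sub PUSHED {
    my ($class, $mode, $fh) = @_;
    return bless {}, $class;
}

# Called whenever the layer's buffer needs more data; $fh is the layer below.
sub FILL {
    my ($self, $fh) = @_;
    my $line = <$fh>;
    return defined $line ? uc $line : undef;    # undef signals end of file
}

package main;

my $payload = "hello\nworld\n";
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} $payload;
close $tmp or die "close failed: $!";

# The handle is an ordinary PerlIO handle with the demo layer on top.
open my $fh, '<:via(UpperDemo)', $tmpname or die "open failed: $!";
my $layer_sum = Digest::MD5->new->addfile($fh)->hexdigest;
close $fh;

my $expected_sum = md5_hex(uc $payload);
print "via layer : $layer_sum\n";
print "expected  : $expected_sum\n";
```

    Whether seek and tell can be supported well enough for a block-compressed format is a separate question; PerlIO::via documents SEEK and TELL hooks, but that part is not shown here.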

        1. The actual filehandle to the compressed file my class reads is opened ':raw', yes. I'm not sure where else it would make sense to do this. If I try to binmode the tied filehandle, it just tries to find a BINMODE method in my class, which I would have to define myself (probably as setting the mode of the compressed filehandle as I've already done).

        2. I'll try to generate a minimal tied class that demonstrates what I observe. If I see the same thing, does this seem like a bug or a feature request? Would they both go to the same place?

Re: Digest::MD5 addfile() w/ tied filehandle
by jdv (Sexton) on Sep 04, 2015 at 15:16 UTC