
Fortunately, most of the code does not rely on being able to use in-memory filehandles, only the feature for parsing "application/warc-fields" from a string. Unfortunately, that feature is how the test suite tests parsing.

Here's something that works on Perl versions down to at least 5.6; alternatively, just use File::Temp in the first place, or IO::String for that matter:

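    sub tempfh {
        my $data = shift;
        my $fh;
        # In-memory filehandle: needs PerlIO (the default since 5.8)
        eval { open $fh, "<", \$data or die $!; 1 }
            or do {
                # Fallback: copy the data into a temporary file
                require File::Temp;
                $fh = File::Temp::tempfile();
                print $fh $data or die "print: $!";
                seek $fh, 0, 0 or die "seek: $!";
            };
        return $fh;
    }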

However, having worked quite heavily with tied filehandles myself, I can say: they're neat, and allow for some cool stuff like my File::Replace::Inplace module, but I strongly recommend against making them a central part of your API unless you have to. There are various issues across various versions of Perl; for example, look at the hoops I have to jump through in t/25_tie_handle_argv.t lines 41-49 (and 70-71) just to get the tests working. And even in my module File::Replace, I've come to prefer the API without tied filehandles, even though I initially thought they were pretty neat.

So, from my experience, let me suggest: use tied filehandles if they're the only way to provide certain APIs (such as my File::Replace::Inplace, or the transparent uncompression of the IO::Uncompress::* family); otherwise, make your API OO-based (modeling it on IO::Handle, if you like), and provide a tied API only as an optional layer of sugar.
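A minimal sketch of that layering, using hypothetical names (My::Reader, My::Reader::Tie, as_handle) and an in-memory string as the data source purely for illustration: the object with IO::Handle-style getline/read/eof methods is the real API, and the tie class does nothing but forward to it, so it can stay an optional extra.

```perl
use strict;
use warnings;

# Hypothetical OO reader over a string; stands in for any data source.
package My::Reader;
use Symbol qw(gensym);

sub new {
    my ($class, $data) = @_;
    return bless { data => $data, pos => 0 }, $class;
}

# IO::Handle-style methods: this is the primary API.
sub eof {
    my $self = shift;
    return $self->{pos} >= length $self->{data};
}

# Returns the next "\n"-terminated line (ignores $/ for simplicity).
sub getline {
    my $self = shift;
    return undef if $self->eof;
    my $nl   = index $self->{data}, "\n", $self->{pos};
    my $end  = $nl < 0 ? length $self->{data} : $nl + 1;
    my $line = substr $self->{data}, $self->{pos}, $end - $self->{pos};
    $self->{pos} = $end;
    return $line;
}

# read($buf, $len): fills the caller's $buf, returns bytes read (0 at EOF).
sub read {
    my ($self, undef, $len) = @_;    # $_[1] aliases the caller's buffer
    my $chunk = substr $self->{data}, $self->{pos}, $len;
    $self->{pos} += length $chunk;
    $_[1] = $chunk;
    return length $chunk;
}

# Optional sugar: the same object behind a tied filehandle.
sub as_handle {
    my $self = shift;
    my $fh = gensym;
    tie *$fh, 'My::Reader::Tie', $self;
    return $fh;
}

# The tie class only forwards to the OO methods.
package My::Reader::Tie;

sub TIEHANDLE { my ($class, $obj) = @_; return bless { obj => $obj }, $class }
sub EOF       { return $_[0]{obj}->eof }
sub CLOSE     { return 1 }

sub READ {
    my ($self, undef, $len, $offset) = @_;
    die "offset argument not supported in this sketch\n" if $offset;
    return $self->{obj}->read($_[1], $len);
}

sub READLINE {
    my $obj = $_[0]{obj};
    return $obj->getline unless wantarray;
    my @lines;
    while (defined(my $line = $obj->getline)) { push @lines, $line }
    return @lines;
}

package main;

my $r = My::Reader->new("one\ntwo\nthree\n");
print "OO:   ", $r->getline;     # first line via the method API
my $fh = $r->as_handle;          # same object, now as a tied handle
print "tied: $_" while <$fh>;    # remaining lines via readline
```

Because all the logic lives in the OO methods, the tied layer stays a few lines long and can be dropped or extended without touching the core. This sketch ignores $/ and the offset argument to read, which a real class would need to handle.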

Update: Improved wording.

Re^4: How to declare a dependency on PerlIO in a CPAN module?
by jcb (Parson) on Sep 02, 2019 at 21:32 UTC

    I will probably simply skip the parsing tests if in-memory filehandles are not available. In normal operation, WARC header fields are parsed directly from a filehandle open on the WARC volume, producing the offset to the record data as a side-effect, so in-memory filehandles are a minor feature.
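    A minimal sketch of such a skip with the core Test::More module, probing for in-memory open directly (rather than, say, checking $Config{useperlio}), since that open is what the string-based tests actually rely on. It uses an explicit plan rather than done_testing, which only appeared in later Test::More releases. The single test shown is only a placeholder for the real parsing tests.

```perl
use strict;
use warnings;
use Test::More;

# Probe for in-memory filehandles directly: on a perl built without
# PerlIO, open() on a scalar reference fails instead of reading the string.
my $have_inmem = eval {
    open my $fh, '<', \"probe\n" or die "no in-memory open: $!";
    my $line = <$fh>;
    defined $line && $line eq "probe\n";
};

if ($have_inmem) {
    plan tests => 1;
}
else {
    plan skip_all => 'in-memory filehandles (PerlIO) not available';
}

# Placeholder for the string-based parsing tests.
open my $fh, '<', \"Name: value\n" or die "open: $!";
is(scalar <$fh>, "Name: value\n", 'reads a field line from a string');
```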

    I am fairly sure that tied filehandles are needed in my API to provide access to records that may be too big to fit into memory (and may be transparently decompressed using IO::Uncompress::Gunzip if the archive uses compression). On the other hand, WARC volumes are read-only, so this is much simpler than File::Replace. Or are you saying that I can get most of the benefits of tied handles by following the IO::Handle interface in the "tied handle" class?
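    To illustrate how a tied handle can serve a record that is too big to slurp, here is a minimal sketch of a read-only tied handle that reads through to the volume's filehandle but never past the end of the record's byte range. The class and helper names (My::BoundedHandle, open_range) and the offsets in the demo are hypothetical, and only READ, READLINE, EOF and CLOSE are implemented.

```perl
use strict;
use warnings;
use Symbol qw(gensym);
use File::Temp qw(tempfile);

# Hypothetical read-only tied handle over the byte range
# [$offset, $offset + $length) of an underlying handle $base.
# $base should be in binmode, so that lengths and positions are bytes.
package My::BoundedHandle;

sub TIEHANDLE {
    my ($class, $base, $offset, $length) = @_;
    return bless { base => $base, pos => $offset, end => $offset + $length }, $class;
}

sub EOF   { my $self = shift; return $self->{pos} >= $self->{end} }
sub CLOSE { my $self = shift; $self->{pos} = $self->{end}; return 1 }  # $base stays open

# read(): never returns bytes beyond the end of the range.
sub READ {
    my ($self, undef, $len, $offset) = @_;   # $_[1] aliases the caller's buffer
    die "offset argument not supported in this sketch\n" if $offset;
    my $base = $self->{base};
    my $want = $self->{end} - $self->{pos};
    $want = $len if $len < $want;
    my $chunk = '';
    if ($want > 0) {
        # Seek every time, so several handles can share one $base.
        seek $base, $self->{pos}, 0 or return undef;
        my $got = read $base, $chunk, $want;
        return undef unless defined $got;
        $self->{pos} += length $chunk;
    }
    $_[1] = $chunk;
    return length $chunk;
}

# readline(): respects $/, but clips the last line at the range end.
# (A real implementation would read in bounded chunks instead, since
# readline on $base may read far past the range looking for $/.)
sub READLINE {
    my $self = shift;
    if (wantarray) {
        my @lines;
        while (defined(my $line = $self->READLINE)) { push @lines, $line }
        return @lines;
    }
    return undef if $self->EOF;
    my $base = $self->{base};
    seek $base, $self->{pos}, 0 or return undef;
    my $line = readline $base;
    return undef unless defined $line;
    my $left = $self->{end} - $self->{pos};
    $line = substr $line, 0, $left if length $line > $left;
    $self->{pos} += length $line;
    return $line;
}

package main;

# Hypothetical helper: hand out a bounded, tied read handle.
sub open_range {
    my ($base, $offset, $length) = @_;
    my $fh = gensym;
    tie *$fh, 'My::BoundedHandle', $base, $offset, $length;
    return $fh;
}

my $volume = tempfile();
binmode $volume;
print {$volume} "HEADER\nline1\nline2\nTRAILER\n" or die "print: $!";

my $rec = open_range($volume, 7, 12);    # covers "line1\nline2\n"
my @lines = <$rec>;
print scalar(@lines), " lines, last: $lines[-1]";
```

    Seeking before every read keeps several record handles on one volume handle from disturbing each other, at the cost of a seek per call.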

    The majority of the API is OO already, but some methods in the API return opened filehandles. For a compressed WARC record, I may be able to simply return the IO::Uncompress::Gunzip handle for ->open_block (which reads the data block of a single WARC record) and for ->open_payload (which reads the actual embedded entity), at least in some cases (no transfer encoding to strip and no segmentation to reassemble). For an uncompressed WARC record, I have to ensure that the returned handle stops reading at the end of the record. WARC volumes are compressed record-by-record, so stopping at the end of the record is partially solved if IO::Uncompress::Gunzip stops at the end of a compressed block, as the documentation suggests it does. I have not tested this yet.
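    Since that behaviour has not been tested yet, here is a small sketch of one way to check it with in-memory data, relying on the documented IO::Uncompress::Gunzip options: with MultiStream => 0 (the default) the reader stops at the end of the first gzip member, and nextStream moves on to the next member. The unit involved is a gzip member (one per record in a record-by-record compressed volume), not an internal deflate block; the record contents are made up.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# A fake "volume": two records, each compressed as its own gzip member.
my ($m1, $m2);
gzip \"record one\n" => \$m1 or die "gzip: $GzipError";
gzip \"record two\n" => \$m2 or die "gzip: $GzipError";
my $volume = $m1 . $m2;

# Collect everything the decompressor returns before it reports EOF.
sub read_member {
    my $z = shift;
    my $out = '';
    while ((my $n = $z->read(my $buf, 4096)) != 0) {
        die "gunzip: $GunzipError" if $n < 0;
        $out .= $buf;
    }
    return $out;
}

# MultiStream => 0 is the documented default; spelled out for clarity.
my $z = IO::Uncompress::Gunzip->new(\$volume, MultiStream => 0)
    or die "gunzip: $GunzipError";

my $first = read_member($z);
print "first member:  $first";

# nextStream returns 1 if another member follows, 0 if none, -1 on error.
if ($z->nextStream == 1) {
    print "second member: ", read_member($z);
}
$z->close;
```

    One thing the sketch does not show: because the decompressor buffers its input, the position of an underlying volume filehandle after the first member may already be past that member's end, so that case is worth testing separately before relying on it.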

    Most of your hoops to jump through seem to be related to the <> and *ARGV magic rather than to ordinary tied filehandles passed around as references. Please correct me if I am wrong about this.

      Or are you saying that I can get most of the benefits of tied handles by following the IO::Handle interface in the "tied handle" class?

      I think it depends. Providing a filehandle interface makes sense if these handles will be passed into other APIs that can take only filehandles. OTOH, if all you want to provide your users is a typical open/while(<>) interface, then IMHO it's easier to provide an OO interface instead of something that looks like a filehandle but really isn't, because next thing you know they might try to do things that are actually not supported. Note that in some cases, other APIs don't even require filehandles; they just need objects that support a subset of the filehandle methods (like read), so basically duck typing. Perl sometimes has an extremely flexible notion of what a filehandle is; see e.g. Best way to check if something is a file handle?. So I guess what's "best" depends on what API you want to provide to others?
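      A small sketch of that duck typing, with hypothetical names (slurp_readable, My::StringSource): accept anything Scalar::Util::openhandle recognizes (real or tied filehandles), or any blessed object that can('read'), and read both the same way. This is one possible check, not a complete answer to the linked question.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed openhandle);

# Hypothetical consumer: slurp everything from "something readable".
sub slurp_readable {
    my $src = shift;
    my $out = '';
    my $buf;
    if (openhandle($src)) {
        # A real filehandle, or a tied one: the builtin read() works.
        while (1) {
            my $n = read($src, $buf, 4096);
            die "read: $!" unless defined $n;
            last if $n == 0;
            $out .= $buf;
        }
    }
    elsif (blessed($src) && $src->can('read')) {
        # Duck typing: any object with an IO::Handle-style read method.
        while (1) {
            my $n = $src->read($buf, 4096);
            die "read failed" unless defined $n && $n >= 0;
            last if $n == 0;
            $out .= $buf;
        }
    }
    else {
        die "not something readable: $src\n";
    }
    return $out;
}

# Hypothetical duck-typed source: has read(), but is not a filehandle.
package My::StringSource;

sub new { my ($class, $s) = @_; return bless { s => $s }, $class }

sub read {
    my ($self, undef, $len) = @_;               # $_[1] aliases the caller's buffer
    $_[1] = substr $self->{s}, 0, $len, '';     # 4-arg substr also removes the chunk
    return length $_[1];
}

package main;

print slurp_readable(My::StringSource->new("quack\n"));
```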

      Most of your hoops to jump through seem to be related to the <> and *ARGV magic rather than to ordinary tied filehandles passed around as references.

      Yes, you're right. I thought I remembered more issues with tied handles themselves, but I don't see them at the moment, sorry!

      Is your code already online somewhere (GitHub?), for context?

        it's easier to provide an OO interface instead of something that looks like a filehandle but really isn't

        As I understand it, tied filehandles really are filehandles, so that issue is avoided. The main motivation for using tied values in this API is to reuse Perl APIs instead of inventing new ones.

        This led to the tied aggregate interfaces in WARC::Fields when I realized that I could either reinvent array and hash access with OO methods, or just tie the real thing and "fill out" the ready-made interface form from perltie. That class has very few instance methods as a result, with only one for data access: ->field, which takes a field name (and possibly a new value for that field). Yet complex operations are possible: adding a concurrent record is push @{$record->fields->{WARC_Concurrent_To}}, $other_record->field('WARC-Record-ID'); regardless of how many WARC-Concurrent-To values $record currently has. I am considering adding convenience accessors to WARC::Record for some fields, like ->id for WARC-Record-ID and ->date for WARC-Date (as a WARC::Date object instead of the string that ->field would return).
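        For readers unfamiliar with the perltie approach, here is a much-simplified sketch of the idea; it is not the WARC::Fields implementation. The class name My::FieldsHash, the case-insensitive lookup, and the storage layout are assumptions: keys may use underscores in place of hyphens (so WARC_Concurrent_To names WARC-Concurrent-To), and every value is an array ref, so push adds another occurrence of a field whether or not it already exists.

```perl
use strict;
use warnings;

# Hypothetical, much-simplified stand-in for a tied field hash:
# lookup is case-insensitive and accepts '_' in place of '-', and
# every value is an array ref holding all occurrences of that field.
package My::FieldsHash;

sub _canon { my $k = lc shift; $k =~ tr/_/-/; return $k }

sub TIEHASH {
    my $class = shift;
    # names: canon => display name; vals: canon => [values]; order: canon keys
    return bless { names => {}, vals => {}, order => [] }, $class;
}

sub FETCH  { my ($self, $k) = @_; return $self->{vals}{ _canon($k) } }
sub EXISTS { my ($self, $k) = @_; return exists $self->{vals}{ _canon($k) } }

sub STORE {
    my ($self, $k, $v) = @_;
    my $c = _canon($k);
    unless (exists $self->{vals}{$c}) {
        (my $name = $k) =~ tr/_/-/;
        $self->{names}{$c} = $name;
        push @{ $self->{order} }, $c;
    }
    # Keep an array ref itself (not a copy), so that an autovivifying
    # push @{ $h{New_Field} }, ... ends up in the stored array.
    $self->{vals}{$c} = ref $v eq 'ARRAY' ? $v : [$v];
    return;
}

sub DELETE {
    my ($self, $k) = @_;
    my $c = _canon($k);
    delete $self->{names}{$c};
    $self->{order} = [ grep { $_ ne $c } @{ $self->{order} } ];
    return delete $self->{vals}{$c};
}

sub CLEAR {
    my $self = shift;
    %$self = (names => {}, vals => {}, order => []);
    return;
}

sub FIRSTKEY { my $self = shift; $self->{iter} = 0; return $self->NEXTKEY }

sub NEXTKEY {
    my $self = shift;
    my $c = $self->{order}[ $self->{iter}++ ];
    return defined $c ? $self->{names}{$c} : undef;
}

package main;

tie my %fields, 'My::FieldsHash';
$fields{WARC_Type} = 'response';
push @{ $fields{WARC_Concurrent_To} }, '<urn:example:a>';    # creates the field
push @{ $fields{'WARC-Concurrent-To'} }, '<urn:example:b>';  # same field again
print "$_: @{ $fields{$_} }\n" for keys %fields;
```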

        Is your code already online somewhere (GitHub?), for context?

        The code that needs tied handles has not been written yet, but I have been making development preview releases on CPAN and collecting smoketest reports. (Amusingly, I have had more failures with the bundled POD test than with the code so far.) Look for JCB/WARC/WARC-v0.0.0_2.tar.gz for the version that prompted this question. It has the WARC::Fields module implemented and some POD describing the planned API so far. I posted a very early draft of that planned API on PerlMonks as Planning a new CPAN module for WARC support (DSLIP: IdpOp).

        Back then, what is now the single parse WARC::Fields method was two separate methods; they have since been merged. I do not have a problem with distinguishing IO handles and strings because I decided to require slightly different calls: parse WARC::Fields $text vs. parse WARC::Fields from => $filehandle. Since "from" is not a valid "application/warc-fields" document, there is no ambiguity. The library itself will always use the from => $filehandle form, since it is reading from a WARC volume. The other form internally opens a filehandle on the passed string and reads from that instead of duplicating the entire parser.
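        The dispatch described above can be sketched as follows, assuming a hypothetical class My::Fields and a deliberately simplified "Name: value" line parser (no continuation lines, no validation). parse My::Fields $text is indirect-object syntax for My::Fields->parse($text); the sketch uses the arrow form. Only the string form needs an in-memory filehandle.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parse() dispatch; not WARC::Fields itself.
# The field-line parser is deliberately simplified.
package My::Fields;

sub parse {
    my $class = shift;
    my $fh;
    if (@_ == 2 && $_[0] eq 'from') {
        $fh = $_[1];                       # ->parse(from => $handle)
    }
    elsif (@_ == 1) {
        my $text = shift;                  # ->parse($text)
        # The one place that needs in-memory filehandles (PerlIO).
        open $fh, '<', \$text or die "in-memory open failed: $!";
    }
    else {
        die 'usage: ->parse($text) or ->parse(from => $fh)', "\n";
    }
    return $class->_parse_handle($fh);
}

# Reads "Name: value" lines up to a blank line or EOF; the handle is
# left positioned just after the blank line.
sub _parse_handle {
    my ($class, $fh) = @_;
    my @fields;
    while (defined(my $line = <$fh>)) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)\z/
            or die "malformed field line: $line\n";
        push @fields, [ $name, $value ];
    }
    return bless { fields => \@fields }, $class;
}

# First value of a field, matched case-insensitively.
sub field {
    my ($self, $name) = @_;
    for my $f (@{ $self->{fields} }) {
        return $f->[1] if lc $f->[0] eq lc $name;
    }
    return undef;
}

package main;

my $f = My::Fields->parse("software: example-tool/0.1\r\nformat: WARC File Format 1.0\r\n");
print $f->field('format'), "\n";
```

        Because the from form stops just after the blank line that ends the fields, tell($fh) afterwards gives the offset of whatever follows, much like the record-data offset side-effect described earlier in the thread.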

        On a side note, do you know of any community Git hosting sites suitable for Perl libraries that do not rely on JavaScript to function and preferably run on Free software? Those are my chief objections to using GitHub for a new project. I have occasionally made pull requests there to contribute to other projects, but I would not want to actually host a project there. Or is this last paragraph itself a good question for the SoPW section here?