Fortunately, most of the code does not rely on being able to use in-memory filehandles, only the feature for parsing "application/warc-fields" from a string. Unfortunately, that feature is how the test suite tests parsing, so I will have to do something similar to your suggestion to skip those tests.

And you are right: it is better to just try it than to rely on testing $Config{useperlio}, on the off chance that some future perl supports in-memory filehandles even without PerlIO. (Both the BSDs and the GNU C Library have similar facilities in their stdio implementations.)
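A minimal sketch of the "just try it" probe. `have_inmemory_fh` is a hypothetical helper name; the check simply opens a filehandle on a scalar and reads the data back, so it does not depend on $Config{useperlio} at all:

```perl
use strict;
use warnings;

# Hypothetical helper: probe for in-memory filehandle support by actually
# opening one, rather than inspecting $Config{useperlio}.
sub have_inmemory_fh {
    my $data = "probe\n";
    my $ok = eval {
        open my $fh, '<', \$data or die "open: $!";
        my $line = <$fh>;
        close $fh;
        # True only if reading back from the scalar returns what we stored
        defined $line && $line eq "probe\n";
    };
    return $ok ? 1 : 0;
}

print have_inmemory_fh()
    ? "in-memory filehandles available\n"
    : "in-memory filehandles not available\n";
```

In a test script the result could gate just the string-parsing tests, e.g. with Test::More's `plan skip_all => $reason unless have_inmemory_fh();`, or a `SKIP:` block that calls `skip $reason, $count`.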

Since I am using tied filehandles as part of my API, the code cannot work on perls older than 5.8. (A few versions in the 5.005 and 5.6 series might work; when was tie *HANDLE, ... introduced, anyway?) On top of that, IO::Uncompress::Gunzip documents a bug in 5.8.0 that prevents lexical filehandles from being closed properly in some cases, so 5.8.1 is the minimum version, and PerlIO is the default in 5.8 and later. I am unsure whether my code will ever be run on a perl built without PerlIO, but I want to be sure it produces a reasonable error if PerlIO is actually needed to fulfill a user request and is not available.
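A sketch of how the version floor and a readable error might fit together, assuming all string parsing funnels through one helper. `_open_string_fh` is a hypothetical name and the error text is only illustrative:

```perl
use 5.008001;    # 5.8.1 minimum (5.8.0 has the lexical-filehandle close bug)
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: return a read handle on a string, or croak with a
# message naming the missing feature instead of a bare open() failure.
sub _open_string_fh {
    my ($string) = @_;
    my $fh;
    eval { open $fh, '<', \$string or die "$!\n"; 1 }
        or croak "Parsing from a string needs in-memory filehandles "
               . "(PerlIO), which this perl does not provide: $@";
    return $fh;
}

my $fh    = _open_string_fh("WARC-Type: warcinfo\r\n\r\n");
my $first = <$fh>;
print $first;
```

The eval around open matters because a perl without PerlIO may fail in different ways on a scalar reference; either way the user sees one message that says what is missing.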

Re^3: How to declare a dependency on PerlIO in a CPAN module?
by haukex (Archbishop) on Sep 02, 2019 at 06:25 UTC
    Fortunately, most of the code does not rely on being able to use in-memory filehandles, only the feature for parsing "application/warc-fields" from a string. Unfortunately, that feature is how the test suite tests parsing

    Here's something that works down to at least 5.6, or just use File::Temp in the first place, or IO::String for that matter:

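    sub tempfh {
        my $data = shift;
        my $fh;
        # Prefer an in-memory filehandle (needs PerlIO) ...
        eval { open $fh, "<", \$data or die $!; 1 }
            or do {
                # ... otherwise fall back to a real temporary file
                require File::Temp;
                $fh = File::Temp::tempfile();   # scalar context: just the handle
                print $fh $data;
                seek $fh, 0, 0 or die "seek: $!";
            };
        return $fh;
    }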

    However, having worked quite heavily with tied filehandles myself, I can say: they're neat, and allow for some cool stuff like my File::Replace::Inplace module, but I strongly recommend against making them a central part of your API unless you have to. There are various issues across various versions of Perl; for example, look at the hoops I have to jump through in t/25_tie_handle_argv.t lines 41-49 (and 70-71) just to get the tests working. And even in my module File::Replace, I've come to prefer the API without tied filehandles, even though I initially thought they were pretty neat.

    So from my experience let me suggest: use tied filehandles if they're the only way to provide certain APIs (such as my File::Replace::Inplace, or the transparent uncompression of the IO::Uncompress::* family), but otherwise, make your API OO based (modeling it on IO::Handle, if you like), and provide a tied API only as an optional layer of sugar.
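    A minimal sketch of that layering, with hypothetical class names: My::RecordReader is a plain object that reads at most a fixed number of bytes from an underlying handle (roughly what an uncompressed WARC record needs), and My::RecordReader::Tie is a thin tie class (see perltie) that only delegates to it, so the tied interface stays optional sugar:

```perl
use strict;
use warnings;
use Symbol qw(gensym);

package My::RecordReader;    # hypothetical OO core

# Wrap $fh, allowing at most $length further bytes to be read from it.
sub new {
    my ($class, $fh, $length) = @_;
    return bless { fh => $fh, left => $length }, $class;
}

# $r->read($buf, $len): like CORE::read, but never past the record end.
sub read {
    my ($self, undef, $len) = @_;
    $len = $self->{left} if $len > $self->{left};
    if ($len <= 0) { $_[1] = ''; return 0 }
    my $n = CORE::read($self->{fh}, $_[1], $len);
    $self->{left} -= $n if $n;
    return $n;
}

# One line (assumes $/ eq "\n"), read byte by byte to respect the limit.
sub getline {
    my ($self) = @_;
    my $line = '';
    while ($self->{left} > 0) {
        my $n = $self->read(my $c, 1);
        last unless $n;
        $line .= $c;
        last if $c eq "\n";
    }
    return length $line ? $line : undef;
}

sub eof   { my ($self) = @_; return $self->{left} <= 0 || CORE::eof($self->{fh}) }
# Does not close the shared volume handle; just exhausts this record.
sub close { my ($self) = @_; $self->{left} = 0; return 1 }

package My::RecordReader::Tie;    # optional sugar layered on the OO core

sub TIEHANDLE { my ($class, $reader) = @_; return bless { r => $reader }, $class }

sub READ {
    my ($self, undef, $len, $off) = @_;
    $off = 0 unless defined $off;    # negative offsets not handled here
    my $n = $self->{r}->read(my $chunk, $len);
    return $n unless defined $n;     # propagate read errors
    $_[1] = '' unless defined $_[1];
    $_[1] .= "\0" x ($off - length $_[1]) if $off > length $_[1];
    substr($_[1], $off) = $chunk;    # place data at $off, truncate the rest
    return $n;
}

sub READLINE {
    my ($self) = @_;
    return $self->{r}->getline unless wantarray;
    my @lines;
    while (defined(my $line = $self->{r}->getline)) { push @lines, $line }
    return @lines;
}

sub EOF   { my ($self) = @_; return $self->{r}->eof }
sub CLOSE { my ($self) = @_; return $self->{r}->close }

package main;

# Demo: a "volume" whose second and third lines form one 18-byte record.
my $volume = "HEADER\nline one\nline two\nNEXT RECORD\n";
open my $vfh, '<', \$volume or die "open: $!";
my $header = <$vfh>;                              # skip to the record data
my $reader = My::RecordReader->new($vfh, 18);
my $fh = gensym;                                  # anonymous glob to tie
tie *$fh, 'My::RecordReader::Tie', $reader;
my @lines = <$fh>;                                # stops at the record end
my $next  = <$vfh>;                               # volume continues after it
print @lines, $next;
```

    Users who only want OO get My::RecordReader directly; callers that insist on a filehandle get the tied glob, and every behaviour lives in one place.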

    Update: Improved wording.

      I will probably simply skip the parsing tests if in-memory filehandles are not available. In normal operation, WARC header fields are parsed directly from a filehandle open on the WARC volume, producing the offset to the record data as a side-effect, so in-memory filehandles are a minor feature.
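      A rough sketch of parsing header fields straight from the volume handle, where the offset of the record data falls out of tell() once the blank line has been consumed. `read_warc_header` is a hypothetical name; it assumes CRLF line endings as in the WARC format and, for brevity, ignores folded continuation lines and repeated field names:

```perl
use strict;
use warnings;

# Hypothetical: read the "WARC/x.y" line plus "Name: value" lines up to the
# blank line; return (version, \%fields, offset of the record data).
sub read_warc_header {
    my ($fh) = @_;
    local $/ = "\r\n";
    my $version = <$fh>;
    return unless defined $version;
    chomp $version;
    my %fields;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        last if $line eq '';                     # end of the header block
        my ($name, $value) = $line =~ /^([^:]+):\s*(.*)$/
            or die "malformed header line: $line\n";
        $fields{$name} = $value;                 # later duplicates win (simplification)
    }
    return ($version, \%fields, tell $fh);       # tell() = start of the block
}

# Demo only: in real use $fh would be the handle open on the WARC volume.
my $volume = "WARC/1.1\r\nWARC-Type: resource\r\nContent-Length: 5\r\n\r\nhello\r\n\r\n";
open my $fh, '<', \$volume or die "open: $!";
my ($ver, $f, $offset) = read_warc_header($fh);
read $fh, my $block, $f->{'Content-Length'};
print "$ver, block at offset $offset: $block\n";
```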

      I am fairly sure that tied filehandles are needed in my API to provide access to records that may be too big to fit into memory (and may be transparently decompressed using IO::Uncompress::Gunzip if the archive uses compression). On the other hand, WARC volumes are read-only, so this is much simpler than File::Replace. Or are you saying that I can get most of the benefits of tied handles by following the IO::Handle interface in the "tied handle" class?

      The majority of the API is OO already, but some API methods return opened filehandles. For a compressed WARC record, I may be able to simply return the IO::Uncompress::Gunzip handle for ->open_block (which reads the data block in a single WARC record) and for ->open_payload (which reads the actual embedded entity) in some cases (when there is no transfer encoding to strip and no segmentation to reassemble). For an uncompressed WARC record, I have to ensure that the returned handle stops reading at the end of the record. WARC volumes are compressed record-by-record, so stopping at the end of the record is partially solved if IO::Uncompress::Gunzip stops at the end of a compressed block, as its documentation suggests. I have not tested this yet.
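      A small experiment along those lines, using the core IO::Compress::Gzip and IO::Uncompress::Gunzip modules. It builds a fake two-record "volume" in memory, each record being its own gzip member, reads only the first member (MultiStream => 0, which is also the documented default), then moves on with nextStream. This only checks decompressed-stream boundaries; when reading from a real filehandle, Gunzip reads ahead in blocks, so the underlying handle's position after the first member is not necessarily the start of the next one (the trailingData method is meant to expose such over-read bytes):

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Build a two-record "volume": each record is its own gzip member.
my $volume = '';
for my $record ("record one\n", "record two\n") {
    gzip \$record => \my $member or die "gzip: $GzipError";
    $volume .= $member;
}

# Read all decompressed data from the current member.
sub slurp_member {
    my ($z) = @_;
    my ($data, $buf) = ('', '');
    while (1) {
        my $n = $z->read($buf);
        die "read: $GunzipError" if $n < 0;
        last if $n == 0;
        $data .= $buf;
    }
    return $data;
}

my $z = IO::Uncompress::Gunzip->new(\$volume, MultiStream => 0)
    or die "gunzip: $GunzipError";
my $first  = slurp_member($z);                    # stops at end of member 1
my $second = $z->nextStream == 1 ? slurp_member($z) : undef;
print "first: $first", "second: ", defined $second ? $second : "(none)\n";
```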

      Most of your hoops to jump through seem to be related to the <> and *ARGV magic rather than to ordinary tied filehandles passed around as references. Please correct me if I am wrong about this.

        Or are you saying that I can get most of the benefits of tied handles by following the IO::Handle interface in the "tied handle" class?

        I think it depends. Providing a filehandle interface makes sense if these handles will be passed into other APIs that can take only filehandles. OTOH, if all you want to provide your users is a typical open/while(<>) interface, then IMHO it's easier to provide an OO interface instead of something that looks like a filehandle but really isn't, because next thing you know they might try to do things that are actually not supported. Note that in some cases, other APIs don't even require filehandles; they just need objects that support a subset of their methods (like read), so basically duck typing. Perl sometimes has an extremely flexible notion of what a filehandle is; see e.g. "Best way to check if something is a file handle?". So I guess what's "best" depends on what API you want to provide to others?
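        To illustrate the duck-typing point: a consumer can accept either a real filehandle or any object that merely has a read method. `slurp_source` and My::StringSource are hypothetical; `blessed` and `openhandle` come from the core Scalar::Util module:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed openhandle);

# Accept either an object with a read() method or an open filehandle.
# (Read errors simply end the loop here; real code should check them.)
sub slurp_source {
    my ($src) = @_;
    my ($data, $buf) = ('', '');
    if (blessed($src) && $src->can('read')) {
        # Duck typing: only ->read($buf, $len) is required
        while ($src->read($buf, 4096)) { $data .= $buf }
    }
    elsif (openhandle($src)) {
        while (read($src, $buf, 4096)) { $data .= $buf }
    }
    else {
        die "not a filehandle or readable object\n";
    }
    return $data;
}

package My::StringSource;    # hypothetical: has read() and nothing else
sub new { my ($class, $s) = @_; return bless { s => $s, pos => 0 }, $class }
sub read {
    my ($self, undef, $len) = @_;
    $_[1] = substr($self->{s}, $self->{pos}, $len);
    $self->{pos} += length $_[1];
    return length $_[1];
}

package main;

my $text = "abc\n";
open my $fh, '<', \$text or die "open: $!";
my $from_fh  = slurp_source($fh);
my $from_obj = slurp_source(My::StringSource->new("xyz\n"));
print $from_fh, $from_obj;
```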

        Most of your hoops to jump through seem to be related to the <> and *ARGV magic rather than to ordinary tied filehandles passed around as references.

        Yes, you're right, I thought I remembered more issues with tied handles themselves, but I don't see them at the moment, sorry!

        Is your code already online somewhere (GitHub?), for context?