leriksen has asked for the wisdom of the Perl Monks concerning the following question:

I am processing different types of files, some compressed and some uncompressed. The uncompressed ones I can open with just

open(HANDLE, $filename) or die ...

and the compressed ones with

open(HANDLE, "$unzip $unzip_opts $filename |") or die ...

These two ways of opening a file are in two different subs - _open_unzipped and _open_zipped.

I also have a sub called _detect_type that uses some logic to work out which open routine is required, and returns a reference to the required sub:

sub _detect_type {
    ... # logic to determine type
    return \&_open_unzipped if $rule1;
    return \&_open_zipped if $rule2;
    return undef;
}

All of this is wrapped up in a module Import.pm (shocking name I know ...)

package Import;
use Exporter;
@ISA = qw(Exporter);
@EXPORT = qw(load_file);
@EXPORT_OK = qw(_detect_type _open_zipped _open_unzipped);
# %EXPORT_TAGS is a hash, so it takes a list in parentheses, not a {...} hashref
%EXPORT_TAGS = (STD => \@EXPORT, TEST => \@EXPORT_OK);

sub load_file {
    my $filename = shift;
    my $openner = _detect_type($filename);
    &$openner($filename);    # opens HANDLE
    while (my $line = <HANDLE>) {
        ... # do stuff with lines
    }
}

sub _detect_type {...}
sub _open_zipped {...}
sub _open_unzipped {...}

1;

Furthermore, I am a good little vegemite and I have a test harness for all this in t/Import.t

use Test::More qw(no_plan);
use Import qw(:TEST);
...
is($test1, $result1);
is($test2, $result2);
...

Now to my problem - how do I check which code reference I got back from _detect_type.

This doesn't always work - sometimes the code reference matches and sometimes it doesn't.

is(_detect_type('unzipped.txt'), \&open_unzipped);
is(_detect_type('zipped.zip'), \&open_zipped);

Is what I am doing unreasonable - or fraught with issues?

I am using Perl 5.6.0 on RH 6.2 Linux.
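Below is a minimal, self-contained sketch of the comparison the test is trying to make, with the module and the test in one file. The `.gz`/`.zip`/`.txt` rule in `_detect_type` and the stand-in opener bodies are hypothetical, since the real ones weren't posted. One thing worth checking is that the test lines above take `\&open_unzipped` and `\&open_zipped` without the leading underscore. Those names refer to different (undefined) subs, so their references can never equal what `_detect_type` returns.

```perl
#!/usr/bin/perl -w
use strict;

package Import;

# Hypothetical detection rule: the real rules weren't posted.
sub _detect_type {
    my $filename = shift;
    return \&_open_zipped   if $filename =~ /\.(?:gz|zip|Z)$/;
    return \&_open_unzipped if $filename =~ /\.txt$/;
    return undef;
}

# Stand-ins for the real openers.
sub _open_zipped   { return 'zipped' }
sub _open_unzipped { return 'unzipped' }

package main;
use Test::More tests => 3;

# Fully qualified names refer to the very subs _detect_type returns.
is(Import::_detect_type('unzipped.txt'), \&Import::_open_unzipped, 'plain file');
is(Import::_detect_type('zipped.zip'),   \&Import::_open_zipped,   'zipped file');

# Without the leading underscore it is a different, undefined sub,
# so the references can never be equal.
isnt(Import::_detect_type('unzipped.txt'), \&Import::open_unzipped, 'misspelt name');
```

Test::More's `is()` compares its arguments with `eq`. Two references to the same sub stringify identically (`CODE(0x...)`), so the test passes whenever both sides really name the same sub.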

Re: Comparing references to sub's
by pg (Canon) on Mar 20, 2003 at 04:24 UTC
    The other way, which I personally take as a better way, is to make it truly OO. I see an Opener class, and two subclasses of it: PlainOpener and ZippedOpener. Each subclass has its own open method.

    After you detect the type of the file, instantiate either a PlainOpener or a ZippedOpener based on that type. When you call open(), it will be resolved to the appropriate subclass's method.

    Seems to me, this gives a better application structure.
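Here is a rough sketch of the OO version, using the class names from the reply. The `.gz` rule in the factory method and the `gzip -dc` command are assumptions for illustration. The method is named `open_file` rather than `open` to keep it visibly distinct from the built-in.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical base class; the class names follow the reply above.
package Opener;

sub new {
    my ($class, $filename) = @_;
    return bless { filename => $filename }, $class;
}

# Factory method: the type detection lives here. The .gz test is an
# assumed rule for illustration only.
sub for_file {
    my ($class, $filename) = @_;
    my $subclass = $filename =~ /\.gz$/ ? 'ZippedOpener' : 'PlainOpener';
    return $subclass->new($filename);
}

# Subclasses are expected to override this.
sub open_file { die ref(shift) . " does not implement open_file\n" }

package PlainOpener;
our @ISA = ('Opener');

sub open_file {
    my $self = shift;
    # type-specific verification/initialisation would go here
    open(my $fh, '<', $self->{filename})
        or die "Cannot open $self->{filename}: $!";
    return $fh;
}

package ZippedOpener;
our @ISA = ('Opener');

sub open_file {
    my $self = shift;
    # Assumes gzip is on the PATH; the name is passed through the shell
    # unquoted, so it must not contain spaces or shell metacharacters.
    open(my $fh, "gzip -dc $self->{filename} |")
        or die "Cannot run gzip on $self->{filename}: $!";
    return $fh;
}

package main;

for my $name ('notes.txt', 'archive.gz') {
    my $opener = Opener->for_file($name);
    print "$name -> ", ref($opener), "\n";
    # my $fh = $opener->open_file;   # resolves to the subclass's method
}
```

Run as is, it prints `notes.txt -> PlainOpener` and `archive.gz -> ZippedOpener`. The caller never needs to know which sub was picked, so there is no code ref to compare.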
      Excellent choice, but this is being grafted into an existing app, and with a go-live date two weeks away, refactoring the whole thing to OO would not be appreciated by some sections of the project. But if I had my time again, ...
Re: Comparing references to sub's
by graff (Chancellor) on Mar 20, 2003 at 01:40 UTC
    The uncompressed ones I can open with just
    open(HANDLE, $filename) or die ...
    and the compressed ones with
    open(HANDLE, "$unzip $unzip_opts $filename |") or die ...
    These two different ways of opening a file are in two different sub's...

    I don't see why you need (or want) two separate subroutines for opening the file -- the only thing different is the string being passed to open().

    So, based on what gets returned by your "_detect_type()" method, why not just set a scalar to either $filename or "$unzip $unzip_opts $filename |" and pass that scalar to "open()". No need to keep track of alternate sub references (unless you have something else going on in those two different open subs that you haven't told us about...)

    Granted, opening a pipe will return a pid, which you might want, but that can still be accommodated without needing a separate sub for a pipe open vs. a file open.
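The following sketch shows the single-sub approach. It assumes `gzip -dc` stands in for `$unzip $unzip_opts`, since the real values weren't posted. The helper names `open_arg` and `open_any` are made up for the example.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

# Hypothetical settings: the post doesn't say which unzip tool is used.
my $unzip      = 'gzip';
my $unzip_opts = '-dc';

# The only thing that differs between the two cases is this string.
sub open_arg {
    my ($filename, $zipped) = @_;
    return $zipped ? "$unzip $unzip_opts $filename |" : "< $filename";
}

# One sub for both kinds. For a pipe, open() returns the child's pid.
sub open_any {
    my ($filename, $zipped) = @_;
    my $arg = open_arg($filename, $zipped);
    my $pid = open(my $fh, $arg) or die "Cannot open '$arg': $!";
    return ($fh, $zipped ? $pid : undef);
}

# Demo on a plain temporary file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp "hello\n";
close $tmp;

my ($fh) = open_any($path, 0);
print scalar <$fh>;    # prints "hello"
close $fh;
```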

      True - but the 'real' _open_<type> subs actually do a lot of other verification and initialisation as well, so that the rest of the code can do its work with the lines from the file. That work is fairly specific to the type of file (certain types of file are compressed and others are plain), so the verification and init are specific to the kind of open we do. That's why I broke them apart into two subs: to keep the type-specific code localised. I just didn't show those bits as they weren't relevant to the concept I was asking about.
        This makes sense if you're dealing with some form of compression other than "gzip", or if you're dealing with multiple types of compression (i.e. part of the process is to verify what sort of "$unzip" and "$unzip_opts" are needed).

        If you were just dealing with a distinction between plain files and gzipped files, then you would want to use the Compress::Zlib module, or better yet, the PerlIO::gzip "layer" in Perl 5.8 -- this could simplify things a lot, to the point where you might not need separate subs for opening files and doing line-oriented IO (check this little node for a quickie sample of PerlIO::gzip in 5.8)

        UPDATE: (2010-10-18) It seems that PerlIO::gzip should be viewed as superseded by PerlIO::via::gzip (see PerlIO::gzip or PerlIO::via::gzip).
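For the plain-vs-gzip case, here is a minimal sketch using Compress::Zlib's gzopen interface. It reads a gzipped file line by line inside the Perl process. It assumes Compress::Zlib is installed: it ships with Perl from 5.10 on, and on 5.6 it comes from CPAN. The helper name `read_gz_lines` is hypothetical. The demo writes its own small .gz file first so it is self-contained.

```perl
#!/usr/bin/perl -w
use strict;
use Compress::Zlib;             # exports gzopen and $gzerrno by default
use File::Temp qw(tempfile);

# Read a gzipped file line by line inside the Perl process,
# without piping through an external unzip program.
sub read_gz_lines {
    my $filename = shift;
    my $gz = gzopen($filename, "rb")
        or die "Cannot open $filename: $gzerrno\n";
    my ($line, @lines);
    while ($gz->gzreadline($line) > 0) {   # 0 at EOF, negative on error
        push @lines, $line;                # each line keeps its newline
    }
    $gz->gzclose();
    return @lines;
}

# Demo: write a small gzip file first so the example is self-contained.
my (undef, $path) = tempfile(SUFFIX => '.gz', UNLINK => 1);
my $out = gzopen($path, "wb") or die "Cannot write $path: $gzerrno\n";
$out->gzwrite("first\nsecond\n");
$out->gzclose();

print for read_gz_lines($path);   # prints "first" then "second"
```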

Re: Comparing references to sub's
by leriksen (Curate) on Mar 20, 2003 at 01:36 UTC

    Actually, after a lot of playing around I got the references to match - but the question is still valid: is comparing refs to subroutines (or any references, for that matter) valid and appropriate, or am I playing a little too close to Perl internals (i.e. something that works in 5.6 may change in 5.8 or 6.0, etc.)?

      Comparing references is a stable and well-defined operation:
      if ( $thisref eq $thatref ) {
          # these two refs address the same object, whatever it is
      }
      (update: as diotalevi has pointed out, the "==" comparison would be more efficient than "eq", but otherwise the results are equivalent.)

        If you want to be really sure you should use refaddr from Scalar::Util since the reference type could overload any of eq, ==, "" or 0+.

        Obviously not necessary in this case, but not doing it can catch you out (well, it's caught me out before) if you add overloading at a later date.
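Here is a small illustration of the overloading trap. It uses a hypothetical `Sloppy` class whose overloaded `eq` always returns true, and `refaddr` still tells the two objects apart. Scalar::Util is core from Perl 5.8 onward; on 5.6 it needs installing from CPAN.

```perl
#!/usr/bin/perl -w
use strict;
use Scalar::Util qw(refaddr);

# Hypothetical class whose overloaded eq claims every pair is equal.
package Sloppy;
use overload
    'eq' => sub { 1 },
    '""' => sub { 'Sloppy object' };
sub new { return bless {}, shift }

package main;

my $x = Sloppy->new;
my $y = Sloppy->new;

print "eq says equal\n"         if $x eq $y;                    # overloaded
print "refaddr says distinct\n" if refaddr($x) != refaddr($y);  # real addresses

# Plain code refs have no overloading, so ==, eq and refaddr all agree.
sub helper { 42 }
my $code = \&helper;
print "code refs agree\n"
    if $code == \&helper && $code eq \&helper && refaddr($code) == refaddr(\&helper);
```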

Re: Comparing references to sub's
by jaa (Friar) on Mar 21, 2003 at 11:39 UTC

    mmmm... so your lower level sub tells you which function to call, but at a higher level you want to know which one it chose, so you can do some other stuff?

    Perhaps it would be better to code it using sub NAMES rather than refs?

    #!/usr/local/bin/perl -w
    use strict;
    no strict qw(refs);

    my $food = $ARGV[0];
    $food ||= 'apple';
    my $subname = choose($food);
    print "We got '$subname' - let's get cookin!\n";
    &$subname($food);
    if ( $subname eq 'fruit' ) {
        print "Puddings up!\n";
    }

    #---------------
    sub choose {
        my $food = shift;
        return 'fruit' if ( $food eq 'apple' );
        return 'vege'  if ( $food eq 'carrot' );
        return 'unknown';
    }

    sub fruit {
        my $food = shift;
        print "I think '$food' is FRUIT!\n";
    }

    sub vege {
        my $food = shift;
        print "I think '$food' is VEGE!\n";
    }

    sub unknown {
        my $food = shift;
        print "I don't think '$food' is food at all!\n";
    }

    Then you are working with sub names, and it is clear in your code when you say things like:

    if ( $subname eq 'do_zipped_things' )...

    My 0.02 dollars, regards Jeff

      Yes, and my first attempt at this was using strings and interpolating them as function names - that was quite useful during debugging etc. But I wanted to use refs for 'purity' reasons - I didn't want to use a name that merely represented a function. I also _think_ it is a 'better way' from the perspective of speed/efficiency (moderately better).

      As for

       if ( $subname eq 'do_zipped_things' )...

      being clearer, isn't

       if ( $ref ==  \&desired_code )...

      just as clear. Refs are even better though, because you are checking that your reference really does 'point' to existing code - the string way, you have a string that 'might' also be the name of a subroutine. How do you check that the string can be 'called'?
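One way to answer the "can the string be called?" question is UNIVERSAL::can. It returns a code ref for a defined sub, or undef, so a name can be turned back into a ref and compared. Below is a minimal sketch of that. `resolve` is a hypothetical helper name, and `fruit` echoes the example above.

```perl
#!/usr/bin/perl -w
use strict;

sub fruit { return "fruit: $_[0]" }

# Turn a sub name into a code ref, or undef if no such sub exists in main.
sub resolve {
    my $name = shift;
    return main->can($name);
}

my $code = resolve('fruit');
print $code->('apple'), "\n" if $code;          # prints "fruit: apple"
print "no such sub\n" unless resolve('vege');   # vege isn't defined here

# The ref from can() is the same sub as \&fruit, so refs compare equal.
print "same sub\n" if $code == \&fruit;

# Alternative: check the symbol table directly (needs symbolic refs).
{
    no strict 'refs';
    my $name = 'fruit';
    print "defined via symbol table\n" if defined &{"main::$name"};
}
```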

      But the really important point, which I think has been missed, is: when does comparing refs to subroutines _not_ work? E.g. what about anonymous namespaces, or comparing methods in objects/classes? What if someone does something weird with gensym(), or plays around with the CODE entry in symbol tables? Does the sub have to be exported somehow from a module? Etc.
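Here is a sketch that probes some of those cases; the package and sub names are hypothetical. An imported sub and an inherited method resolve to the same code ref as the original. A closure is a new ref each time its `sub {...}` expression runs. Replacing the CODE slot in the symbol table yields a new ref, while refs taken earlier keep pointing at the old sub.

```perl
#!/usr/bin/perl -w
use strict;

package Import;
sub _open_zipped { return 'zipped' }

package Base;
sub greet { return 'base' }

package Child;
our @ISA = ('Base');

package main;

# 1. Importing copies the CODE slot into the caller's glob (roughly what
#    Exporter does), so both names yield the same reference.
BEGIN { *main::_open_zipped = \&Import::_open_zipped }
print "imported: same\n" if \&_open_zipped == \&Import::_open_zipped;

# 2. A closure is a fresh code ref each time the sub {} expression runs.
sub make_counter { my $n = shift; return sub { return $n++ } }
my ($c1, $c2) = (make_counter(1), make_counter(1));
print "closures: differ\n" if $c1 != $c2;

# 3. Methods: can() returns the sub actually found via @ISA.
print "method: same\n" if Child->can('greet') == \&Base::greet;

# 4. Replacing the CODE slot installs a new sub; older refs keep the old one.
sub target { return 'old' }
my $before = \&target;
{
    no warnings 'redefine';
    *target = sub { return 'new' };
}
print "redefined: differ\n" if $before != \&target;
print "old ref still returns: ", $before->(), "\n";
```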