eff_i_g has asked for the wisdom of the Perl Monks concerning the following question:

Bodhisattvas,

Here is my current (and working) OOP version of Versatile subs. As this is my first time dabbling with OOP, I'm looking for feedback.

Thank you.

Module:
#!/usr/bin/perl package Hasher; use Carp; use strict; use warnings; sub new { my $proto = shift; my $class = ref($proto) || $proto; my $self = {}; bless ($self, $class); return $self; } sub file { my $self = shift; if (@_) { $self->{file} = shift; } } sub _common { my $self = shift; (my $key = (caller(1))[3]) =~ s/.*:://; if (@_) { $self->{$key} = shift; } else { $self->{$key}->($self) if defined $self->{$key}; } } sub pre { my $self = shift; $self->_common(@_); } sub loop { my $self = shift; $self->_common(@_); } sub post { my $self = shift; $self->_common(@_); } sub hash { my $self = shift; croak 'file does not exist' if not -e $self->{file}; $self->{count} = 0; $self->{hash} = (); $self->pre(); open my $fh, '<', $self->{file} or croak "could not open $self +->{file}: $!"; while (my $record = <$fh>) { chomp $record; $self->{record} = $record; $self->loop(); $self->{hash}{ $self->{key} } = $self->{value}; ++$self->{count}; } $self->post(); return %{$self->{hash}}; } 1;

script:
#!/usr/bin/perl use strict; use warnings; use Hasher; my $states = new Hasher(); $states->file('states'); $states->loop( sub { my $self = shift; ($self->{value}, $self->{key}) = split ' ', $self->{re +cord}, 2; $self->{key} = uc $self->{key}; } ); $states->post( sub { my $self = shift; print "hashed $self->{count} states\n"; } ); my %state = $states->hash(); map { print "$_ = $state{$_}\n"; } keys %state;

Replies are listed 'Best First'.
Re: OOP first-timer seeks feedback; likes long walks on the beach.
by Tanktalus (Canon) on Sep 08, 2005 at 23:51 UTC

    sub new { my $proto = shift; my $class = ref($proto) || $proto; my $self = {}; bless ($self, $class); return $self; }
    Someone's going to complain about the constructor - but if you never call it in the disapproved manner, it doesn't really matter.
    sub file { my $self = shift; if (@_) { $self->{file} = shift; } }
    For consistancy, there should be an "else { $self->{file} }" there or something to allow you to use it as an accessor. That way, rather than ever saying "$self->{file}" (other than in this method), you say "$self->file()". This allows you to change the underlying data structure merely by changing the accessors if need be.
    sub _common { my $self = shift; (my $key = (caller(1))[3]) =~ s/.*:://; if (@_) { $self->{$key} = shift; } else { $self->{$key}->($self) if defined $self->{$key}; } }
    Interesting use of caller. Personally, I'd just pass the mode in:
    sub _common { my $self = shift; my $key = shift; # ...
    and then pre/loop/post would simply look like:
    sub pre { my $self = shift; $self->_common(pre => @_); } # etc.
    As to the meat of your algorithmic module:
    $self->pre(); open my $fh, '<', $self->{file} or croak "could not open $self +->{file}: $!"; while (my $record = <$fh>) { chomp $record; $self->{record} = $record; $self->loop(); $self->{hash}{ $self->{key} } = $self->{value}; ++$self->{count}; } $self->post();
    I'm not sure this is necessarily a good thing. You are forcing a few things. First, all communication is done via $self. This is only natural inside a single object, however your callbacks are all external to your object, so that doesn't make a bunch of sense to me. Second, you're limiting each loop to a single key/value. Some items in the loop may result in no data to be stored, while other items may be split into multiple keys/values. So something like this may be in order:
    my %data = $self->loop(); @{$self->{hash}{keys %data}} = values %data; $count += scalar keys %data;
    (that scalar may be redundant, but I like being explicit.) Besides that, because the data is being pass in implicitly, it's not really all that obvious what is going on. Passing in data via loop() isn't an option because then _common will think you're setting something. Global variables are worse. So maybe _common doesn't run the code automatically, or maybe it does, but doesn't do setting at all.

    Also remember that the coderefs you're getting are generally going to be closures anyway. They're likely to be setting data in their own object, not yours. So maybe we don't need to return anything at all, just give the record to the callback, and ignore the return. And ignore the counting. Then we can put the record into $_ (localised properly, of course), and not pass anything in.

    My two cents (CDN) :-)

      Tanktalus:

      Thanks for taking the time to type all of that! :D

      What do you mean by if you never call it in the disapproved manner?

      I did the caller thing to be even more lazy. Do you know if caller is slower in any fashion? ...or is using the $key route simply better self-documentation?

      This is only natural inside a single object, however your callbacks are all external to your object, so that doesn't make a bunch of sense to me.
      Callbacks meaning the anonymous subs (coderefs, closures)? I'm a little confused here. I initially put everything into self so pre, loop, and post could dynamically communicate, i.e., if pre has a my $var = 'something';, loop cannot see it unless it is in self, correct? Is there a better way? To further the explanation: Another file may want to check the existence of something in the hash, like so:

      pre: my $exists = 0;

      post: # processing using $exists, changing it, logging it

        The problem with the constructor is the ref $proto || $proto line. What's that for? It allows you to do this:

        my $object = Class->new; my $object2 = $object->new;

        The problem is, unless you document why you need that, it's very unclear. Is $object2 a brand-new, clean object with no relation to the first object? If so, why are you calling new as an instance method?

        Some people might think it's supposed to clone the first object. If so, is it a shallow or deep copy? Is there a reason in the code to clone the first object?

        If it creates a brand new object, either only allow new as a class method or, if you don't have the class handy (maybe it's generated from a factory), then document it like this:

        my $object2 = (ref $object)->new;

        If it really is a clone, create a clone method and avoid the confusion.

        Since few people can agree on the semantics of $object->new, it's a source of confusion. That being said, use it if you need it, but make it clear why you're doing this.

        In fact, this has been a source of enough confusion and argument that I removed most references to this in the Perl docs.

        Cheers,
        Ovid

        New address of my CGI Course.

        The disapproved manner ... check this thread A few Perl OOP questions. for more info. I'll leave it at that. ;-)

        The caller thing - I'm not sure about speed. But I don't see caller used very often, so it's probably not well understood. That said, I merely mentioned the way I'd normally approach the problem and did not mean to imply that yours was inferior. There's "clever" and "too clever". Where the line is drawn, however, depends on one's experience ;-)

        Here's what I think a reasonable usage of this paradigm would look like from the caller's perspective. Note that this is not intended to be the only reasonable usage, just an example of one reasonable usage:

        # here's the pre portion... my $count; my %data; IterateOver::file({ loop => sub { if (/^(.*?)\s*=\s*(.*?)\s*(?:#.*)$/) { $data{$1} = $2; } }, $some_file ); # post goes here...
        The only purpose of having a pre and a post is when you have loops inside of loops - so the pre and post are executed prior to the loop commencing, and after the loop is completed, respectively.

        Thus, the example:

        my @files = ( ... ); my $count; my %data; IterateOver::files({ pre => sub { print "Starting on file $_.\n"; $count = 0; }, loop => sub { if (/^[^#]/ and /^(.*?)\s*=\s*(.*?)\s*(?:#.*)$/) { $data{$1} = $2; ++$count; } }, post => sub { print "Found $count keys in $_.\n"; print "Total unique keys so far: ", scalar keys %data, "\n"; }, @files );
        Of course, this is completely not an object. To be honest, right-tool-for-right-job does not tell me that OO is the best way to do this in perl (and I'm an OO bigot ... at least from my C++ days!). Closures are way easier to use for this.

        Don't get me wrong, I've done this in an object in perl. But that's because I was part of a framework, and my framework was already object oriented (oddly enough, this is the third time today I've talked about TaskManagers and Tasks). I have a package that does iteration over some common data, and then calls object methods (not passed in but predefined) as hooks into various parts of the loop - including pre-filter, filter, and post-filter. There's no need to pass in callbacks because the loop function is inside another module dedicated to this task. It looks something like this (see above for the layout of Task):

        To use this, you derive from TaskFilter your own package. And then you have a $self to use - and that $self really is your filter object. As in your filter object. Thus it makes perfect sense to modify object variables inside object methods, so you can do whatever you want with it - update %$self, etc.

        Does that help?