in reply to RFC:Hacking Tie::File to read complex data

Others have already commented on your example data, so I'll limit myself to some thoughts on your "good practice" question.

Your idea to break up a method into several smaller ones, each with a distinct task, is good practice. It encourages subclassing by making it easier to override very specific pieces of behavior. Now, whether you'd be able to get this particular refactoring accepted into Tie::File is another problem altogether, but the idea is sound :-)

So you're definitely on the right track. It's a shame you then cave in to "premature optimization". Thinking one additional method call would ruin performance is, IMHO, misguided. I don't have a benchmark handy, but I wouldn't be surprised if Perl's built-in method dispatching would be just as fast as your caller package check (what with the regular expression and all). Besides, the overhead of one method call will likely be swamped by the IO calls (not to mention the tie interface itself).

The way you think to solve the issue has its problems too:

  1. You break inheritance with the check on the object's class name: with your code, it's now impossible to subclass that particular method and benefit from its features
  2. You now have code duplication: the code in the else() branch belongs in the superclass, not here

Side note: a better way to get the object's class name would be my $pkg = ref $self;. But you should almost never do that: it's usually much better to verify what an object can do than what class it is.

Replies are listed 'Best First'.
Re^2: RFC:Hacking Tie::File to read complex data
by citromatik (Curate) on Jun 15, 2007 at 13:55 UTC

    Hi rhesa, thanks for your comments!

    <quote>Thinking one additional method call would ruin performance is, IMHO, misguided</quote>

    Yes, I agree with that... until I found some inline methods in Tie::File source code with comments like "inlining read_record() would make this loop five times faster"

    I've already done a Benchmark and found that in fact my inherited version in sensible faster than the original Tie::File (I suppose that it is because lines are grouped into records and the indexing, etc... is faster). The benchmark code and results are shown below.

    <quote>You break inheritance with the check on the object's class name: with your code, it's now impossible to subclass that particular method and benefit from its features</quote>

    Sorry, I don't understand this point

    Thanks!

    citromatik

    Benchmark

    citromatik

      I found some inline methods in Tie::File source code with comments like "inlining read_record() would make this loop five times faster"

      I noticed those too. At first I thought: "It's telling that Dominus didn't actually do the inlining", and I assumed that he had good reasons for that1. And I imagine you are glad too he didn't do it, or you would have had to override _fill_offsets() as well, copying most of the code. On the other hand, the last update to Tie::File was in 2003, so maybe he just didn't get around to it, and lost interest.

      Sorry, I don't understand this point [about subclassing. rr]
      I'd like to retract that point. I misread your code, and thought you had if( $_caller_pack eq __PACKAGE__ ). You use ne there, which inlines the get_next_rec only for that particular class, so that's perfectly reasonable. Had it been eq then subclasses would have gotten the inline version, and would have been unable to override get_next_rec(). I apologise for the confusion.

      Your benchmark looks impressive, but I can't tell if it's because of your special record reading code, or because of your inlining. Is it really just because of the method call overhead?

      Note 1: one reason being that _read_record() gets called in several places, so inlining it in that one spot would mean code duplication, which is always a maintenance problem.

        <quote>It's telling that Dominus didn't actually do the inlining</quote>

        Yes, you are right... 1 point for Dominus!... but wait!... look at the FETCH method!!:
        sub FETCH { my ($self, $n) = @_; my $rec; # check the defer buffer $rec = $self->{deferred}{$n} if exists $self->{deferred}{$n}; $rec = $self->_fetch($n) unless defined $rec; # inlined _chomp1 substr($rec, - $self->{recseplen}) = "" if defined $rec && $self->{autochomp}; $rec; }
        Ohhh!! Inlining ahead!... he falls in the dark side of the coding force!!:
        # Chomp one record in-place; return modified record sub _chomp1 { my ($self, $rec) = @_; return $rec unless $self->{autochomp}; return unless defined $rec; substr($rec, - $self->{recseplen}) = ""; $rec; }

        :-) Sorry, it is friday!!

        Have a nice weekend!! and thanks for your comments!

        Cheers!

        citromatik