in thread RFC:Hacking Tie::File to read complex data

Hi rhesa, thanks for your comments!

<quote>Thinking one additional method call would ruin performance is, IMHO, misguided</quote>

Yes, I agreed with that... until I found some inline methods in Tie::File source code with comments like "inlining read_record() would make this loop five times faster".

I've already run a benchmark and found that my inherited version is in fact noticeably faster than the original Tie::File (I suppose this is because lines are grouped into records, so the indexing etc. is faster). The benchmark code and results are shown below.

<quote>You break inheritance with the check on the object's class name: with your code, it's now impossible to subclass that particular method and benefit from its features</quote>

Sorry, I don't understand this point

Thanks!

citromatik

Benchmark

Code

use strict;
use warnings;
use Benchmark;
use myTie::File::GFF;
use myTie::File;
use Tie::File;

my $f = shift @ARGV;

tie my @arr1, 'myTie::File',      $f;
tie my @arr2, 'myTie::File::GFF', $f;
tie my @arr3, 'Tie::File',        $f;

Benchmark::cmpthese(100, {
    'call_same_module' => sub { my $cont = 0; for (@arr1) { $cont++ } },
    'inherited'        => sub { my $cont = 0; for (@arr2) { $cont++ } },
    'orig_tiefile'     => sub { my $cont = 0; for (@arr3) { $cont++ } },
});

Benchmark result

                    Rate call_same_module orig_tiefile inherited
call_same_module 100.0/s               --          -8%      -71%
orig_tiefile       109/s               9%           --      -68%
inherited          345/s             245%         217%        --

citromatik

Re^3: RFC:Hacking Tie::File to read complex data
by rhesa (Vicar) on Jun 15, 2007 at 15:39 UTC
    I found some inline methods in Tie::File source code with comments like "inlining read_record() would make this loop five times faster"

    I noticed those too. At first I thought: "It's telling that Dominus didn't actually do the inlining", and I assumed that he had good reasons for that (see Note 1). I imagine you are glad he didn't, too, or you would have had to override _fill_offsets() as well, copying most of the code. On the other hand, the last update to Tie::File was in 2003, so maybe he just didn't get around to it and lost interest.

    Sorry, I don't understand this point [about subclassing. rr]
    I'd like to retract that point. I misread your code, and thought you had if( $_caller_pack eq __PACKAGE__ ). You use ne there, which inlines the get_next_rec only for that particular class, so that's perfectly reasonable. Had it been eq then subclasses would have gotten the inline version, and would have been unable to override get_next_rec(). I apologise for the confusion.
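
    A minimal sketch of that eq/ne distinction follows. The package names My::Reader and My::Reader::Sub, the next_value method, and the use of ref($self) in place of $_caller_pack are all assumptions made for illustration; this is not citromatik's actual code.

```perl
use strict;
use warnings;

package My::Reader;    # hypothetical base class

sub new { my ($class) = @_; return bless {}, $class }

# The method a subclass may want to override.
sub get_next_rec { return 'base record' }

sub next_value {
    my ($self) = @_;

    # Assumption: ref($self) stands in for $_caller_pack.
    # With 'ne', any object that is NOT exactly this class goes through
    # normal method dispatch, so a subclass override is honoured.
    return $self->get_next_rec if ref($self) ne __PACKAGE__;

    # Objects of exactly this class take the inlined copy of
    # get_next_rec() and skip the method call.
    return 'base record';
}

package My::Reader::Sub;    # hypothetical subclass
our @ISA = ('My::Reader');

sub get_next_rec { return 'subclass record' }

package main;

my $base = My::Reader->new;
my $sub  = My::Reader::Sub->new;
print $base->next_value, "\n";    # inlined path
print $sub->next_value,  "\n";    # dispatched path, override is used
```

    Swapping ne for eq in this sketch would send subclass objects down the inlined path, so the override in My::Reader::Sub would never be called.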

    Your benchmark looks impressive, but I can't tell if it's because of your special record reading code, or because of your inlining. Is it really just because of the method call overhead?
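
    One way to separate the two effects might be to time the method-call overhead on its own, with no file I/O involved. A rough sketch, assuming a hypothetical Counter class whose bump method does the same work as the inlined statement:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

package Counter;    # hypothetical class, not part of Tie::File

sub new  { my ($class) = @_; return bless { n => 0 }, $class }
sub bump { $_[0]{n}++ }

package main;

# Same work done two ways: through a method call, and inlined.
sub count_with_method {
    my ($iterations) = @_;
    my $c = Counter->new;
    $c->bump for 1 .. $iterations;
    return $c->{n};
}

sub count_inlined {
    my ($iterations) = @_;
    my $c = Counter->new;
    $c->{n}++ for 1 .. $iterations;
    return $c->{n};
}

# A negative count asks Benchmark to run each case for at least
# that many CPU seconds instead of a fixed number of iterations.
cmpthese(-1, {
    method_call => sub { count_with_method(10_000) },
    inlined     => sub { count_inlined(10_000) },
});
```

    If the gap between these two cases turns out much smaller than the 245% in the table, most of the gain probably comes from the record grouping rather than from avoiding the call.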

    Note 1: one reason being that _read_record() gets called in several places, so inlining it in that one spot would mean code duplication, which is always a maintenance problem.

      <quote>It's telling that Dominus didn't actually do the inlining</quote>

      Yes, you are right... 1 point for Dominus!... but wait!... look at the FETCH method!!:
      sub FETCH {
        my ($self, $n) = @_;
        my $rec;

        # check the defer buffer
        $rec = $self->{deferred}{$n} if exists $self->{deferred}{$n};
        $rec = $self->_fetch($n) unless defined $rec;

        # inlined _chomp1
        substr($rec, - $self->{recseplen}) = ""
          if defined $rec && $self->{autochomp};
        $rec;
      }

      Ohhh!! Inlining ahead!... he has fallen to the dark side of the coding Force!! Here is the _chomp1 he inlined:

      # Chomp one record in-place; return modified record
      sub _chomp1 {
        my ($self, $rec) = @_;
        return $rec unless $self->{autochomp};
        return unless defined $rec;
        substr($rec, - $self->{recseplen}) = "";
        $rec;
      }
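
      For reference, both snippets rely on the same idiom: assigning the empty string to an lvalue substr with a negative offset cuts a fixed number of characters off the end of the record. A small standalone sketch (the strip_recsep name and the sample data are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the substr idiom quoted above.
sub strip_recsep {
    my ($rec, $recsep) = @_;
    return $rec unless defined $rec;

    # Unlike chomp, this removes exactly length($recsep) characters
    # from the end of the string without checking what they are.
    substr($rec, -length($recsep)) = "";
    return $rec;
}

print strip_recsep("chr1\tgene\r\n", "\r\n"), "\n";
```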

      :-) Sorry, it is Friday!!

      Have a nice weekend!! and thanks for your comments!

      Cheers!

      citromatik