Battling with OOP performance

Evil Attraction has asked for the wisdom of the Perl Monks concerning the following question:

Ok. Some of you might point out that Perl isn't the right choice if you're looking for some hefty performance in your applications, but I still want to solve this problem in Perl, and in the best possible way. Please excuse my English, as it's not my primary language.

First of all, let me tell you about my old way of doing things in my database-powered applications. Let's imagine we have a Group class and a Person class. The Group class should have a method, get_members(), which returns a list of Person objects:

Group.pm

package Group;
use strict;
use warnings;

use Person;

sub new {
    my $proto = shift;
    my $class = ref( $proto ) || $proto;

    my $self = {
        'group_id' =>  0,
        'name'     => '',

        '_members' => [],
    };
    bless( $self, $class );

    return $self;
}

sub get_members {
    my $self = shift;

    # $dbh: a connected DBI handle, set up elsewhere (not shown).
    my $members = $dbh->selectcol_arrayref(
        'SELECT person_id FROM group_member WHERE group_id = ' . $self->id() );

    my @members;
    foreach my $person_id ( @{$members} ) {
        # Each Person->new() issues its own SQL query.
        push( @members, Person->new($person_id) );
    }

    return \@members;
}

1;

The code above is quite truncated, but I'll bet you get the idea of how things are working.

The problem, however, is that every call to Person->new() generates a new SQL query (which reads in the information about the wanted person record), plus some general "class overhead". After some thought, I came up with the following solution:

Person.pm

package Person;
use strict;
use warnings;

use _Person;

sub read {
    my $self = shift;
    my $id   = shift || [];

    # Nothing to look up; an empty IN () list would be a SQL error.
    return unless ( @{$id} );

    my @objects = ();

    # One query fetches every requested person.
    my $stRead = $dbh->prepare(
        'SELECT person_id, firstname, lastname FROM person WHERE person_id IN ('
        . join(', ', @{$id}) . ')' );
    $stRead->execute();
    while ( my ($person_id, $firstname, $lastname) = $stRead->fetchrow_array() ) {
        my $Person = _Person->new(person_id => $person_id,
                                  firstname => $firstname,
                                  lastname  => $lastname);
        push( @objects, $Person );
    }
    $stRead->finish();

    return ( wantarray ) ? @objects : pop( @objects );
}

1;

_Person.pm

package _Person;
use strict;
use warnings;

sub new {
    my $proto = shift;
    my $class = ref( $proto ) || $proto;

    my $self = {
        'person_id' =>  0,
        'firstname' => '',
        'lastname'  => '',
    };
    # Bless before calling _init(), which invokes methods on $self.
    bless( $self, $class );

    $self->_init( @_ );

    return $self;
}

sub _init {
    my $self = shift;
    my %args = @_;

    return unless ( %args );

    $self->person_id( $args{'person_id'} );
    $self->firstname( $args{'firstname'} );
    $self->lastname( $args{'lastname'} );
}

sub person_id {
    my $self = shift;
    my $data = shift;

    $self->{'person_id'} = $data if ( defined $data );

    return ( defined $self->{'person_id'} && $self->{'person_id'} =~ m,^\d+$, )
        ? $self->{'person_id'} : 0;
}

sub firstname {
    # Same as person_id(), except the obvious
}

sub lastname {
    # Same as person_id() and firstname(), except the obvious
}

1;

Now, Group.pm's get_members() method would look something like this:

sub get_members {
    my $self = shift;

    my $members = $dbh->selectcol_arrayref(
        'SELECT person_id FROM group_member WHERE group_id = ' . $self->id() );

    return Person->read( $members );
}

This way, generating many Person objects requires only one call to the database.

Much of the code is very simplified, but I guess - and hope - you'll get the idea. I've become quite conservative in my OOP coding the last few years, and I have a feeling that the approach above is a bit hairy...?

Any comments, suggestions and general thoughts from those with more Perl wisdom than myself are highly appreciated!

Thanks in advance!


Replies are listed 'Best First'.
Re: Battling with OOP performance by Corion (Patriarch) on Sep 05, 2003 at 11:31 UTC
I'm not sure how it fares performance-wise, but Class::DBI does much of what your code does and lets you write your code in a more abstract fashion: you only declare the columns, and the accessors etc. are created automagically. One thing that might have a positive influence on performance is that Class::DBI inherits from Ima::DBI and thus uses prepared statements (and can use placeholders). At the very least, this approach is safer than manually inserting values into queries, as no quoting errors can arise (and values with "'" in them cause no problems either).

For the construction of many objects from a query, Class::DBI uses a mechanism very similar to yours, so I doubt there will be much gain from that side. It does implement connection caching and query caching, two things that you don't, but I don't know whether your performance will benefit from that.

perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
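
The placeholder idea mentioned above can be sketched in plain DBI terms, without Class::DBI. This is only a rough illustration: `build_person_query` is a hypothetical helper name, not part of DBI or Class::DBI, and the table and column names are taken from the question. It builds one `?` per id so the values are bound by the driver rather than pasted into the SQL string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBI or Class::DBI): builds an
# "IN (?, ?, ...)" query with one placeholder per id, so the values
# never get interpolated into the SQL string itself.
sub build_person_query {
    my @ids = @_;
    return unless @ids;    # an empty "IN ()" is not valid SQL

    my $placeholders = join ', ', ('?') x @ids;
    my $sql = 'SELECT person_id, firstname, lastname FROM person '
            . "WHERE person_id IN ($placeholders)";

    return ( $sql, @ids );
}

my ( $sql, @bind ) = build_person_query( 3, 7, 42 );
print "$sql\n";

# With a connected DBI handle in $dbh (assumed, not shown here):
#   my $sth = $dbh->prepare_cached($sql);
#   $sth->execute(@bind);
```

Because the statement text only depends on the number of ids, `prepare_cached` can reuse the prepared handle whenever the same number of ids is requested again.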
Re: Re: Battling with OOP performance by Evil Attraction (Novice) on Sep 05, 2003 at 11:50 UTC
Thanks for your answer. I've already had a look at `Class::DBI`, and it's a great module. It doesn't help much with performance in my case, though, as I already cache database connections and queries. Due to the complexity of my application I wasn't able to show that in the code in my original message. I see now that I should have "warned" you about that. :-)
Re: Re: Re: Battling with OOP performance by perrin (Chancellor) on Sep 05, 2003 at 15:47 UTC
Sounds like the real performance battle here is with the database. You should reconsider Class::DBI. It has a few solutions to this situation.

The first is lazy loading. You can have it query for all of the Persons attached to a Group, and it will create objects that just hold the ID. When you try to access another property on one of these, it does a query to load the rest of that object's data. This is good if you query for all of them, but only use the other properties of a few.

You could also just list the columns you want as essential columns on Person (i.e. always fetch these when getting a Person from the database) and then set up a "has_many" association from Group which automatically does the join and gets the essential columns all in one query.

Finally, if you need something special, you can add a custom SQL query to Person that finds all the information you want in one shot, and Class::DBI handles all the work of creating the objects. It would look something like this:

package Person;
...
__PACKAGE__->set_sql(by_group => q/
    SELECT person_id, firstname, lastname
    FROM   person
    WHERE  group_id = ?
/);

package Group;
...
sub get_members {
    my $self = shift;
    Person->search_by_group($self);
}
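
To make the lazy-loading idea concrete, here is a minimal hand-rolled sketch. It is not how Class::DBI implements it; `LazyPerson`, `_fetch_row` and `load_count` are hypothetical names, and `_fetch_row` only simulates a database round trip so the example runs without a database.

```perl
use strict;
use warnings;

package LazyPerson;

my $loads = 0;    # counts simulated database round trips

# Hypothetical stand-in for a "SELECT ... WHERE person_id = ?" query.
sub _fetch_row {
    my ($person_id) = @_;
    $loads++;
    return { firstname => "First$person_id", lastname => "Last$person_id" };
}

# The constructor stores only the id; nothing is fetched yet.
sub new {
    my ( $class, $person_id ) = @_;
    return bless { person_id => $person_id, _loaded => 0 }, $class;
}

# Copy the fetched columns into the object and remember that we did.
sub _load {
    my $self = shift;
    my $row  = _fetch_row( $self->{person_id} );
    @{$self}{ keys %$row } = values %$row;
    $self->{_loaded} = 1;
    return;
}

sub person_id { $_[0]{person_id} }

# The first access to a non-key column triggers a single load.
sub firstname {
    my $self = shift;
    $self->_load unless $self->{_loaded};
    return $self->{firstname};
}

sub lastname {
    my $self = shift;
    $self->_load unless $self->{_loaded};
    return $self->{lastname};
}

sub load_count { $loads }

package main;

# A thousand objects are created, but only one of them is ever loaded.
my @people = map { LazyPerson->new($_) } 1 .. 1000;
print $people[0]->firstname, "\n";
print LazyPerson::load_count(), "\n";
```

The trade-off is the one described above: this pays off when only a few objects have their other properties read, and it degrades to one query per object if all of them are.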
Re: Re: Re: Battling with OOP performance by monsieur_champs (Curate) on Sep 05, 2003 at 14:41 UTC
Dear Evil Attraction,

I've read in merlyn's "Learning Perl Objects, References & Modules" that OO Perl (like any other OO system or language) gives up some performance in favor of readability and (mainly) code reusability. IMHO, if you're looking for performance improvements, chopping up the OO implementation will not be that useful. Maybe you should benchmark your application and determine where the bottlenecks are. Optimizing those bottlenecks should be your main way of improving performance, and you should always take into account the performance cost intrinsic to object-oriented implementations.

Good luck!

monsieur_champs
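
As a starting point for that kind of measurement, the core Benchmark module can compare two ways of building the same object. This is a rough sketch only: `BenchPerson` is a hypothetical stand-in for the real class, and the numbers it prints depend entirely on the machine and on what the real constructor does.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

package BenchPerson;

# Constructor that goes through one accessor call per field.
sub new {
    my ( $class, %args ) = @_;
    my $self = bless { person_id => 0, firstname => '', lastname => '' }, $class;
    $self->$_( $args{$_} ) for keys %args;
    return $self;
}

# Generate simple get/set accessors for the three fields.
for my $field (qw(person_id firstname lastname)) {
    no strict 'refs';
    *{"BenchPerson::$field"} = sub {
        my $self = shift;
        $self->{$field} = shift if @_;
        return $self->{$field};
    };
}

package main;

my %row = ( person_id => 1, firstname => 'Ada', lastname => 'Lovelace' );

# Run each variant for about one CPU second and print a comparison table.
cmpthese( -1, {
    constructor  => sub { my $p = BenchPerson->new(%row) },
    direct_bless => sub { my $p = bless {%row}, 'BenchPerson' },
} );
```

A micro-benchmark like this only measures object construction. For the whole application, a profiler run is a better guide to where the time actually goes, since the database round trips may dominate.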
Re: Battling with OOP performance by Ovid (Cardinal) on Sep 05, 2003 at 16:14 UTC
As with database normalization, there are times when it's okay to break the rules because the rules conflict with your goals. In this case (though I would probably go with Class::DBI), you have a situation exactly like one that I had a few months ago. I solved it by breaking the rules.

sub new {
    my $proto = shift;
    my $class = ref( $proto ) || $proto;

    my $self = {
        'group_id' =>  0,
        'name'     => '',

        '_members' => [],
    };
    bless( $self, $class );

    return $self;
}

sub get_members {
    my $self = shift;

    my $members = $dbh->selectcol_arrayref(
        'SELECT person_id FROM group_member WHERE group_id = ' . $self->id() );

    my @members;
    foreach my $person_id ( @{$members} ) {
        push( @members, Person->new($person_id) );
    }

    return \@members;
}

You're making a call to `new()` for every set of data. With a large dataset, this can get expensive. While the following results in duplicated code, it can be much faster (by using this technique I had an otherwise identical subroutine return instantly instead of after a delay of several seconds).

sub get_members {
    my $self            = shift;
    my $class           = ref $self;
    my $group_id        = $self->id;
    my $quoted_group_id = $dbh->quote($group_id);

    my $members = $dbh->selectcol_arrayref(<<"END_SQL");
        SELECT person_id
        FROM   group_member
        WHERE  group_id = $quoted_group_id
END_SQL

    my @members;
    foreach my $person_id ( @{$members} ) {
        # Build the object in place instead of calling a constructor.
        my $person = bless {
            group_id  => $group_id,
            person_id => $person_id,
            name      => '',
            _members  => [],
        }, $class;
        push @members => $person;
    }

    return \@members;
}

(It's just a rough demonstration of the technique. You'll need to customize it to your needs, if you use it.)

Note that we have duplicated the constructor's behavior. Do not do this unless you have a known performance issue. Duplicating code like this should only be done for very clear-cut reasons. It also needs documentation to help the programmer find where the functionality is duplicated and understand why it was done, lest someone's clever refactoring kill your code's performance.
Final note: here's how I would rewrite your constructor:

sub new {
    my $class = shift;

    my $self = {
        'group_id' =>  0,
        'name'     => '',

        '_members' => [],
    };
    bless( $self, $class );

    return $self;
}

There is no need for the `ref $proto || $proto` in your constructor. Just leave it out unless you have a very specific reason to include it. It's just clutter. And yes, I know some tutorials bundled with Perl make this mistake :)

Cheers,
Ovid
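
A small sketch of what dropping the idiom changes in practice. `SimpleGroup` is a hypothetical class used only for illustration. With the simplified constructor, calling `new()` on an instance fails loudly instead of quietly producing a new object, which is usually the behavior you want.

```perl
use strict;
use warnings;

package SimpleGroup;

# Constructor without the ref($proto) || $proto idiom.
sub new {
    my $class = shift;
    return bless { group_id => 0, name => '' }, $class;
}

package main;

my $group = SimpleGroup->new;    # the normal class-method call
print ref($group), "\n";

# Calling new() on an instance passes an object where a class name is
# expected, so bless dies instead of quietly creating a new object.
my $copy = eval { $group->new };
print "instance call failed: $@" unless $copy;

# If a new object of the same class is really wanted, say so explicitly.
my $sibling = ref($group)->new;
print ref($sibling), "\n";
```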
Re: Battling with OOP performance by bean (Monk) on Sep 06, 2003 at 06:00 UTC
Just make sure your objects deserve to exist as objects. Is the Person class needed for inheritance? Does another class use Persons outside of Groups (and would a group of one be a problem)? You may find that when you start absorbing or duplicating functionality from one class (Person) into another (Group), the original class becomes nothing more than a glorified hash. It's good to keep a sense of perspective (and humor) about objects - my favorite thing about Perl is the "bless" keyword...