water has asked for the wisdom of the Perl Monks concerning the following question:

I have a big app that uses Class::DBI heavily. I'd like to speed it up. Before profiling, my assumption was that the DB was the limiting factor. Below are some profiling results. All the big time eaters are CDBI (which means, I think, the DB itself).

How should I interpret these results?

For example, does this tell me that the perl stuff isn't taking any time (assuming the algorithm isn't dumb), and that to gain speedup I should pay attention to the database side: indexes, summary tables, and so forth?

I wouldn't gain any speed by dropping CDBI and using DBI directly, would I?

Is there any point optimizing the perl, when all the time seems to be in CDBI (i.e. the DB)?

Should I care that it seems to be spending a lot of time lowercasing column names in Class::DBI::Column::name_lc?

Thanks for any advice.

water

%Time   Sec.     #calls   sec/call  F  name
14.67  55.4872  2174052   0.000026     <anon>:...5.8.2/Class/DBI/Column.pm:37
 9.55  36.1474  2178574   0.000017     Class::DBI::Column::name_lc
 7.01  26.5282  2627968   0.000010     Class::Accessor::get
 6.52  24.6642   220518   0.000112  ?  <anon>:..._perl/5.8.2/Class/DBI.pm:723
 5.20  19.6819   264665   0.000074  ?  Class::DBI::_init
 4.97  18.7836   884369   0.000021     <anon>:...perl/5.8.2/Class/DBI.pm:1039
 4.51  17.0602   661572   0.000026  ?  <anon>:...rl/5.8.2/Class/Trigger.pm:51
 4.41  16.6713   583607   0.000029     Class::DBI::primary_column
 3.62  13.6949   661572   0.000021     <anon>:.../DBI/Relationship/HasA.pm:38
 3.01  11.3699   583837   0.000019     Class::DBI::ColumnGrouper::primary
 2.82  10.6654   266945   0.000040     Class::DBI::_attribute_store
 2.59   9.8148   220528   0.000045     Class::DBI::meta_info
 2.55   9.6606   220524   0.000044  ?  Class::DBI::_simple_bless
 2.19   8.2928   361192   0.000023     Class::DBI::_attrs
 1.92   7.2692   630109   0.000012     Class::DBI::ColumnGrouper::group_cols
 1.39   5.2500   220524   0.000024     <anon>:.../DBI/Relationship/HasA.pm:46
 1.36   5.1261   220524   0.000023     <anon>:.../DBI/Relationship/HasA.pm:37
 1.35   5.1138   220523   0.000023     <anon>:.../DBI/Relationship/HasA.pm:70
 1.25   4.7459       21   0.225996     DBI::st::execute
 1.13   4.2896    88172   0.000049     Class::DBI::get
 1.08   4.1012   264665   0.000015     <anon>:..._perl/5.8.2/Class/DBI.pm:518
 1.00   3.7662   176564   0.000021     <anon>:..._perl/5.8.2/Class/DBI.pm:722

Replies are listed 'Best First'.
Re: Using Devel::Profile output to know what to do next
by perrin (Chancellor) on Jun 08, 2005 at 01:50 UTC
    What options did you use with dprofpp? You need to tell it to sort by wall time, not CPU time. Database and other I/O stuff usually takes lots of wall time and very little CPU.
      This is my first time profiling. Can you say more? All I did was
      perl -d:Profile test.pl
      using Devel::Profile. What would be better?
        If instead of Devel::Profile (which is a non-core CPAN module) you used Devel::DProf (a core module included with every perl installation), you would add an extra step after running your CDBI app: running the separate utility dprofpp (also included with every perl installation) to see the pretty listing of summary statistics for all the sub calls. It's when you run dprofpp ("pretty-print") that you can use command-line options to specify how to sort the entries.
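
        For example, a minimal session looks like this (dprofpp's -r switch sorts by elapsed wall-clock time rather than CPU time):

            # profile under Devel::DProf; this writes tmon.out in the current directory
            perl -d:DProf test.pl

            # pretty-print the results, sorted by real (wall-clock) time
            dprofpp -r tmon.out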

        Anyway, you say it's a big app with CDBI all through it, but you are considering looking at a straight DBI approach to see if that will speed things up. Looking at your numbers, my initial guess is that going to straight DBI, and managing the layout of your tables with your own data structures (rather than relying on the CDBI OO stuff to do this for you) is likely to cut down noticeably on the overall runtime...

        (... unless of course you happen to do a really poor job of factoring out the CDBI stuff.)

        I've never used CDBI myself, so I don't have a feel for its relative runtime-overhead vs. programmer-efficiency trade-offs. Maybe it's the kind of thing that makes great sense as a prototyping tool, or as a production solution for jobs that don't involve really heavy loads (e.g. having fewer rows and columns involved than you do).

        Still, it's up to you to figure out whether you think a rewrite without CDBI is going to be worthwhile, because we don't know how complicated your app is, or what other constraints there are.

        If you need some empirical evidence before committing to a major rewrite, a worthwhile test might be a very simple app that goes through the same amount of database content, with one or more queries that at least come close to what the real app is doing, but doesn't do much else. It just has to be simple and quick to write both with and without CDBI, so you can benchmark the two approaches -- something like the sketch below.
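
        A rough sketch of such a benchmark, assuming a hypothetical My::CDBI::Widget class mapped to a widget table with an amount column (substitute your own CDBI class and connection details):

            use strict;
            use warnings;
            use Benchmark qw(timethese);
            use DBI;
            use My::CDBI::Widget;    # hypothetical Class::DBI class for the table

            my $dbh = DBI->connect('dbi:mysql:test', 'user', 'pass',
                                   { RaiseError => 1 });

            timethese(10, {
                # fetch and sum via Class::DBI objects
                cdbi => sub {
                    my $sum = 0;
                    $sum += $_->amount for My::CDBI::Widget->retrieve_all;
                },
                # the same work via plain DBI
                raw_dbi => sub {
                    my $sth = $dbh->prepare('SELECT amount FROM widget');
                    $sth->execute;
                    my $sum = 0;
                    while (my ($amount) = $sth->fetchrow_array) {
                        $sum += $amount;
                    }
                },
            });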

        (I'll bet that with queries returning 200K rows and 10 columns per row from your database, you'll see a big difference when you take away the 2 million calls to that bunch of CDBI functions.)

        It looks like you picked the wrong profiler. You should use either Devel::DProf or Devel::Profiler, not Devel::Profile. The problem with Devel::Profile is that it measures CPU time, not the "real" (elapsed) time. Waiting hours for a database query might not take much CPU time at all. If you use a profiler like the two I mentioned, which let you sort by wall-clock time, you will see how much time was really spent waiting for your queries to execute.
Re: Using Devel::Profile output to know what to do next
by hv (Prior) on Jun 08, 2005 at 08:38 UTC

    I've never used Class::DBI, but a quick look shows that the code for Class::DBI::Column is very simple. It might be worth hacking it to cache the lower-case version of the column name, to see if that makes a difference:

    use overload
        '""'     => sub { shift->{'name_lc'} },    # stringification now returns the cached value
        fallback => 1;

    [...]

    sub new {
        my ($class, $name) = @_;
        return $class->SUPER::new({
            name        => $name,
            name_lc     => lc($name),    # lower-case once, at construction time
            _groups     => {},
            placeholder => '?',
        });
    }

    sub name_lc { shift->{'name_lc'} }    # now a plain hash lookup

    That should save quite a chunk from the first two lines of your profile output:

    %Time   Sec.     #calls   sec/call  F  name
    14.67  55.4872  2174052  0.000026      <anon>:...5.8.2/Class/DBI/Column.pm:37
     9.55  36.1474  2178574  0.000017      Class::DBI::Column::name_lc

    Be careful though if any other classes are inheriting from CDBI::Column - you'd need to check that this hack is compatible with their expectations. (I'd expect it to be ok though, as long as inheritors call their SUPER::new() to construct.)

    Hugo

      Hang on, cowboy. You need to profile first to see whether that method is taking up any real time. Your work could be totally pointless if the lowercase calls don't account for a significant part of the actual wall time.

        Well, those two routines are taking 91 seconds of CPU time, so unless it's a multi-processor machine or a threaded application, they won't be taking less than 91 seconds of wall clock time.

        The numbers suggest total CPU time around 6.5 minutes; it would be useful to hear from the OP what the actual runtime is, but I'd expect my proposed hack to take more than a minute off it, replacing the 26+17 μs for 2.2 million calls with something more like the 10 μs of the accessor.
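
        As a quick sanity check of that estimate, a cmpthese() sketch along these lines (the hash fields mirror the hack above and are otherwise invented) compares re-lowercasing on every call against the cached lookup:

            use Benchmark qw(cmpthese);

            my $col = { name => 'SomeColumnName', name_lc => lc 'SomeColumnName' };

            cmpthese(-1, {
                # what name_lc effectively costs now: an lc() on every call
                lc_each_call => sub { my $x = lc $col->{name} },
                # what it costs after the hack: a plain hash lookup
                cached       => sub { my $x = $col->{name_lc} },
            });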

        But it is true - I'm used to profiling on a busy server, where useful wall clock times are hard to impossible to get, so mostly I concentrate on profiling to optimise CPU requirements, and use a combination of two other techniques for optimising the database side: the ever-popular "finger in the air" and "it hurts when I do this".

        Hugo

Re: Using Devel::Profile output to know what to do next
by samtregar (Abbot) on Jun 08, 2005 at 23:14 UTC
    If you've got a good feeling that your problem is database related (which you might get from watching top while something slow happens and seeing mysqld pegging the CPU, for example) then you'll get more mileage from DBI's built-in profilers than from something that only looks at the Perl side. Check out DBI::ProfileDumper for the big guns. If your app is simple enough you could also try DBI::Profile.
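
    As a rough sketch of switching that on (the connection details are placeholders), DBI profiling can be enabled from the shell or on a handle:

        # from the shell, without touching the code: profile every statement
        DBI_PROFILE=2 perl test.pl

        # or inside the app, on the database handle:
        $dbh->{Profile} = '!Statement';           # in-memory, keyed by statement text

        # for a big app, DBI::ProfileDumper writes dbi.prof for later
        # analysis with the dbiprof utility:
        $dbh->{Profile} = 'DBI::ProfileDumper';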

    -sam

Re: Using Devel::Profile output to know what to do next
by jplindstrom (Monsignor) on Jun 08, 2005 at 14:01 UTC
    I did the same thing you did and found out that unrolling the name_lc call bought some performance gain.

    The adventure is described in my journal.

    /J

Re: Using Devel::Profile output to know what to do next
by TedPride (Priest) on Jun 08, 2005 at 02:49 UTC
    In almost all cases, the database is what needs optimizing, not your script. Remove dead records and optimize your tables often; set up indexes for large, high-volume queries; and make sure that you're not using a series of inefficient queries when one query will do (see the sketch below). Also, use the smallest field size possible for each piece of data, and stay away from variable-length fields unless you really have to use them. If you do use them, always put them at the end of the record, and be sure to optimize your tables even more often to remove dead space.
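
    As a hedged illustration of the indexing and query-consolidation points (the orders table, its columns, and the index name are all invented):

        # support a frequent, large query with an index
        $dbh->do('CREATE INDEX idx_orders_customer ON orders (customer_id)');

        # instead of one query per customer ...
        for my $cust (@customer_ids) {
            my ($total) = $dbh->selectrow_array(
                'SELECT SUM(amount) FROM orders WHERE customer_id = ?',
                undef, $cust);
        }

        # ... let a single grouped query do the work
        my $totals = $dbh->selectall_arrayref(
            'SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id');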

      Don't guess -- profile.


Re: Using Devel::Profile output to know what to do next
by crusty_collins (Friar) on Jun 08, 2005 at 04:38 UTC
    My two cents.
    I know from experience that ignoring case or changing case can take up a lot of time.
asking the same question again, perhaps more clearly
by water (Deacon) on Jun 09, 2005 at 00:57 UTC
    water here again.

    clearly, the database dominates the time used.

    assuming the algorithm driving the db isn't dumb (say, pulling rows back to sum them in perl would likely be dumb when the db could have done the sum quicker -- or, even better, an agg table in the db could have precomputed the sum and cached it; see the sketch below),

    then (and here's my question again)

    is it foolish to worry about the speed of the perl side?
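
    for concreteness, here's a sketch of the dumb version against the smarter ones (the sales and sales_totals tables are invented):

        # dumb: pull every row back and sum in perl
        my $sum = 0;
        my $sth = $dbh->prepare('SELECT amount FROM sales WHERE region = ?');
        $sth->execute($region);
        $sum += $_->[0] for @{ $sth->fetchall_arrayref };

        # smarter: let the db do the sum
        my ($total) = $dbh->selectrow_array(
            'SELECT SUM(amount) FROM sales WHERE region = ?', undef, $region);

        # even better: read a sum the db has already precomputed into an agg table
        my ($cached) = $dbh->selectrow_array(
            'SELECT total FROM sales_totals WHERE region = ?', undef, $region);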

    thanks all for the good insights

    water