rkac has asked for the wisdom of the Perl Monks concerning the following question:

I need to optimize some code. Would it be faster to read the results of a DBI select all at once into a hash and work with them there, or is it best to fetch them one by one through fetchrow_hashref, as the code below does? Is there any difference?
    # This section uses String::Similarity
    use String::Similarity;

    while ($items = $sth->fetchrow_hashref) {
        $foundblock = $items->{'Block'};
        $founditem  = $items->{'itemID'};
        # check for duplicate blocks
        $similarity = similarity($foundblock, $block);
        select(undef, undef, undef, 0.005);    # note: sleeps 5 ms on every row
        if ($similarity * 100 > 85) {
            # duplicate found
            $duplicate = 1;
            last;
        }
    }

Replies are listed 'Best First'.
RE: Is a hash faster?
by myocom (Deacon) on Oct 07, 2000 at 03:29 UTC

    Any time I see "Is it faster to do X or Y", I reach for Benchmark.

    Make two subs, one with this routine and the other as you described it, then run 'em both through Benchmark and see for yourself which is faster. Without numbers, it's all speculation.
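    A minimal Benchmark skeleton for doing exactly that. The sub bodies here are placeholders that walk an in-memory sample data set so the harness runs standalone; in practice you would replace them with the real fetchrow_hashref loop and the fetchall call against your own statement handle:

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Stand-in data so the skeleton runs without a database.
    my @rows = map { { Block => "block $_", itemID => $_ } } 1 .. 1000;

    sub row_by_row {
        # stand-in for: while ($row = $sth->fetchrow_hashref) { ... }
        for my $row (@rows) {
            my $block = $row->{Block};
        }
    }

    sub all_at_once {
        # stand-in for: my $all = $sth->fetchall_arrayref({})
        my @all = @rows;
        for my $row (@all) {
            my $block = $row->{Block};
        }
    }

    # Run each sub for about one CPU second and print a comparison table.
    cmpthese(-1, {
        fetchrow => \&row_by_row,
        fetchall => \&all_at_once,
    });
    ```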

      Without numbers, it's all speculation.

      That isn't entirely true. Most modern computer science and mathematical theory is based on 'formalism', which involves the use of symbols and variables in place of numbers. Why? Because Benchmark only tells you the runtime for a given machine with a given set of external variables and a given input. If you truly want to know whether something will be faster, you must first look at it analytically and determine which should be faster. Then explore worst-case/best-case scenarios.

      And then check your work with a couple of benchmarks... but to some extent, you are right... it doesn't hurt to benchmark. I'm just saying it's not the complete solution.

Re: Is a hash faster?
by chromatic (Archbishop) on Oct 07, 2000 at 06:06 UTC
    It depends.

    Is your result set large? If so, fetching everything in at once (fetchall_arrayref()) will eat up a lot of memory.
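    A sketch of the slurp-everything approach. The connection string and table/column names here are assumed for illustration, not taken from the original post:

    ```perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical connection; substitute your own DSN and credentials.
    my $dbh = DBI->connect('dbi:SQLite:dbname=items.db', '', '',
                           { RaiseError => 1 });
    my $sth = $dbh->prepare('SELECT itemID, Block FROM items');
    $sth->execute;

    # Passing {} makes each row a hash ref keyed by column name,
    # the same shape fetchrow_hashref returns -- but now the whole
    # result set sits in memory at once.
    my $rows = $sth->fetchall_arrayref({});

    for my $row (@$rows) {
        print "$row->{itemID}: $row->{Block}\n";
    }
    ```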

    Do you need to work with all of the results at once? If so, and you can spare the memory, grabbing everything will do.

    If you're just going to loop over the results, you're probably better off doing the while and getting things one at a time. (It might even be quicker to bind output variables with bind_columns().)
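    For what that binding looks like, here is a sketch (connection details assumed, as above). bind_columns() tells DBI to copy each fetched value straight into the bound scalars, so no new hash ref is built per row:

    ```perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical connection; substitute your own DSN and credentials.
    my $dbh = DBI->connect('dbi:SQLite:dbname=items.db', '', '',
                           { RaiseError => 1 });
    my $sth = $dbh->prepare('SELECT itemID, Block FROM items');
    $sth->execute;

    my ($item_id, $block);
    $sth->bind_columns(\$item_id, \$block);

    # fetch() is the cheapest row-at-a-time loop DBI offers.
    while ($sth->fetch) {
        print "$item_id: $block\n";
    }
    ```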

    I think the gamble is on whether your loop through one big data structure is more expensive than the repeated calls to fetchrow_hashref().

Re: Is a hash faster?
by AgentM (Curate) on Oct 07, 2000 at 05:28 UTC
    In DBI, you probably want to bet on the hash references that DBI likes to return. If this is CGI, then you want to get the info to the user ASAP, so you'll want fetchrow. (Remember to read in ALL data from the SELECT.)
    AgentM Systems or Nasca Enterprises is not responsible for the comments made by AgentM- anywhere.
Re: Is a hash faster?
by rkac (Novice) on Oct 07, 2000 at 03:47 UTC
    I agree. I was just wondering if it was blatantly obvious that one was faster.