Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone!

I've a code reading a very big SQL result froma MySQL database:

my $query = $self->{dbh}->prepare("select * from $database.$self->{hit +_table} where time between ? and ? order by visit_id, time", {'mysql_ +use_result' => $self->{optimize}}); $query->execute($start_time + 0, $end_time + 0); while (my $row = $query->fetchrow_hashref()) { # Do calculation }

The problem is with the fetchrow_hashref, it take too long and I've a:
DBD::mysql::st fetchrow_hashref failed: Lost connection to MySQL serve +r during query

the query returns something like 10M of rows from a DB with more than 150M and it takes at least 1 hour.

My question is: Which is the most performance methode to fetch so big results?
The difference between a fetchrow and a fetchall methode?

TIA

Replies are listed 'Best First'.
Re: fetchrow and fetchall question
by matija (Priest) on Mar 17, 2004 at 12:19 UTC
    The fetchrow methods fetch one row (and place it into an array or a hash). The fetchall methods fetch all the selected rows at once, and place them into a an array of arrays, or an array of hashes. For large result sets, you definitely don't want the fetchall methods: you're better off dealing with each row of the results as it comes.

    Regardless of which of those two methods you call some databases (MySQL among them) still gather all the results on their end into an array first, and then feed them back to you. The consequence, as you saw, is a very lengthy query.

    When I ran into this problem, my solution was to find a field in the table I was fetching that had something resembling a flat distribution over it's range of values, and selected the table in chunks based on that key.

    Considering that you're selecting based on the time, I suggest that you make multiple selects with smaller differences between $start_time and $stop_time, (selecting by hours instead of by days, for instance) and then merge the results in your program.

Re: fetchrow and fetchall question
by UnderMine (Friar) on Mar 17, 2004 at 12:33 UTC
    You are probably better off returning a smaller dataset that will return quicker in a fetchall_arrayref, ie. only the primary key, and then fetch slices through that list.
    select ID from hit_table where time between ? and ? order by visit_id, time
    Then loop to fetch 20 rows each time using something like :-
    select * from hit_table where ID in ( ?,?,?,?,?, ?,?,?,?,?, ?,?,?,?,?, ?,?,?,?,?) order by visit_id, time
    Hope it helps
    UnderMine
Re: fetchrow and fetchall question
by jeffa (Bishop) on Mar 17, 2004 at 14:05 UTC
    Take a look at gmax's reply to a similar question that was asked about 4 months ago.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)