js1 has asked for the wisdom of the Perl Monks concerning the following question:

Perl Monks,

I have a routine that updates a number of rows in one column of a database table. I want to speed it up, because at the moment it runs a separate SQL UPDATE for each row. For approximately 10,000 rows it takes 2 seconds.

    sub process_urlhits_update {
        my $sql1 = qq{UPDATE url_table SET hits};
        my $sql2 = qq{WHERE url};
        my $sql;
        foreach my $url ( keys %urlhits_update ) {
            $sql = qq{ $sql1=$urlhits_update{$url} $sql2="$url" };
            my $sth = $dbh->prepare( $sql );
            $sth->execute();
            $urlid_hash{$url} = get_urlid($url) if ! defined $urlid_hash{$url};
            delete $urlhits_update{$url};
        }
    }

I tried changing the code to write to a file and then run one LOAD DATA INFILE command. This runs quicker, at about 1.3 secs for 10,000 rows:

    sub process_urlhits_update {
        my $sql1 = qq{\nUPDATE url_table SET hits};
        my $sql2 = qq{WHERE url};
        my $sql;
        open(FH, ">/tmp/updates.txt") || die("could not open updates file");
        foreach my $url ( keys %urlhits_update ) {
            $sql .= qq{ $sql1=$urlhits_update{$url} $sql2="$url" };
            $urlid_hash{$url} = get_urlid($url) if ! defined $urlid_hash{$url};
            delete $urlhits_update{$url};
        }
        print FH $sql;
        close(FH);    # flush the file before the server reads it
        $sql = "LOAD DATA LOCAL INFILE '/tmp/updates.txt' INTO TABLE url_table";
        my $sth = $dbh->prepare( $sql );
        $sth->execute();
    }

However, my two routines that do bulk SQL inserts take only 0.26 secs.

Is there anything I can do to improve the code above? For example, can I run LOAD DATA INFILE without actually writing to disk, or is there a way to do bulk updates?

Thanks as always for your wisdom.

js.

Replies are listed 'Best First'.
Re: speedy update routine
by Zaxo (Archbishop) on Jun 12, 2004 at 21:25 UTC

    Both routines would be improved by reducing or eliminating global variables, and passing subroutines all the parameters they need. It would help to go through the exercise of converting your script to run under strict.

    The first version would be improved by preparing a statement handle once, using placeholders,

    my $sth = $dbh->prepare( q(UPDATE url_table SET hits=? WHERE url=?) );
    while ( my ($url, $hits) = each %urlhits_update ) {
        $sth->execute($hits, $url);
        # ...
    }
    The improvement here is not only in performance, but also in the safety of the DBI quoting mechanism invoked by the placeholders.

    You won't be able to LOAD FILE from an in-memory file you've constructed. That would require a) A db server which can do so, and b) shared memory between the server and your perl process. Neither is likely.

    Is 0.26 seconds really too slow?

    After Compline,
    Zaxo

      Thanks for your reply. I'd forgotten about the placeholders.

      Will getting rid of the global variables and passing parameters actually improve performance or is that just good programming practice?

      Lastly, the update routine takes over a second. The insert routine takes 0.26 secs which I'm happy with.

      js.

        Will getting rid of the global variables and passing parameters actually improve performance or is that just good programming practice?

        Good practice, mainly. It may improve performance, but only indirectly. By reworking your code with attention to variable scope, you may see simplifications which speed things up.

        Don't get too caught up in timing performance. Correctness is far more important.

        After Compline,
        Zaxo

Re: speedy update routine
by exussum0 (Vicar) on Jun 12, 2004 at 23:48 UTC
    I don't know what DB you are using (I'm guessing MySQL), but things that you can do are:

    - If you know your data won't violate any indexes during the insert, drop the indexes first, do the insert/update, then re-create the indexes. On small data sets this is pointless, since index creation itself takes time, but on large operations it is faster.
    - If your SQL operations won't affect each other, break the work up and do it in parallel. There is a limit to how well this works: creating 100k processes to do one insert each will be dog slow, but breaking the work into 2 processes may gain you something.
    - Use a transaction plus a table lock. You guarantee no one is selecting on your tables, so you don't have to worry about row-level locking.
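
    The transaction idea above can be sketched in DBI roughly as follows. This is a minimal sketch, not the poster's actual code: it assumes an already-connected $dbh (with a transactional storage engine behind it) and reuses the %urlhits_update hash, table, and column names from the original post. The gain comes from combining placeholders with a single commit instead of one implicit commit per row.

    ```perl
    use strict;
    use warnings;

    # Hedged sketch: per-row updates wrapped in one transaction.
    # Assumes $dbh is already connected with RaiseError enabled and
    # that url_table lives on an engine that supports transactions.
    sub process_urlhits_update {
        my ($dbh, $urlhits) = @_;

        # Prepare once, execute many times with placeholders.
        my $sth = $dbh->prepare(q{UPDATE url_table SET hits = ? WHERE url = ?});

        $dbh->begin_work;    # suspend AutoCommit for this batch
        eval {
            while ( my ($url, $hits) = each %$urlhits ) {
                $sth->execute($hits, $url);
            }
            $dbh->commit;    # one commit for the whole batch
        };
        if ($@) {
            warn "bulk update failed, rolling back: $@";
            $dbh->rollback;
        }
    }
    ```

    Note that on MySQL a LOCK TABLES statement implicitly commits any open transaction, so if you want the table-lock variant you would lock before starting the batch rather than inside it.
    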

    As I said, I don't know what DB you are using, and I only know Oracle and Postgres really well. But those three (update: was 2) things, especially on bulk operations, will gain you something.

    Just don't do the first one on a hot database, as you'll slow down all the other processes doing SQL operations at the same time. :)

    Bart: God, Schmod. I want my monkey-man.

Re: speedy update routine
by saberworks (Curate) on Jun 13, 2004 at 20:57 UTC
    You could try using a stored procedure.
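
    A minimal sketch of that suggestion, assuming your server supports stored procedures (MySQL only gained them in 5.0) and a connected $dbh; the procedure and parameter names here are purely illustrative, not from the original post:

    ```perl
    use strict;
    use warnings;

    # Hypothetical sketch: push the UPDATE into a server-side procedure.
    sub update_hits_via_procedure {
        my ($dbh, $urlhits) = @_;

        # One-time setup; normally you would create this outside the script.
        $dbh->do(q{
            CREATE PROCEDURE set_url_hits (IN p_url VARCHAR(255), IN p_hits INT)
            BEGIN
                UPDATE url_table SET hits = p_hits WHERE url = p_url;
            END
        });

        my $sth = $dbh->prepare(q{CALL set_url_hits(?, ?)});
        $sth->execute($_, $urlhits->{$_}) for keys %$urlhits;
    }
    ```

    A per-row CALL by itself won't beat a prepared UPDATE with placeholders; the real win from stored procedures comes when you can move the whole batch loop server-side and avoid the client round trips.
    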