rsiedl has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm fairly new to MySQL with Perl, a while ago someone showed me how to retrieve and insert and thats all i've done since, but now i've started using 2,000,000+ records and my scripts slow down heaps.

I was wondering if someone could tell me if i'm doing things the right way (in terms of interfacing with the database) or if it is just my scripting that needs work :)

Here's an example of how I do things...
#!/usr/bin/perl use strict; use warnings; use DBI; # Database Definition my $dbh = &DB_OPEN('myDatabase','localhost','3306','user','password'); sub DB_OPEN { my ($db_name,$host_name,$port,$db_user,$db_pass,) = @_; $database = "DBI:mysql:$db_name:$host_name:$port"; $dbh = DBI->connect($database,$db_user,$db_pass); } # end-sub # Do a typical select that retrieves many results... my %DataHash; my $sth = $dbh->prepare(" select ID, Data from table_name where Data=\ +"value\" "); $sth->execute; while ( my ( $ID, $Data) = $sth->fetchrow ) { $CountryDisplay{$ID} = "$Data"; } # end-while $sth->finish; # Do the updating... foreach (keys (%Data)) { $dbh->do(" update new_table set newData=\"$Data{$_}\" where newID=$_ + "); } # end-foreach # Disconnect $dbh->disconnect; exit;

I store the data in a hash before inserting it into the new_table because that seems to work quicker for me.

Any help would be appreciated and if you need any more info (i.e. an actual script) just let me know :)

Cheers,
Reagen

Replies are listed 'Best First'.
Re: How to Speed up MySQL w/ Perl
by davido (Cardinal) on May 26, 2004 at 08:33 UTC
    I was happy to see you preparing and executing separately in your 'select'. I thought, oh good, but then when it really counted (inside a loop) I found you using $dbh->do(...).

    You should be preparing your "updating" statement with placeholders, and then executing within your foreach loop. That's (1) more secure, and (2) faster.

    That's just one "off the top of my head" area where you can improve.


    Dave

      Thanks for your reply Dave.

      But I'm sorry, I dont follow...
      You're talking about this section??
      # Do the updating... foreach (keys (%Data)) { $dbh->do(" update new_table set newData=\"$Data{$_}\" where newID=$_ + "); } # end-foreach

      This is the bit I dont follow...
      preparing your "updating" statement with placeholders

      Cheers,
      Reagen
        Yes, in that section, it should be more like this:

        my $sth = $dbh->prepare("update new_table set newData=? where newID=?) +; foreach ( keys %Data ) { execute( $Data{$_}, $_ ); }

        The idea is that you're preparing the update statement only once, so the database's SQL parser only has to examine it one time. Then as you execute it, the same statement is executed over and over again with the appropriate values inserted via the placeholders.

        That snippet is untested, and just from memory, so do test it and check the docs. There's the DBI quote() method that does proper quoting for you.

        See DBI for more info.


        Dave

Re: How to Speed up MySQL w/ Perl
by davis (Vicar) on May 26, 2004 at 08:39 UTC

    This is untested, but there are comments you may find helpful in the code.

    #!/usr/bin/perl use warnings; use strict; use DBI; my $dbh; #A lexical scope so that the db... variables go out of scope quickly. { my $dbname = "database"; my $dbuser = "user"; my $dbpass = "password"; my $dbhost = "localhost"; my $dbport = 3306; my $dsn = "DBI:mysql:database=$dbname;host=$dbhost;port=$db +port"; $dbh = DBI->connect($dsn, $dbuser, $dbpass, { RaiseError => 1 + }) or die "Couldn't connect to the database: $!\n"; } #Use placeholders here. Your DBD driver will then take care of #appropriate quoting. my $sth = $dbh->prepare("SELECT ID, Data FROM table_name WHERE Data=?" +); $sth->execute("value"); #You're executing the update query lots of times. #Prepare it once, then execute it with different values. my $update_sth = $dbh->prepare("UPDATE new_table SET newData=? WHERE n +ewID=?"); #No point doing the looping over your DataHash twice, you've got the d +ata #once, why not use it here? while(my ($id, $data) = $sth->fetchrow) { $update_sth->execute($data, $id); } #I never bother with these if it's at the end of a script personally. #They'll happen automatically when the variables get DESTROY'ed $sth->finish; $update_sth->finish; $dbh->disconnect;

    davis
    It's not easy to juggle a pregnant wife and a troubled child, but somehow I managed to fit in eight hours of TV a day.

      # I never bother with these if it's at the end of a script personally. They'll happen automatically when the variables get DESTROY'ed

      Have you ever actually tested those assumptions? Certainly on my system which uses RH 7.3 Perl 5.6.2, and the latest DBI and DBD::msyql with MySQL 4.0.xx if you fail to disconnect your connections will hang around until they either time out (8 hours default) or you run out of connections.

      cheers

      tachyon

        Have you ever actually tested those assumptions?
        No, I thought I'd read it. Good catch tachyon, cheers!
        Sound of davis scrabbling furiously after some production code waving his hands...

        Update: Ah, the bit I've read was about Apache::DBI overloading the disconnect methods. Either way, I was wrong to say what I said, and the finish/disconnect calls ought to be included

        davis
        It's not easy to juggle a pregnant wife and a troubled child, but somehow I managed to fit in eight hours of TV a day.
        Ok them heres a question for you! Why bloody not? Unless I'm missunderstanding something, nobody can use these connections, right? So their hanging around has absolutely no purpose. Is there any point in not having them disconnect at the END?
      Thanks davis.

      That clears up a bit of my previous confusion as well.
      I'll give it a go.

      Cheers,
      Reagen
Re: How to Speed up MySQL w/ Perl
by Zaxo (Archbishop) on May 26, 2004 at 08:58 UTC

    Probably a paste-o from trimming your real code, but you don't declare or populate %Data anywhere before you loop over its keys. Your stricture would have caught that in running code.

    The best opportunity for a speed increase I see is to follow davido's advice and use placeholders. Your UPDATE can be defined once, outside the loop, by saying

    my $usth->'update new_table set newData=? where newID=?'; foreach (keys (%Data)) { $usth->execute( $Data{$_}, $_ ); );
    You may get some speed by changing that loop to while ( my ($id, $data) = each %Data) { $usth->execute( $data, $id);}. That could be if %Data is large.

    You should check for errors each time you go to the database server. You can ease the pain of that by setting RaiseError=>1 in the connect call.

    The biggest source of slowness in DBI is usually just grabbing too much data at once. Refining your demands will help that most.

    After Compline,
    Zaxo

Re: How to Speed up MySQL w/ Perl
by cchampion (Curate) on May 26, 2004 at 11:12 UTC

    If you want to understand what you are trying to do, rather than following blindly some exotic example, there is something for you in Tutorials, about using the DBI at its best:

    HTH

      Thanks cchampion, i've had a look and learned something new already :)
      Cheers,
      Reagen
Re: How to Speed up MySQL w/ Perl
by water (Deacon) on May 26, 2004 at 10:08 UTC
    beyond the perl considerations raised above, spend some time making the mysql side go faster, too:
    (1) Don't grab more data than you need.
    (2) Make sure you have an appropriate index to get that data without scanning too many rows.
    (3) run 'EXPLAIN SELECT' to see how the optimizer is doing your query.
    (4) look at Jeremy Zawodny's new book; it is very good
Re: How to Speed up MySQL w/ Perl
by eric256 (Parson) on May 26, 2004 at 13:25 UTC

    I could be wrong but it looks like you are running a select where the colum 'Data' = 'value', and then looping threw and storing the values of the Data colum.. Won't all the values of 'Data' be 'value' since thats what you queried for?

    my %DataHash; #get all values where the 'Data' colume = 'value'...? my $sth = $dbh->prepare(" select ID, Data from table_name where Data=\ +"value\" "); $sth->execute; while ( my ( $ID, $Data) = $sth->fetchrow ) { $CountryDisplay{$ID} = "$Data"; # storing Data colum } # end-while $sth->finish

    I don't see the point in this whole loop in that case. I dunno if this was just because of the way you made the script smaller or what but I thought i would point it out.


    ___________
    Eric Hodges
      Hey Eric,

      I was just using "value" as a descriptive term for what would be a variable.

      Cheers,
      Reagen