cool79s has asked for the wisdom of the Perl Monks concerning the following question:

use DBI;
use edw;
use Benchmark qw(:all);

my $maximo_connect = $maximoconf{'maximo_connect'};
my $maximo_db_user = $maximoconf{'maximo_db_user'};
my $maximo_pwd     = $maximoconf{'maximo_pwd'};
my $edw_connect    = $edwconf{'edw_connect'};
my $edw_db_user    = $edwconf{'edw_db_user'};
my $edw_pwd        = $edwconf{'edw_pwd'};

my $table      = $ARGV[0];
my $edw_table  = "STG_EAM_$ARGV[0]_TB";
my $inserted   = 0;
my $max_commit = 30000;

my $dbh1 = DBI->connect($edw_connect, $edw_db_user, $edw_pwd,
    { PrintError => 0, RaiseError => 1, AutoCommit => 0 });
my $dbh  = DBI->connect($maximo_connect, $maximo_db_user, $maximo_pwd,
    { PrintError => 0, RaiseError => 1, AutoCommit => 0 });

my $sth = $dbh->column_info(undef, undef, $table, "%");
$sth->execute();
my @fields;
push @fields, $_->{COLUMN_NAME} while $_ = $sth->fetchrow_hashref;

my $field_placeholders = join ", ", map { '?' } @fields;
my $fieldlist          = join ", ", @fields;

my $sth1 = $dbh->prepare("SELECT $fieldlist FROM maximo.$table");
$sth1->execute();

my $t0 = Benchmark->new;

my $sql_ins = "INSERT INTO $edw_table ( $fieldlist ) VALUES ( $field_placeholders )";
my $sth2    = $dbh1->prepare($sql_ins);

while (my @columns = $sth1->fetchrow_array) {
    $inserted += $sth2->execute(@columns);
    unless ($inserted % $max_commit) {
        $dbh1->do('commit');
        $inserted = 0;
    }
}
$dbh1->do('commit');

my $t1 = Benchmark->new;
my $td = timediff($t1, $t0);
print "the code took: ", timestr($td), "\n";

$dbh->disconnect();
$dbh1->disconnect();
exit(0);

20101011 Janitored by Corion: Added formatting, code tags, as per Writeup Formatting Tips

Replies are listed 'Best First'.
Re: DBI Performance help
by moritz (Cardinal) on Oct 08, 2010 at 17:15 UTC
    If your real world application consists of inserting the same values all over into your database, think about a better normalization for your schema.

    If the data have a different source, like a text file, consider using the bulk import feature that your database engine most likely provides.
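For example, with PostgreSQL and DBD::Pg, a COPY-based load might look like the sketch below (the connection details, table name, and columns are hypothetical; other engines have analogous tools such as Oracle's SQL*Loader or MySQL's LOAD DATA INFILE):

```perl
use strict;
use warnings;
use DBI;

# Hypothetical connection details -- adjust for your environment.
my $dbh = DBI->connect('dbi:Pg:dbname=edw', 'edw_user', 'secret',
                       { RaiseError => 1, AutoCommit => 0 });

# Stream rows into the table via COPY instead of row-by-row INSERTs.
$dbh->do('COPY stg_eam_asset_tb (assetnum, description) FROM STDIN');
while (my $line = <STDIN>) {      # e.g. tab-separated input, one row per line
    $dbh->pg_putcopydata($line);
}
$dbh->pg_putcopyend();
$dbh->commit;
```

A COPY-style path avoids one network round trip per row, which is usually where the time goes in scripts like this.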

    Also it's a good idea to set RaiseError to 1 or to do some error checking yourself, otherwise you might miss important error messages (that can also tell you about possible performance problems).
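Roughly like this (a minimal sketch; the DSN and table are made up, and with RaiseError on you typically wrap the work in an eval so you can roll back on failure):

```perl
use strict;
use warnings;
use DBI;

my ($dsn, $user, $password) = ('dbi:Oracle:edw', 'edw_user', 'secret');

# RaiseError turns DBI failures into exceptions instead of silent returns.
my $dbh = DBI->connect($dsn, $user, $password,
    { RaiseError => 1, PrintError => 0, AutoCommit => 0 })
    or die $DBI::errstr;

eval {
    $dbh->do('INSERT INTO some_table (id) VALUES (?)', undef, 42);
    $dbh->commit;
};
if ($@) {
    warn "Database error: $@";
    $dbh->rollback;
}
```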

    Perl 6 - links to (nearly) everything that is Perl 6.
Re: DBI Performance help
by runrig (Abbot) on Oct 08, 2010 at 17:18 UTC

    Please use <code></code> tags around your code.

    Using fetchrow_arrayref instead of fetchrow_array might be slightly faster, or using bind_columns, but probably not by much. E.g.:

    my $sth = $dbh->prepare("SELECT ...");
    my @columns = @{ $sth->{NAME} };
    my @row;
    $sth->bind_columns(\(@row[0 .. $#columns]));
    # Then
    while ($sth->fetch()) {
        $ins_h->execute(@row);
    }
    But, it would probably be significantly faster if you could use whatever bulk-load facility your database provides.

      Not slightly, it matters a lot. Bench if you want to know.

      See here for a postgresql example


      Enjoy, Have FUN! H.Merijn
Re: DBI Performance help
by locked_user sundialsvc4 (Abbot) on Oct 08, 2010 at 18:11 UTC

    I see that you are using transactions.   Good.   Bulk inserts are often sped up by disabling indexes on the table for the duration of the load.   When the indexes are re-enabled, they are rebuilt once for the whole data set instead of being maintained row by row.
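As a sketch (assuming Oracle and an existing $dbh; the index name is hypothetical, so check your own DDL before trying this):

```perl
# Mark the index unusable before the bulk load (Oracle syntax).
$dbh->do('ALTER INDEX edw.stg_eam_asset_ix1 UNUSABLE');
$dbh->do('ALTER SESSION SET skip_unusable_indexes = TRUE');

# ... perform the bulk inserts here ...

# Rebuild once, over the whole data set.
$dbh->do('ALTER INDEX edw.stg_eam_asset_ix1 REBUILD');
```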

    I cannot easily see (because of the lack of <code> tags) whether you are preparing the query each time (you don’t need to), and whether you are using placeholders and bound parameters (you should be).
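For what it's worth, the prepare-once, placeholder pattern looks like this (connection details and names are made up):

```perl
use strict;
use warnings;
use DBI;

# Hypothetical connection -- adjust for your environment.
my $dbh = DBI->connect('dbi:Oracle:edw', 'edw_user', 'secret',
                       { RaiseError => 1, AutoCommit => 0 });

# Prepare ONCE, outside the loop, with bound placeholders...
my $ins = $dbh->prepare('INSERT INTO stg_table (col1, col2) VALUES (?, ?)');

# ...then execute it repeatedly with only the values changing.
for my $row ([1, 'a'], [2, 'b']) {
    $ins->execute(@$row);
}
$dbh->commit;
```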

      I have updated the format. Can you please look into it?

        From my perspective, you have worsened the format and removed your question and information about what you are doing.

        See Markup in the Monastery and pay careful attention to how the code tags are used.

Re: DBI Performance help
by ig (Vicar) on Oct 08, 2010 at 18:36 UTC

    Although your program seems quite simple, it is not obvious where the performance bottleneck is: there are many possibilities within your program, the modules you use, and the databases you are accessing. A good place to start is to profile the execution of your program and then focus your attention on whatever takes the most time.

    I have found Devel::NYTProf to be easy to use and very informative.
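Typical usage is something like this (the script name and argument are placeholders for your own):

```shell
# Run the script under the profiler, then build an HTML report.
perl -d:NYTProf dbcopy.pl ASSET
nytprofhtml --open
```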

    I would first try to determine whether most of the time is spent executing your program or waiting for the databases to respond. As your program is quite simple, I suspect the latter.

    You may find that only the selects or only the inserts are slow. Without knowing anything about your databases or systems, it is hard to say anything other than: there are many possibilities. The underlying network or servers may be overloaded. You may have triggers on your database that are consuming time and resources. You may be querying a poorly optimized view.

    Rather than trying to identify all the possibilities in advance, simply start to track down what is taking the time and follow the leads.

    Eventually, you will identify something that is consuming a lot of time. You should be able to identify what is taking the most time quite quickly.

    The next step is to try an alternative approach in the hope that it will be faster. Think about why that operation is slow and what changes might make it faster, then try some of your ideas.

    You are already using Benchmark. Keep good records as you progress so you know what changes you made and what impact they had on performance.

    Be prepared to back out changes. Not everything you try will be an improvement.

    It is not obvious what your underlying databases are. They may have tools that will help you understand why the queries you are running take so long to execute. If they do, you could use them or get help from someone who is familiar with them.

    If you find you are not making good progress, get help from others, providing the information you have gathered and the alternatives you have tried.

Re: DBI Performance help
by clueless newbie (Curate) on Oct 09, 2010 at 15:32 UTC
    For timings set the "Profile" on each db handle:
    $dbh->{Profile} = 6;
    Change:
    while (my @columns = $sth1->fetchrow_array) {
        $inserted += $sth2->execute(@columns);
        ...
    }
    to:
    while ($row = shift(@$cache)
             || shift @{ $cache = $sth1->fetchall_arrayref(undef, $max_rows) || [] }) {
        $inserted += $sth2->execute(@$row);
        ...
    }
    NB: Bunce says the optimal value for $max_rows depends on the number and size of the columns returned.
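If memory serves, the same profiling can be switched on without touching the code, via the DBI_PROFILE environment variable (script name and argument are placeholders):

```shell
# Equivalent to setting $h->{Profile} = 6 in code; prints a summary on exit.
DBI_PROFILE=6 perl dbcopy.pl ASSET
```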