pysome has asked for the wisdom of the Perl Monks concerning the following question:

%hash; # It stores about 70,000 records,just like $hash{"ID"} = "2002" +,etc;
Method 1:
my $dbh=DBI->connect("xxxx"); # My DB is SQLSERVER 2005 foreach (keys %hash) { my $query = "UPDATE tab SET d='$hash{$_}' WHERE id=$_"; die "Error $query\n" unless $dbh->do($query); } $dbh->disconnect;
Method 2:
my $dbh=DBI->connect("xxxx"); # My DB is SQLSERVER 2005 my $sth=$dbh->prepare(qq{ UPDATE tab SET d='?' WHERE id=?}); foreach (keys %hash) { $sth->execute($hash{$_},$_); } $sth->finish; $dbh->disconnect;
I use the Method one,But it spent about 2 hours .I can't beared it:(

Replies are listed 'Best First'.
Re: which method is more faster?
by ikegami (Patriarch) on Jul 25, 2007 at 02:54 UTC

    Method 2 is not only faster (since you only prepare one statement), it's safer (since it handles quoting).

Re: which method is more faster?
by archfool (Monk) on Jul 25, 2007 at 02:46 UTC
    Preparing a query that is used multiple times is almost ALWAYS faster and more efficient than ->do statements. The database has to parse and preprocess EVERY DO statement. Prepare gets the parsing and preprocessing done for you, and spends more time on work than overhead.

    BTW... in a prepared statement, d='?' isn't needed. you should use d=? (Binding variables SHOULD quote it for you).

      My understanding is that Perl extracts the hash keys into a list in-memory when you use the keys operator (function?).

      When I know I'll be working with large hashes like this, I tend to use each, so the key/value pairs are fetched from the hash one-at-a-time.

      If you're using lots of memory, give this a try:
      my $dbh=DBI->connect("xxxx"); # My DB is SQLSERVER 2005 my $sth=$dbh->prepare(qq{ UPDATE tab SET d=? WHERE id=?}); for (my ($key,$val) = each %hash) { $sth->execute($hash{$key},$val); } $sth->finish; $dbh->disconnect;
      just to extend it a bit, using "bind variables" will not guarantee that the database will not preprocess it, but it'll do it often. the optimizer will optimize every query if there is not a cached optimization, and cacheing is performed on the query string itself, so bind variables allow you to have the same statement string (same optimization then) for every different "variable value". Note: a backdraw is that optimization over a bind variable could not be always efficient as using the real value (think about range-scan) but usually it's never encouraged 'cauz the gain is very low, and will cache-out something that could be usefull. Oha
Re: which method is more faster?
by atemon (Chaplain) on Jul 25, 2007 at 02:57 UTC

    The languages like Perl, Java etc supports two way to execute the SQL. You have given both methods in perl. In method 1 (Non prepared statement) every time $dbh->do($SQL) is called, the sql will be prepared and executed. So in your case its prepared and executed about 70,000 times. In case of Method 2, the SQL is prepared once and executed 70,000 times. i.e. you save time for some 70,000 prepare ! Obviously, method two is much faster :)

    Cheers !

    --VC

    There arethree sides to any argument.....
    your side, my side and the right side.

Re: which method is more faster?
by f00li5h (Chaplain) on Jul 25, 2007 at 03:05 UTC

    Benchmarking it may help you decide which is faster...

    @_=qw; ask f00li5h to appear and remain for a moment of pretend better than a lifetime;;s;;@_[map hex,split'',B204316D8C2A4516DE];;y/05/os/&print;

Re: which method is more faster?
by GrandFather (Saint) on Jul 25, 2007 at 02:28 UTC

    TITS,But i thunk method 2 is more gooder.


    DWIM is Perl's answer to Gödel
Re: which method is more faster?
by ysth (Canon) on Jul 25, 2007 at 06:40 UTC
    2 hours seems ridiculously slow (not quite 10 updates per second). Can someone else using SQL Server via DBD::Sybase comment on whether that's an expected speed?

    But surely there's some bulk loader you could use instead, if it really takes that long?

      A lack of a proper index was the cause of his really slow query, which is what a rate of 10 updates per second indeed points to.

      Even when the OP is happy now, with an index, method 2 should be faster still. And the difference should be more visible, since less time is spent inside the DB.

Re: which method is more faster?
by pysome (Scribe) on Jul 25, 2007 at 07:55 UTC
    Now this porblem has been solved by adding INDEX in the tab columns.It spent about 50s totally by method 1 .How stupid it is if forget adding INDEX in some columns!
Re: which method is more faster?
by pysome (Scribe) on Jul 25, 2007 at 03:07 UTC
    unfortunatelly,The script report following Error:
    Panic: dynamic SQL (? placeholders) are not supported by the server yo +u are connecting to at /usr/local/lib/perl5/site_perl/5 .8.8/i686-linux/DBD/Sybase.pm line 133.
    I think the DB Server doesn't support Method 2.In this case,whether or not only Method 1 can be used ??
      You are testing against a Sybase server (or using a Sybase-driver), whereas the original post is stating you are using MS SQL Server 2005, so at least your comments are confusing if not entirely misleading.

      MS SQL Server 2005 being the latest incarnation of Microsoft's database server series, it seems unlikely that it does not support '?' placeholders.

      CountZero

      A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        Agreed, try DBD::ODBC instead.

        There are still other methods, such as ADO, but in my experience, ODBC works well. (And I can't wrap my head around ADO, but YMMV.)

        Actually it's FreeTDS's implementation of Sybase's Client Library API that doesn't support placeholders (yet!).

        Michael