Re^2: SQL: Update column(s) value with extra WHERE
by Marshall (Canon) on Jul 15, 2023 at 19:59 UTC
I didn't run this code, but from inspection I can see two major performance issues, plus one note about error handling.
(1) Prepare once, use many times. This statement, my $sth_upd = $dbh->prepare($stmt); should be outside the loop. In general, preparing is an "expensive" operation: the DB can wind up doing quite a bit of work while making its execution plan. Prepare once before the loop and then use the statement handle many times within the loop. Of course, I presume that the automatic SQL statement generator gizmo does a fair amount of futzing around; those statements should also go above the loop.
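As a minimal sketch of that pattern (assuming an existing $dbh; the items table and @rows_to_update list are hypothetical, so adjust names to your schema):
my $sth_upd = $dbh->prepare("UPDATE items SET price = ? WHERE id = ?");   # once, before the loop

for my $row (@rows_to_update) {
    $sth_upd->execute( $row->{price}, $row->{id} );                       # many times, inside the loop
}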
(2) Transactions are expensive. By default, each update is a separate transaction. You want to leave autocommit enabled and use $dbh->begin_work and $dbh->commit to bracket a loop with many updates. In rough numbers, say you can do 10 transactions per second: updating 100 rows as 100 separate transactions would take 10 seconds, but putting all 100 updates into one transaction gets the job done in about 1 second. This order-of-magnitude (10:1) performance increase is typical and will matter in a DB of any significant size.
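A minimal sketch, continuing the hypothetical example above:
$dbh->begin_work;                          # one transaction for the whole loop
for my $row (@rows_to_update) {
    $sth_upd->execute( $row->{price}, $row->{id} );
}
$dbh->commit;                              # all updates become permanent at once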
(3) This isn't a performance issue, but this error handling is not the best approach:
if ( ! $sth_upd->execute(@bind) ) {
    $dbh->rollback();
    die "update (and rolling back): " . $dbh->errstr;
}
Using the RaiseError attribute when opening the DB connection gets you automatic error checking/die, so you would just need $sth_upd->execute(@bind);. Furthermore, if this update is being run as one of many updates comprising a single transaction, the transaction will either 100% succeed or totally fail.
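A minimal sketch of that, reusing the hypothetical $sth_upd from above:
use DBI;

my %attr = ( RaiseError => 1 );            # any DBI error now dies automatically
my $dbh  = DBI->connect( "dbi:SQLite:dbname=$dbfile", "", "", \%attr );

# ... prepare $sth_upd as before ...

$sth_upd->execute(@bind);                  # no manual check or rollback needed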
When the transaction starts, SQLite will create one or more files for journaling. When the transaction is committed, the DB is actually updated and these files are removed. If the program dies during the transaction, the DB is not actually updated at all and these journal files remain. The next time you connect to the SQLite DB, the driver will see this uncompleted transaction in process and will "cancel it out" and remove the journal files. You should then just correct whatever the problem was and run all of the updates again. There is no need in this case for a "rollback". The updates in the failed transaction are "like it never even happened".
These concepts are demonstrated in my code further down in the thread.
$dbh->rollback();
die "update (and rolling back): ".$dbh->errstr;
Thanks Marshall for your insight and explanations. Blame me as that was actually my code :) Two points on this:
First, although I am aware of and think I understand the concept of grouping inserts/updates for both performance and rollback benefits, I always have trouble coding that for a particular DB (MySQL and SQLite are the two I use). The begin_work call was complaining at me, so I commented it out after seeing (I think) a Stack Overflow comment that this may be DB-dependent.
Second, I disagree with dying at any point in my programs, and I look down on code which does that and forces me to try/catch/eval each of its calls instead of checking the error code that each function, IMO, should return (with reasonable exceptions of course). There has been a recent discussion of this here: Re^7: STDERR in Test Results and then here EyeBall stumps BodBall (Error Handling) and my input here Re^8: STDERR in Test Results (where I water down my vitriol, so to speak).
So, yes, fine.
$dbh->begin_work is a Perl DBI method that applies generically to any DB accessible via the DBI. The SQLite SQL command would be "BEGIN TRANSACTION", but don't do that. Use the generic DBI method because in MySQL this is "START TRANSACTION". Do not mess around with autocommit. Leave it alone and at the default of "1" or "on". When you issue a begin_work, that will automatically turn autocommit off until the commit is seen.
I think there is some misunderstanding about $dbh->rollback. In general, you do not need to do that. When a transaction starts, SQLite will create a journal file on disk to keep track of what it is doing with the transaction. Nothing at all is changed in the DB until the commit. At the commit, all pending operations are done at once (this is very quick). At that point the changes become permanent to the DB.
So what happens if your program dies during a transaction? This will leave remnants of the aborted transaction in various SQLite journal files. The next time you connect to SQLite, it will see this garbage associated with a connection that no longer exists and will clean it all up automagically. There is no need for you to "rollback".
In most circumstances, there is no obvious corrective action that can be taken in light of a failed SQL operation. Reporting the SQL error, line number, and stopping is about all you can do. Now if you do have a sophisticated program that can keep going after a failure, then a rollback will clear out the pending transaction's operations so that you can start anew without restarting your program/re-connecting to the DB.
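A minimal sketch of that recovery style, assuming RaiseError is enabled on $dbh, $sth_upd is a hypothetical prepared update handle, and @batch is a hypothetical list of bind-value sets:
eval {
    $dbh->begin_work;
    $sth_upd->execute(@$_) for @batch;
    $dbh->commit;
    1;
} or do {
    my $err = $@ || 'unknown error';
    $dbh->rollback;                        # discard the pending updates
    warn "batch failed, rolled back: $err";
    # ... carry on with the rest of the program ...
};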
So if you are in the simple model of "I am going to stop if my DB operation fails", enable RaiseError and don't worry about checking error flags or any sort of try/catch stuff - just do the DB operations and let error reporting and death happen if an error occurs.
If you are an experienced DB developer and have a sophisticated app, then there are multiple error recovery strategies that can be used. My advice is to stay away from complicated stuff if you don't need complicated stuff.
# dies on error
sub db_modify_column_value
So to some degree, you must see the benefit of not needing to check the return value of db_modify_column_value each time you call it.
For programs in C/C++, where exception handling can get messy and affects performance (performance being the whole reason for using C/C++ in the first place), there's a decent reason to avoid exceptions and use return codes. There's also a perfectly good reason for using return codes for actions where failure is common or expected, and where a user will likely have a fall-back plan. But in code for everyday scripting, where you expect it to work and the fall-back plan is basically "report the error as well as possible and give up", there's no reason not to use exceptions. Using the ones that DBI provides for you saves you a ton of fingerwork and saves the person reading your code from eye strain.
BTW, you didn't check any return values of 'prepare', or whether iterating 'fetchrow_hashref' ended naturally or with an error (like the connection resetting). If you turn exceptions off, you need to add error checking *everywhere*, or else you can end up with some really hard-to-diagnose bugs later on.
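For illustration, a minimal sketch of the manual checks that implies when RaiseError is off (table and column names here are hypothetical):
my $sth = $dbh->prepare("SELECT id, name FROM items")
    or die "prepare failed: " . $dbh->errstr;

$sth->execute
    or die "execute failed: " . $sth->errstr;

while ( my $row = $sth->fetchrow_hashref ) {
    # ... use $row ...
}
# fetchrow_hashref returns undef both at end-of-data and on error,
# so check $sth->err to tell the two apart.
die "fetch aborted early: " . $sth->errstr if $sth->err;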
my %attr = ( RaiseError => 1 );   # auto die with error printout
my $dbh = DBI->connect( "dbi:SQLite:dbname=$dbfile", "", "", \%attr )
    or die "Couldn't connect to database $dbfile: " . DBI->errstr;
Yes, the main thing about the OP's post is that the updating is more complicated than it needs to be. My code is below:
$dbh->begin_work;
foreach my $row_hash_ref (@$result) {
    $exe->( $row_hash_ref, $col_arrayref );
    $update->execute( @$row_hash_ref{@$col_arrayref}, $row_hash_ref->{rowid} );
}
$dbh->commit;
For each row that was selected with the WHERE clause, run the user-supplied function, then use the pre-prepared SQL update statement. If an error happens during an update, it will happen before the commit, and the result will be that no updates at all are done. "Like it never even happened". This code will run very quickly.
Update: The update being done in the question has the property that running it multiple times is harmless (it is idempotent). That is a very nice property to have. Maybe you have an update without that property and you want to save a copy of the affected data first. I would use the DB to do that via standard SQL instead of Perl code: you could have a regular table or a temp in-memory table for that purpose. In general, once you have a DB, push as much work as possible onto it (like sorting, copying, etc.).
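A minimal sketch of that idea, using a hypothetical items table (SQLite and MySQL both support CREATE TABLE ... AS SELECT):
$dbh->do("DROP TABLE IF EXISTS items_backup");
$dbh->do("CREATE TABLE items_backup AS SELECT * FROM items");
# ... run the non-idempotent updates; restore from items_backup if anything goes wrong ...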