in reply to Re: SQL: Update column(s) value with extra WHERE
in thread SQL: Update column(s) value with extra WHERE

I didn't run this code, but from inspection I can see two major performance issues, plus one note on error handling.

(1) Prepare once, use many times. This statement, my $sth_upd = $dbh->prepare($stmt);, should be outside the loop. In general, preparing is an "expensive" operation: the DB can wind up doing quite a bit of work while making its execution plan. Prepare once before the loop, then execute the handle many times within the loop. I presume that the automatic SQL statement generator gizmo also does a fair amount of futzing around; those statements should go above the loop as well.
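
A minimal sketch of the idea (the table, columns, and @rows here are made up for illustration):

    my $sth_upd = $dbh->prepare(            # prepared once, before the loop
        'UPDATE items SET price = ? WHERE id = ?'
    );
    for my $row (@rows) {
        $sth_upd->execute( $row->{price}, $row->{id} );  # reused on every pass
    }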

(2) Transactions are expensive. By default, each update is a separate transaction. Leave autocommit enabled and use $dbh->begin_work and $dbh->commit to bracket a loop with many updates. In rough numbers, say you can do 10 transactions per second: an update of 100 rows would then take 10 seconds, but if you put all 100 updates into one transaction, you could get the job done in 1 second. An order-of-magnitude (10:1) speedup like this is typical and will matter in a DB of any significant size.
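
Continuing the sketch above, the bracketing looks like this (assuming RaiseError is enabled, so a failed execute dies before the commit):

    $dbh->begin_work;                # suspends autocommit for this batch
    for my $row (@rows) {
        $sth_upd->execute( $row->{price}, $row->{id} );
    }
    $dbh->commit;                    # all the updates land as one transaction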

(3) This isn't a performance issue, but this error handling is not the best approach:

if( ! $sth_upd->execute(@bind) ){
    $dbh->rollback();
    die "update (and rolling back): " . $dbh->errstr;
}
Setting the RaiseError attribute when opening the DB connection gets you automatic error checking (a die with the error message) for free, so you would need only $sth_upd->execute(@bind);. Furthermore, if this update runs as one of many updates comprising a single transaction, the transaction will either succeed completely or fail completely.
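
A sketch of that (the same pattern appears in the connect code further down the thread):

    my $dbh = DBI->connect( "dbi:SQLite:dbname=$dbfile", "", "",
                            { RaiseError => 1 } )        # die on any DB error
        or die "Couldn't connect to database $dbfile: " . DBI->errstr;

    $sth_upd->execute(@bind);   # no if/rollback/die wrapper needed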

When the transaction starts, SQLite will create one or more files for journaling. When the transaction is committed, the DB is actually updated and these files are removed. If the program dies during the transaction, the DB is not updated at all and the journal files remain. The next time you connect to the SQLite DB, the driver will see the uncompleted transaction that was in progress, "cancel it out", and remove the journal files. You then just correct whatever the problem was and run all of the updates again. There is no need in this case for a "rollback": the updates in the failed transaction are "like it never even happened".

These concepts are demoed in my code further down in the thread.

Replies are listed 'Best First'.
Re^3: SQL: Update column(s) value with extra WHERE
by bliako (Abbot) on Jul 17, 2023 at 22:00 UTC
    $dbh->rollback(); die "update (and rolling back): ".$dbh->errstr;

    Thanks Marshall for your insight and explanations. Blame me as that was actually my code :) Two points on this:

    First, although I am aware of, and I think I understand, the concept of grouping inserts/updates for both performance and rollback benefits, I always have trouble coding it for a particular DB (MySQL and SQLite are the two I use). begin_work was complaining at me, so I commented it out after seeing (I think) a Stack Overflow comment that this may be DB-dependent.

    Second, I disagree with dying at any point in my programs, and I look down on code which does that and forces me to try/catch/eval each of its calls instead of checking the error code that each function, IMO, should return (with reasonable exceptions, of course). There has been a recent discussion of this here: Re^7: STDERR in Test Results and then here EyeBall stumps BodBall (Error Handling) and my input here Re^8: STDERR in Test Results (where I water down my vitriol so-to-speak).

    So, yes, fine.

      $dbh->begin_work is a Perl DBI method that applies generically to any DB accessible via the DBI. The underlying SQLite SQL command would be "BEGIN TRANSACTION", but don't issue that directly; use the generic DBI method, because in MySQL the equivalent is "START TRANSACTION". Do not mess around with autocommit: leave it alone at the default of "1" (on). When you issue a begin_work, autocommit is automatically turned off until the commit is seen.

      I think there is some misunderstanding about $dbh->rollback. In general, you do not need to do that. When a transaction starts, SQLite will create a journal file on disk to keep track of what it is doing with the transaction. Nothing at all is changed in the DB until the commit. At the commit, all pending operations are done at once (this is very quick). At that point the changes become permanent to the DB.

      So what happens if your program dies during a transaction? This will leave remnants of the aborted transaction in various SQLite journal files. The next time you connect to SQLite, it will see this garbage associated with a connection that no longer exists and will clean it all up automagically. There is no need for you to "rollback".

      In most circumstances, there is no obvious corrective action that can be taken in light of a failed SQL operation; reporting the SQL error and line number, then stopping, is about all you can do. If you do have a sophisticated program that can keep going after a failure, then a rollback will clear out the pending transaction's operations so that you can start anew without restarting your program or re-connecting to the DB, as sketched below.
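
      A rough sketch of that recovery pattern (assuming RaiseError is on; @work is a hypothetical list of bind-value arrayrefs):

          eval {
              $dbh->begin_work;
              $sth_upd->execute(@$_) for @work;  # any failure dies out of the eval
              $dbh->commit;
          };
          if ($@) {
              $dbh->rollback;   # clear pending operations so we can start anew
              warn "batch failed, will retry: $@";
          }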

      So if you are in the simple model of "I am going to stop if my DB operation fails", enable RaiseError and don't worry about checking error flags or any sort of try/catch stuff: just do the DB operations and let error reporting and death happen if an error occurs.

      If you are an experienced DB developer and have a sophisticated app, then there are multiple error recovery strategies that can be used. My advice is to stay away from complicated stuff if you don't need complicated stuff.

      Second, I disagree with dying at any point in my programs and I look down on code which does that and forces me to try/catch/eval each of its calls instead of checking the error code that each function, IMO, should return

      I'm familiar with this argument, and I wouldn't bother extending it except that in your own code above you wrote:

      # dies on error
      sub db_modify_column_value

      So to some degree, you must see the benefit of not needing to check the return value of db_modify_column_value each time you call it.

      For programs in C/C++, where exception handling can get messy and affects performance (which is the whole reason for using C/C++), there's a decent reason to avoid exceptions and use return codes. There's also a perfectly good reason for using return codes on actions where failure is common or expected, and where a user will likely have a fall-back plan. But in everyday scripting code, where you expect it to work and the fall-back plan is basically "report the error as well as possible and give up", there's no reason not to use exceptions. Using the ones that DBI provides saves you a ton of fingerwork and saves the person reading your code from eye strain.

      BTW, you didn't check any return values of 'prepare', or whether iterating 'fetchrow_hashref' ended naturally or with an error (like the connection resetting). If you turn exceptions off, you need to add error checking *everywhere*, or you can end up with some really hard-to-diagnose bugs later on; a sketch of what that looks like follows below.
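
      To illustrate, a sketch of the checking needed with RaiseError off ($sqlstr and @bind are placeholders; imagine this inside a sub, as in the grep output further down):

          my $sth = $dbh->prepare($sqlstr)
              or do { warn "prepare failed: " . $dbh->errstr; return };
          $sth->execute(@bind)
              or do { warn "execute failed: " . $sth->errstr; return };
          while ( my $row = $sth->fetchrow_hashref ) {
              # ... process $row ...
          }
          # fetchrow_hashref returns undef both at end-of-data and on error,
          # so check err() to tell the two apart:
          warn "fetch aborted early: " . $sth->errstr if $sth->err;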

        > For programs in C/C++ where exception handling can get messy and affects the performance (which is the whole reason for using C/C++) there's a decent reason to avoid exceptions and use return codes

        Nowadays, performance is almost never a reason to avoid exceptions in C++. The only exception (pun intended) I'm aware of is hard real-time applications where people might die if a computation takes too long (in such applications, even malloc is banned).

        > There's also a perfect reason for using return codes on actions where failure is common or expected, and where a user will likely have a fall-back plan

        Yes. Though more or less a matter of taste, the code can sometimes look simpler and clearer to me when checking for common and expected failure return codes, rather than throwing and catching exceptions.


        except that in your own code above you wrote:

        In light of the recent die discussions mentioned above, I did not want to inflame things and divert the focus to die. Indeed, the focus has shifted anyway, so I failed! On the other hand, code like that is far more convenient, which is an added point granted in favour of die.

        BTW, you didn't check any return values of 'prepare' or whether iterating 'fetchrow_hashref' ended naturally or with an error (like the connection resetting).

        Good point, though the above code was just for asking the question, which was focused elsewhere:

        find lib -type f -exec grep -H -w 'prepare(' \{\} \;

        if( ! defined $sth ){
            warn $sqlstr."\n".__FILE__."::db_get_enums_for_table() (line ".__LINE__.") : error, call to ".'prepare()'." has failed for above SQL: ".$dbh->errstr;
            return undef
        }
        ...
Re^3: SQL: Update column(s) value with extra WHERE
by Anonymous Monk on Jul 16, 2023 at 19:45 UTC

    Granted. I was focused more on eliminating the tricky code. Specifically:

    1. Absolutely correct. Prepare is expensive. The thing is, both the workings of SQL::Abstract and the OP's logic seem to encourage a prepare for each execute. If I were implementing this, I would ditch SQL::Abstract in favor of straight DBI. But for better or worse, I decided to restrict my response to the original question.
    2. Equally correct. I was careless with transaction scoping. Thanks.
    3. Your preferred error handling is also mine: turn off PrintError, turn off RaiseError, and write simpler code. Again, for better or worse I decided to focus on the original question.

    Thanks.

    wyant

      Hi Wyant,

      I think "turn off RaiseError" is a typo. The default for both PrintError and RaiseError is off (0). RaiseError=1 essentially means "report errors by dying". This is my standard open DB code:

      my %attr = ( RaiseError => 1 );   # auto die with error printout
      my $dbh = DBI->connect( "dbi:SQLite:dbname=$dbfile", "", "", \%attr )
          or die "Couldn't connect to database $dbfile: " . DBI->errstr;
      Yes, the main thing about the OP's post is that updating is too complicated. My code below:
      $dbh->begin_work;
      foreach my $row_hash_ref (@$result){
          $exe->($row_hash_ref, $col_arrayref);
          $update->execute( @$row_hash_ref{@$col_arrayref}, $row_hash_ref->{rowid} );
      }
      $dbh->commit;
      For each row that was selected with the WHERE clause, run the user-supplied function, then use the pre-prepared SQL update statement. If an error happens during an update, this will happen before the commit and the result will be that no updates at all are done. "Like it never even happened". This code will run very quickly.

      Update: The update being done in the question has the property that running it multiple times is harmless (it is idempotent). That is a very nice property to have. If you have an update without that property and you want to save a copy of the data first, I would use the DB to do that via standard SQL instead of Perl code; you could use a regular table or a temp in-memory table for that purpose. In general, once you have a DB, push as much work as possible onto it (sorting, copying, etc.), as sketched below.
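
      A sketch of that idea (the table names here are made up):

          # snapshot a table inside the DB before running a risky update
          $dbh->do('DROP TABLE IF EXISTS items_backup');
          $dbh->do('CREATE TABLE items_backup AS SELECT * FROM items');
          # ... run the risky updates; restore from items_backup if needed ...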