bliako has asked for the wisdom of the Perl Monks concerning the following question:

Wise Monks,

My problem today is updating the values of table columns, for example converting the contents of columns 'x' and 'y' to uppercase. I made a sub, db_modify_column_value($dbh, $tablename, $colnames, $subexec, $WHERE), which does this. I would like to ask you to confirm my logic, which is as follows:

  1. Select all rows in the table with optional $WHERE.
  2. Run $subexec->($arow, $colnames) for each selected row; this sub clones the input row and modifies the specified columns in @$colnames. In the above example, this is a simple uppercase.
  3. Save the subset of each row (sliced on @$colnames) before ($arowbefore) and after ($arowafter) modifications.
  4. For each pair of the subsets, update the DB using: UPDATE table SET $arowafter WHERE $WHERE AND $arowbefore

The tricky parts I am seeking confirmation on are:

  1. Specifying the WHERE as $arowbefore has the side effect of also updating any other rows whose @$colnames happen to hold the same values. I see no problem with that myself.
  2. How to merge the $WHERE (an optional, user-specified WHERE to operate on certain rows only) with the $arowbefore condition. What I am doing now is to append AND $arowbefore if there is a $WHERE, or WHERE $arowbefore if there is no $WHERE (see the sketch below).
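For concreteness, here is a minimal sketch of the kind of statement I am after for one row of the uppercase example. It uses SQL::Abstract's -and to merge the two conditions, which is not exactly the string splicing my sub below does, and it assumes $WHERE is defined:

use SQL::Abstract;

my $sql        = SQL::Abstract->new;
my $arowbefore = { c1 => 'aa', c2 => 'bb' };   # slice of the row before modification
my $arowafter  = { c1 => 'AA', c2 => 'BB' };   # slice of the row after modification
my $WHERE      = { c3 => ['c', 'ccc'] };       # optional user-specified condition

# AND the user-specified WHERE together with the "before" slice
my ($stmt, @bind) = $sql->update(
    'atable',
    $arowafter,
    { -and => [ $WHERE, $arowbefore ] },
);
# $stmt comes out roughly as:
#   UPDATE atable SET c1 = ?, c2 = ? WHERE ( ( c3 = ? OR c3 = ? ) AND c1 = ? AND c2 = ? )
print "$stmt\n@bind\n";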

bw, bliako

Here is my attempt with a test case using SQLite for ease of reproduction. It passes a minimal test suite.

use SQL::Abstract;
use Data::Dumper;
use DBD::SQLite;
use Test::More;
use strict;
use warnings;

my $tablename = 'atable';
my $dbfile    = 'tmp.sqlite';

my $dbh = create_mock_table($tablename, $dbfile);

my $WHERE = undef; #{ 'c3' => ['c','ccc'] };

db_modify_column_value(
    $dbh,
    $tablename,
    [qw/c1 c2/], # the columns to modify with below sub
    sub {
        my ($arow, $colnames) = @_;
        return { map { $_ => uc $arow->{$_} } @$colnames }
    },
    $WHERE
);

my $results = fetchrows_mock_table($dbh, $tablename);
my @expected_results = (
    {'id' => 1, 'c1' => 'A',   'c2' => 'B',   'c3' => 'c'},
    {'id' => 2, 'c1' => 'AA',  'c2' => 'BB',  'c3' => 'cc'},
    {'id' => 3, 'c1' => 'AAA', 'c2' => 'BBB', 'c3' => 'ccc'},
    # replicated row which will also be processed
    {'id' => 4, 'c1' => 'A',   'c2' => 'B',   'c3' => 'c'},
);
is_deeply($results, \@expected_results, "result is as expected for no WHERE");

# now test with a WHERE
$dbh = create_mock_table($tablename, $dbfile);
$WHERE = { 'c3' => ['c','ccc'] };
db_modify_column_value(
    $dbh,
    $tablename,
    [qw/c1 c2/], # the columns to modify with below sub
    sub {
        my ($arow, $colnames) = @_;
        return { map { $_ => uc $arow->{$_} } @$colnames }
    },
    $WHERE
);
$results = fetchrows_mock_table($dbh, $tablename);
@expected_results = (
    {'id' => 1, 'c1' => 'A',   'c2' => 'B',   'c3' => 'c'},
    # the WHERE will not select this row, so it is not modified
    {'id' => 2, 'c1' => 'aa',  'c2' => 'bb',  'c3' => 'cc'},
    {'id' => 3, 'c1' => 'AAA', 'c2' => 'BBB', 'c3' => 'ccc'},
    # replicated row which will also be processed
    {'id' => 4, 'c1' => 'A',   'c2' => 'B',   'c3' => 'c'},
);
is_deeply($results, \@expected_results, "result is as expected with a WHERE");

$dbh->disconnect;
done_testing;

# It will modify all values of columns @$colnames
# of table $tablename
# by calling &$subexec on each table row
# selected by the optional $WHERE
# dies on error
sub db_modify_column_value {
    my ($dbh, $tablename, $colnames, $subexec, $WHERE) = @_;

    print "called for table '$tablename' and cols: '@$colnames' ...\n";

    my $abstSEL = SQL::Abstract->new();
    my ($stmt, @bind) = $abstSEL->select($tablename, '*', $WHERE);
    my $sth = $dbh->prepare($stmt);
    if( ! $sth->execute(@bind) ){ die "select: ".$dbh->errstr }

    # do the modifications and store results in a temp array
    my @updates;
    while( my $row = $sth->fetchrow_hashref() ){
        print "$0 : Processing row: ".Dumper($row)."\n";
        my $orirow_only_colnames = { %{ $row }{@$colnames} };
        # $subexec is a subref which takes the row as a hashref
        # and returns a modified clone of it to be inserted into the db
        # so, newrow and row have the same keys exactly
        my $newrow = $subexec->($row, $colnames);
        print "$0 : result is: ".Dumper($newrow)."\n";
        my $newrow_only_colnames = { %{ $newrow }{@$colnames} };
        push @updates, [ $orirow_only_colnames, $newrow_only_colnames ];
    }

    # now update
    # this will possibly update multiple rows
    # we should be able to rollback in case of errors
    $dbh->{AutoCommit} = 0;
    # $dbh->begin_work;
    my $abstUP = SQL::Abstract->new();
    for my $ups (@updates){
        my ($orirow_only_colnames, $newrow_only_colnames) = @$ups;
        my ($extraWHERE, @extrabind) = $abstUP->where($orirow_only_colnames);
        if( defined $WHERE ){ $extraWHERE =~ s/^\s*WHERE\s*//g; }
        my ($stmt, @bind) = $abstUP->update($tablename, $newrow_only_colnames, $WHERE);
        if( defined $WHERE ){ $stmt .= ' AND ' }
        $stmt .= $extraWHERE;
        my @allbind = (@bind, @extrabind);
        print "executing SQL:\n$stmt\nwith binds:\n@allbind\n";
        $sth = $dbh->prepare($stmt);
        if( ! $sth->execute(@allbind) ){
            $dbh->rollback();
            die "update (and rolling back): ".$dbh->errstr;
        }
    }
    $dbh->commit();
    print "db_modify_column_value() : committed and returning.\n";
    return 1; # success
}

sub create_mock_table {
    my ($tablename, $dbfile) = @_;

    unlink $dbfile;
    my $dbh = DBI->connect("dbi:SQLite:dbname=${dbfile}","","")
        || die "Cannot create handle: $DBI::errstr\n";

    my @tabledata = (
        {'id' => 1, 'c1' => 'a',   'c2' => 'b',   'c3' => 'c'},
        {'id' => 2, 'c1' => 'aa',  'c2' => 'bb',  'c3' => 'cc'},
        {'id' => 3, 'c1' => 'aaa', 'c2' => 'bbb', 'c3' => 'ccc'},
        # replicate a row
        {'id' => 4, 'c1' => 'A',   'c2' => 'B',   'c3' => 'c'},
    );

    my ($stmt, @bind, $sth);
    $stmt = "CREATE TABLE IF NOT EXISTS ${tablename} (id INTEGER PRIMARY KEY, c1 varchar(10), c2 varchar(10), c3 varchar(10));";
    $sth = $dbh->prepare($stmt);
    if( ! $sth->execute(@bind) ){ die "mock-create: ".$dbh->errstr }

    # do insert
    my $abstTEST = SQL::Abstract->new;
    for my $arow (@tabledata){
        ($stmt, @bind) = $abstTEST->insert($tablename, $arow);
        print "create_mock_table() : executing $stmt (@bind).\n";
        my $sth = $dbh->prepare($stmt);
        if( ! $sth->execute(@bind) ){ die "mock-insert: ".$dbh->errstr }
    }
    return $dbh
}

sub fetchrows_mock_table {
    my ($dbh, $tablename) = @_;

    my $abstTEST = SQL::Abstract->new;
    my ($stmt, @bind) = $abstTEST->select($tablename, '*');
    print "fetchrows_mock_table() : executing $stmt.\n";
    my $sth = $dbh->prepare($stmt);
    if( ! $sth->execute(@bind) ){ die "fetchrows_mock_table() : select: ".$dbh->errstr }

    my @results;
    while( my $row = $sth->fetchrow_hashref() ){ push @results, $row; }
    # sort results on id in order to compare with the expected data
    @results = sort { $a->{'id'} <=> $b->{'id'} } @results;
    return \@results
}

Re: SQL: Update column(s) value with extra WHERE
by Marshall (Canon) on Jul 15, 2023 at 07:16 UTC
    I did get your code to run but that required 2 modifications:
    $dbh->{AutoCommit} = 0;    #DO NOT USE THIS LINE
    # $dbh->begin_work;        #USE THIS LINE
    The AutoCommit line caused an inability to acquire a write lock (the transaction was never finished properly).
    The next problem was a UNIQUE constraint violation (trying to add an exact copy of an existing row), which I fixed by dropping the table first:
    $dbh->do("DROP TABLE IF EXISTS $tablename"); #ADDED THIS
    $stmt = "CREATE TABLE IF NOT EXISTS $tablename.....

    I wound up re-writing the flow of your program. A few comments first. I like to use the "RaiseError" attribute when opening a DB connection. If something goes wrong, you get an error report and the program dies automatically. So you can simplify if( ! $sth->execute(@bind) ){ die "mock-create: ".$dbh->errstr } to just $sth->execute(@bind). The code winds up being a bit cleaner and I can't forget to check for errors.
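    Something like this minimal sketch is what I mean (same connect string and attribute hash as in my full code further down):

    my %attr = ( RaiseError => 1 );   # die automatically on any DBI error
    my $dbh  = DBI->connect("dbi:SQLite:dbname=$dbfile", "", "", \%attr)
        or die "Couldn't connect to database $dbfile: " . DBI->errstr;

    # no need to wrap every call in if( ! ... ){ die ... } any more
    my $sth = $dbh->prepare($stmt);
    $sth->execute(@bind);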

    I recommend against changing the name of the default PRIMARY KEY. I suppose there could be some DB interoperability reason to do that, but in general I would not. When doing a CREATE TABLE, an implicit column is automatically added for you: rowid integer primary key autoincrement. In my code below, I just rolled with the default name and took out the explicit assignments to rowid (or what you called id); a tiny illustration follows below. I didn't worry about the Test::More code - that doesn't appear to be an issue.
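    To illustrate (the table name t is made up for this example):

    # no explicit id column; SQLite provides rowid automatically
    $dbh->do("CREATE TABLE t (c1 TEXT, c2 TEXT)");
    my $rows = $dbh->selectall_arrayref(
        "SELECT rowid, * FROM t ORDER BY rowid", { Slice => {} } );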

    I usually put the DB open near the top of the program, not buried down in some subroutine. I wanna see the options like RaiseError in a prominent place. Also, if for some reason I do want to close the connection, I like having the open() and close() at the same program level.

    For this code, I used an array of hash refs, where each row in the DB is one hash. I usually work with arrays instead of hashes, but the hash option looks fine here. I have found that if you are returning just some thousands of rows, it is better to return the whole result set at once rather than beating on the interface for each row. With a million-line result set, I would consider something different, like going row by row.

    So, I started with (1) getting the subset of rows in "atable" that are of interest (by using the WHERE clause, or not). Then (2) iterating over each row, applying the user-supplied function to it, and running an update with just the values which could potentially have changed. I saw no need to make copies of anything. The rowid (primary key) is guaranteed to be unique, and that is how you identify the exact row to be updated.

    Note that I ran all of the updates as a single transaction. This is fine. A transaction can have a million rows in it - no problem. What is limited is the number of transactions per second; the size of a transaction (how many rows are affected) has little to do with how long it takes, because there is just inherently a lot of overhead per transaction. The way my code is written, either 100% of the updates will succeed or none of them will. There is no user-written "roll-back" code needed - the DB takes care of that itself. SQLite is an ACID-compliant database.

    So here is my effort...have fun!

use SQL::Abstract;
use Data::Dumper;
use DBD::SQLite;
use Test::More;
use Data::Dump qw(pp);
use strict;
use warnings;

my $tablename = 'atable';
my $dbfile    = 'tmp.sqlite';

if (-e $dbfile) {
    unlink $dbfile or die "delete of existing $dbfile failed $!";
}
my %attr = ( RaiseError => 1 ); # auto die with error printout
my $dbh  = DBI->connect("dbi:SQLite:dbname=$dbfile","","",\%attr)
    or die "Couldn't connect to database $dbfile: " . DBI->errstr;

create_test_table( $dbh, $tablename);
print "Original Test Table Contents...\n";
dump_test_table ( $dbh, $tablename);

### TEST 1 ###
my $WHERE = undef; # (c3 = 'c') OR (c3 = 'ccc')
db_modify_column_value(
    $dbh,
    $tablename,
    [qw/c1 c2/], # the columns to modify with below sub
    sub {
        my ($href, $colnames) = @_;
        foreach (@$colnames) {
            $href->{$_} = uc ($href->{$_});
        }
    },
    $WHERE
);
print "First Test Without a Where Clause...\n";
print "Test Table after modifications to cols c1 and c2....\n";
dump_test_table ( $dbh, $tablename);

### TEST 2 ###
create_test_table( $dbh, $tablename);
print "Second Test mods to c1 and c2 WHERE (c3 = 'c') OR (c3 = 'ccc')\n";
$WHERE = "WHERE (c3 = 'c') OR (c3 = 'ccc')";
db_modify_column_value(
    $dbh,
    $tablename,
    [qw/c1 c2/], # the columns to modify with below sub
    sub {
        my ($href, $colnames) = @_;
        foreach (@$colnames) {
            $href->{$_} = uc ($href->{$_});
        }
    },
    $WHERE
);
print "Test Table after modifications....\n";
dump_test_table ( $dbh, $tablename);

#### subs #######

sub db_modify_column_value {
    my ($dbh, $tablename, $col_arrayref, $exe, $WHERE) = @_;

    my $result = fetchrows_subset_of_table($dbh, $tablename, $WHERE);
    my $updateSQL = "UPDATE $tablename SET ".join("=?,", @$col_arrayref)."=?"." WHERE rowid =?";
    # print "$updateSQL\n";
    my $update = $dbh->prepare($updateSQL);

    $dbh->begin_work;
    foreach my $row_hash_ref (@$result) {
        $exe->($row_hash_ref, $col_arrayref);
        $update->execute(@$row_hash_ref{@$col_arrayref}, $row_hash_ref->{rowid});
    }
    $dbh->commit;
}

sub create_test_table {
    my ($dbh, $tablename) = @_;

    my @tabledata = (
        {'c1' => 'a',   'c2' => 'b',   'c3' => 'c'},
        {'c1' => 'aa',  'c2' => 'bb',  'c3' => 'cc'},
        {'c1' => 'aaa', 'c2' => 'bbb', 'c3' => 'ccc'},
        # replicate a row
        {'c1' => 'A',   'c2' => 'B',   'c3' => 'c'},
    );
    $dbh->do("DROP TABLE IF EXISTS $tablename");
    $dbh->do("CREATE TABLE IF NOT EXISTS $tablename (c1 TEXT, c2 TEXT, c3 TEXT);");
    my $insert = $dbh->prepare("INSERT INTO $tablename (c1,c2,c3) VALUES (?,?,?)");

    $dbh->begin_work;
    for my $rowref (@tabledata) {
        $insert->execute(@$rowref{qw(c1 c2 c3)});
    }
    $dbh->commit;
}

sub dump_test_table {
    my ($dbh, $tablename) = @_;
    my $result = fetchrows_subset_of_table($dbh, $tablename);
    print "Dumping Current Values in $tablename\n";
    pp $result;
}

## fetchrows_subset_of_table()
#  returns a ref to array of hashref of subset of a table
#  specified by an SQL WHERE clause
sub fetchrows_subset_of_table {
    my ($dbh, $tablename, $WHERE) = @_;
    $WHERE //= '';
    my $results = $dbh->selectall_arrayref(
        "SELECT rowid, * FROM $tablename $WHERE ORDER BY rowid",
        { Slice => {} } );
    return $results; # $results is a ref to an array of hash ref's
}
__END__
Original Test Table Contents...
Dumping Current Values in atable
[
  { c1 => "a", c2 => "b", c3 => "c", rowid => 1 },
  { c1 => "aa", c2 => "bb", c3 => "cc", rowid => 2 },
  { c1 => "aaa", c2 => "bbb", c3 => "ccc", rowid => 3 },
  { c1 => "A", c2 => "B", c3 => "c", rowid => 4 },
]
First Test Without a Where Clause...
Test Table after modifications to cols c1 and c2....
Dumping Current Values in atable
[
  { c1 => "A", c2 => "B", c3 => "c", rowid => 1 },
  { c1 => "AA", c2 => "BB", c3 => "cc", rowid => 2 },
  { c1 => "AAA", c2 => "BBB", c3 => "ccc", rowid => 3 },
  { c1 => "A", c2 => "B", c3 => "c", rowid => 4 },
]
Second Test mods to c1 and c2 WHERE (c3 = 'c') OR (c3 = 'ccc')
Test Table after modifications....
Dumping Current Values in atable
[
  { c1 => "A", c2 => "B", c3 => "c", rowid => 1 },
  { c1 => "aa", c2 => "bb", c3 => "cc", rowid => 2 },
  { c1 => "AAA", c2 => "BBB", c3 => "ccc", rowid => 3 },
  { c1 => "A", c2 => "B", c3 => "c", rowid => 4 },
]
    Update: Of course, you aren't really doing just upper case on a row, else this standard SQL would work:
    UPDATE $tablename SET c1=upper(c1)
    which would update column c1 in all rows. I mention it because there is another option, which is to define a user function that works like upper(). That is possible even with SQLite. However, this is a lot of hassle and I haven't actually done it in production; the performance gain is typically just not worth it. But it deserves a mention.
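    For completeness, the Perl-level way to register such a function with DBD::SQLite is sqlite_create_function. A rough, untested sketch (the function name my_upper is made up):

    # register a Perl sub as an SQL function on this connection
    $dbh->sqlite_create_function( 'my_upper', 1, sub { uc $_[0] } );

    # then a single plain UPDATE handles every row, with no Perl-side loop
    $dbh->do("UPDATE $tablename SET c1 = my_upper(c1), c2 = my_upper(c2)");

    This registers a Perl coderef, so it will not be as fast as a C implementation built into the DB.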
        Thanks! That is a good compilation of links. I use one application written in Ruby that uses MySQL, and I have considered porting it to Perl and SQLite. I show the .sql file for the Jaro-Winkler algorithm below. Jaro-Winkler is similar to Levenshtein distance but is optimized for short strings, like comparing license plates, radio callsigns, or something similar. There is a pure-Perl implementation of the algorithm, but a C implementation built into the DB would run much faster. If I ever get a C program going, I could also make an XS Perl function.

        Jaro-Winkler.sql

Re: SQL: Update column(s) value with extra WHERE
by Anonymous Monk on Jul 14, 2023 at 17:53 UTC

    I think I would deal with the tricky parts by doing something less tricky. If your example reflects reality, your database table has a primary key. Why not just use the primary key in your update()? That would make your while loop something like this:

    my $abstUP = SQL::Abstract->new();
    while( my $row = $sth->fetchrow_hashref() ){
        print "$0 : Processing row: ".Dumper($row)."\n";
        # $subexec is a subref which takes the row as a hashref
        # and returns a modified clone of it to be inserted into the db
        # so, newrow and row have the same keys exactly
        my $newrow = $subexec->($row, $colnames);
        print "$0 : result is: ".Dumper($newrow)."\n";
        my $newrow_only_colnames = { %{ $newrow }{@$colnames} };
        # Update the row
        my ($stmt, @bind) = $abstUP->update($tablename, $newrow_only_colnames, { id => $row->{id} } );
        print "executing SQL:\n$stmt\nwith binds:\n@bind\n";
        my $sth_upd = $dbh->prepare($stmt);
        if( ! $sth_upd->execute(@bind) ){
            $dbh->rollback();
            die "update (and rolling back): ".$dbh->errstr;
        }
    }

    Note that the above uses $sth_upd for the handle to the update statement. This was done because my refactor does the update while the select is still active, so re-using $sth would clobber the select.

    wyant

      I didn't run this code, but from inspection, I can see that there are 2 major performance issues.

      (1) Prepare once, use many times. This statement, my $sth_upd = $dbh->prepare($stmt); should be outside the loop. In general, preparing is an "expensive" operation. The DB can wind up doing quite a bit of work while making its execution plan. Typically you want to put prepare statements outside of the loop. Prepare once before the loop and then use it many times within the loop. Of course, I presume that the automatic SQL statement generator gizmo does a fair amount of futzing around. Those statements should also go above the loop.
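      Roughly like this (a sketch only, reusing the variable names from the code above):

      # build and prepare the UPDATE once, before the loop
      my $set_clause = join ', ', map { "$_ = ?" } @$colnames;
      my $sth_upd = $dbh->prepare("UPDATE $tablename SET $set_clause WHERE id = ?");

      # then execute it many times with different bind values
      while ( my $row = $sth->fetchrow_hashref() ) {
          my $newrow = $subexec->($row, $colnames);
          $sth_upd->execute( @$newrow{@$colnames}, $row->{id} );
      }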

      (2) Transactions are expensive. By default, each update is a separate transaction. You want to leave autocommit enabled and use $dbh->begin_work and $dbh->commit to bracket a loop with many updates. In rough numbers, say you can do 10 transactions per second. An update with 100 rows would take 10 seconds. However, if you put all 100 updates into one transaction, you could get the job done in 1 second. This order of magnitude, 10:1 performance increase is typical and will matter in a DB of any significant size.
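      In code, the bracketing looks roughly like this (a sketch; $sth_upd is the prepared handle from above and @rows holds rows that were already fetched):

      $dbh->begin_work;                 # one transaction for the whole batch
      for my $row (@rows) {
          my $newrow = $subexec->($row, $colnames);
          $sth_upd->execute( @$newrow{@$colnames}, $row->{id} );
      }
      $dbh->commit;                     # a single commit instead of one per row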

      (3) This isn't a performance issue, but this error handling is not the best way.

      if( ! $sth_upd->execute(@bind) ){
          $dbh->rollback();
          die "update (and rolling back): ".$dbh->errstr;
      }
      Using the RaiseError attribute when opening the DB connection gets you automatic error checking/die. So you would just need $sth_upd->execute(@bind);. Furthermore, if this update is being run as one of many updates comprising a single transaction, the transaction will either 100% succeed or totally fail.

      When the transaction starts, SQLite creates one or more files for journaling. When the transaction is committed, the DB is actually updated and these files are removed. If the program dies during the transaction, the DB is not actually updated at all and the journal files remain. The next time you connect to the SQLite DB, the driver sees the incomplete transaction and "cancels it out", removing the journal files. You should then just correct whatever the problem was and run all of the updates again. There is no need in this case for a "rollback"; the updates in the failed transaction are "like they never even happened".

      These concepts are demo'ed in my code further down in the thread.

        $dbh->rollback(); die "update (and rolling back): ".$dbh->errstr;

        Thanks Marshall for your insight and explanations. Blame me as that was actually my code :) Two points on this:

        First, although I am aware of, and I think I understand, the concept of grouping inserts/updates for both performance and rollback benefits, I always have trouble coding it for a particular DB (MySQL and SQLite are the two I use). begin_work was complaining at me, so I commented it out after seeing (I think) a Stack Overflow comment suggesting that this may be DB-dependent.
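        For my own notes, the bracketing I keep meaning to use looks roughly like this (a sketch only, untested; $sth_upd and @all_binds are assumed to have been prepared earlier):

        # leave AutoCommit on; begin_work/commit bracket the whole batch
        my $ok = eval {
            $dbh->begin_work;
            $sth_upd->execute(@$_) for @all_binds;   # one execute per row's bind values
            $dbh->commit;
            1;
        };
        if( ! $ok ){
            my $err = $@ || 'unknown DBI error';
            eval { $dbh->rollback };                 # undo the partial batch
            print STDERR "update failed and was rolled back: $err\n";
        }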

        Second, I disagree with dying at any point in my programs, and I look down on code which does that and forces me to try/catch/eval each of its calls instead of checking the error code that each function, IMO, should return (with reasonable exceptions of course). There has been a recent discussion of this here: Re^7: STDERR in Test Results and then here EyeBall stumps BodBall (Error Handling) and my input here Re^8: STDERR in Test Results (where I water down my vitriol, so to speak).

        So, yes, fine.

        Granted. I was focused more on eliminating the tricky code. Specifically:

        1. Absolutely correct. Prepare is expensive. The thing is, both the workings of SQL::Abstract and the OP's logic seem to encourage a prepare for each execute. If I were implementing I would ditch SQL::Abstract in favor of straight DBI. But for better or worse I decided to restrict my response to the original question.
        2. Equally correct. I was careless with transaction scoping. Thanks.
        3. Your preferred error handling is also mine: turn off PrintError, turn on RaiseError, and write simpler code. Again, for better or worse, I decided to focus on the original question.

        Thanks.

        wyant