in reply to SQL: Update column(s) value with extra WHERE

I did get your code to run, but that required two modifications.

First, setting AutoCommit directly left the transaction unfinished, which caused an inability to acquire a write lock:

$dbh->{AutoCommit} = 0;   # DO NOT USE THIS LINE
$dbh->begin_work;         # USE THIS LINE INSTEAD

Second, re-running the script violated a UNIQUE constraint (it tried to add an exact copy of an existing row), so I drop the table before re-creating it:

$dbh->do("DROP TABLE IF EXISTS $tablename");   # ADDED THIS
$stmt = "CREATE TABLE IF NOT EXISTS $tablename.....

I wound up re-writing the flow of your program. A few comments first. I like to use the RaiseError attribute when opening a DB connection: if something goes wrong, you get an error report and the program dies automatically. That lets you simplify

if ( ! $sth->execute(@bind) ) { die "mock-create: " . $dbh->errstr }

to just

$sth->execute(@bind);

The code winds up a bit cleaner, and I can't forget to check for errors.
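Here is a minimal sketch of that simplification, assuming DBD::SQLite is installed; the in-memory database and table t are throwaway examples, not part of the original program:

```perl
use strict;
use warnings;
use DBI;    # assumes DBD::SQLite is available as the driver

# RaiseError => 1 makes every failed DBI call die with a message,
# so no per-call "or die" / if(!...){die} checks are needed.
my $dbh = DBI->connect( "dbi:SQLite:dbname=:memory:", "", "",
                        { RaiseError => 1, PrintError => 0 } );

$dbh->do("CREATE TABLE t (c1 TEXT)");
my $sth = $dbh->prepare("INSERT INTO t (c1) VALUES (?)");
$sth->execute('x');    # dies automatically on error -- no manual check
```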

I recommend against renaming the default PRIMARY KEY column. I suppose there could be some DB-interoperability reason to do that, but in general I would not. When you do a CREATE TABLE, SQLite automatically gives the table an implicit rowid column that behaves like an INTEGER PRIMARY KEY (it auto-assigns unique integer ids; note this is not quite the same as declaring AUTOINCREMENT). In my code below, I just rolled with the default name and took out the explicit assignments to rowid (or what you called id). I didn't worry about the Test::More code - that doesn't appear to be an issue.
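To illustrate (table names here are just examples): no id column needs to be declared for the rowid to exist, and declaring one merely aliases it:

```sql
-- No key column declared; SQLite still provides an implicit rowid:
CREATE TABLE atable (c1 TEXT, c2 TEXT, c3 TEXT);
SELECT rowid, * FROM atable;   -- rowid is available anyway

-- Declaring "id INTEGER PRIMARY KEY" just makes id an alias for rowid:
CREATE TABLE btable (id INTEGER PRIMARY KEY, c1 TEXT);
```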

I usually put the DB open near the top of the program, not buried down in some subroutine. I want to see options like RaiseError in a prominent place. Also, if for some reason I do want to close the connection, I like having the open and the close at the same program level.

For this code, I used an array of hash refs, where each row in the DB is one hash. I usually work with arrays instead of hashes, but the hash option looks fine here. I have found that if you are returning just some thousands of rows, it is better to fetch the whole result set at once rather than making an interface call per row. With a million-row result set, I would consider something different, like going row by row.
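For reference, this pure-Perl sketch shows the shape of what selectall_arrayref with { Slice => {} } hands back, and how a hash slice pulls several column values out of one row at once (the rows here are made-up sample data mirroring the post's atable):

```perl
use strict;
use warnings;

# A ref to an array, one hashref per row, keyed by column name --
# the same structure selectall_arrayref(..., { Slice => {} }) returns.
my $result = [
    { rowid => 1, c1 => 'a',  c2 => 'b'  },
    { rowid => 2, c1 => 'aa', c2 => 'bb' },
];

my @cols = qw(c1 c2);
for my $row (@$result) {
    my @values = @$row{@cols};    # hash slice: several values at once
    printf "row %d: %s\n", $row->{rowid}, join( ',', @values );
}
```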

So, I started with (1) getting the subset of rows in "atable" that are of interest (by using the WHERE clause, or not). Then (2) iterating over each row, applying the user-supplied function to it, and running an update with just the values that could potentially be changed. I saw no need to make copies of anything. The rowid (primary key) is guaranteed to be unique, and that is what identifies the exact row to be updated.
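The UPDATE statement for step (2) is assembled with a join, one "col=?" per updatable column plus a trailing placeholder for the rowid, the same trick my code below uses:

```perl
use strict;
use warnings;

my $tablename    = 'atable';
my $col_arrayref = [qw(c1 c2)];    # the columns that may change

# join produces "c1=?,c2", the appended "=?" closes the last pair,
# and the rowid placeholder pins the update to exactly one row.
my $updateSQL = "UPDATE $tablename SET "
              . join( "=?,", @$col_arrayref ) . "=?"
              . " WHERE rowid =?";
print "$updateSQL\n";
```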

Note that I ran all of the updates as a single transaction. This is fine: a transaction can affect a million rows - no problem. What is limited is the number of transactions per second, because there is inherently a lot of overhead per transaction; how many rows it touches has little to do with how long it takes. The way my code is written, 100% of the updates will succeed or none of them will. There is no user-written roll-back code needed - the DB takes care of it itself. SQLite is an ACID-compliant database.
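A minimal sketch of that all-or-nothing behavior, assuming DBD::SQLite; the table, values, and the deliberately failing insert are invented for the demonstration:

```perl
use strict;
use warnings;
use DBI;    # assumes DBD::SQLite is available as the driver

my $dbh = DBI->connect( "dbi:SQLite:dbname=:memory:", "", "",
                        { RaiseError => 1, PrintError => 0 } );
$dbh->do("CREATE TABLE t (n INTEGER NOT NULL)");

# Wrap the batch in one transaction. If any statement dies
# (RaiseError), roll the whole batch back.
eval {
    $dbh->begin_work;
    $dbh->do( "INSERT INTO t (n) VALUES (?)", undef, 1 );
    $dbh->do( "INSERT INTO t (n) VALUES (?)", undef, undef );  # violates NOT NULL
    $dbh->commit;
};
if ($@) {
    $dbh->rollback;    # the first, successful insert is undone too
}
my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM t");
# $count is 0: all-or-nothing
```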

So here is my effort...have fun!

use strict;
use warnings;
use DBI;
use DBD::SQLite;
use SQL::Abstract;
use Data::Dumper;
use Test::More;
use Data::Dump qw(pp);

my $tablename = 'atable';
my $dbfile    = 'tmp.sqlite';

if (-e $dbfile) {
    unlink $dbfile or die "delete of existing $dbfile failed $!";
}

my %attr = ( RaiseError => 1 );    # auto die with error printout
my $dbh = DBI->connect( "dbi:SQLite:dbname=$dbfile", "", "", \%attr )
    or die "Couldn't connect to database $dbfile: " . DBI->errstr;

create_test_table( $dbh, $tablename );
print "Original Test Table Contents...\n";
dump_test_table( $dbh, $tablename );

### TEST 1 ###
my $WHERE = undef;    # (c3 = 'c') OR (c3 = 'ccc')
db_modify_column_value(
    $dbh, $tablename,
    [qw/c1 c2/],      # the columns to modify with below sub
    sub {
        my ( $href, $colnames ) = @_;
        foreach (@$colnames) { $href->{$_} = uc( $href->{$_} ); }
    },
    $WHERE
);
print "First Test Without a Where Clause...\n";
print "Test Table after modifications to cols c1 and c2....\n";
dump_test_table( $dbh, $tablename );

### TEST 2 ###
create_test_table( $dbh, $tablename );
print "Second Test mods to c1 and c2 WHERE (c3 = 'c') OR (c3 = 'ccc')\n";
$WHERE = "WHERE (c3 = 'c') OR (c3 = 'ccc')";
db_modify_column_value(
    $dbh, $tablename,
    [qw/c1 c2/],      # the columns to modify with below sub
    sub {
        my ( $href, $colnames ) = @_;
        foreach (@$colnames) { $href->{$_} = uc( $href->{$_} ); }
    },
    $WHERE
);
print "Test Table after modifications....\n";
dump_test_table( $dbh, $tablename );

#### subs #######

sub db_modify_column_value {
    my ( $dbh, $tablename, $col_arrayref, $exe, $WHERE ) = @_;
    my $result    = fetchrows_subset_of_table( $dbh, $tablename, $WHERE );
    my $updateSQL = "UPDATE $tablename SET "
                  . join( "=?,", @$col_arrayref ) . "=?"
                  . " WHERE rowid =?";
    # print "$updateSQL\n";
    my $update = $dbh->prepare($updateSQL);
    $dbh->begin_work;
    foreach my $row_hash_ref (@$result) {
        $exe->( $row_hash_ref, $col_arrayref );
        $update->execute( @$row_hash_ref{@$col_arrayref},
                          $row_hash_ref->{rowid} );
    }
    $dbh->commit;
}

sub create_test_table {
    my ( $dbh, $tablename ) = @_;
    my @tabledata = (
        { 'c1' => 'a',   'c2' => 'b',   'c3' => 'c' },
        { 'c1' => 'aa',  'c2' => 'bb',  'c3' => 'cc' },
        { 'c1' => 'aaa', 'c2' => 'bbb', 'c3' => 'ccc' },
        # replicate a row
        { 'c1' => 'A',   'c2' => 'B',   'c3' => 'c' },
    );
    $dbh->do("DROP TABLE IF EXISTS $tablename");
    $dbh->do("CREATE TABLE IF NOT EXISTS $tablename (c1 TEXT, c2 TEXT, c3 TEXT);");
    my $insert = $dbh->prepare("INSERT INTO $tablename (c1,c2,c3) VALUES (?,?,?)");
    $dbh->begin_work;
    for my $rowref (@tabledata) {
        $insert->execute( @$rowref{qw(c1 c2 c3)} );
    }
    $dbh->commit;
}

sub dump_test_table {
    my ( $dbh, $tablename ) = @_;
    my $result = fetchrows_subset_of_table( $dbh, $tablename );
    print "Dumping Current Values in $tablename\n";
    pp $result;
}

## fetchrows_subset_of_table()
#  returns a ref to an array of hashrefs: the subset of a table
#  specified by an SQL WHERE clause (or the whole table if none given)
sub fetchrows_subset_of_table {
    my ( $dbh, $tablename, $WHERE ) = @_;
    $WHERE //= '';
    my $results = $dbh->selectall_arrayref(
        "SELECT rowid, * FROM $tablename $WHERE ORDER BY rowid",
        { Slice => {} }
    );
    return $results;    # ref to an array of hash refs
}

__END__
Original Test Table Contents...
Dumping Current Values in atable
[
  { c1 => "a",   c2 => "b",   c3 => "c",   rowid => 1 },
  { c1 => "aa",  c2 => "bb",  c3 => "cc",  rowid => 2 },
  { c1 => "aaa", c2 => "bbb", c3 => "ccc", rowid => 3 },
  { c1 => "A",   c2 => "B",   c3 => "c",   rowid => 4 },
]
First Test Without a Where Clause...
Test Table after modifications to cols c1 and c2....
Dumping Current Values in atable
[
  { c1 => "A",   c2 => "B",   c3 => "c",   rowid => 1 },
  { c1 => "AA",  c2 => "BB",  c3 => "cc",  rowid => 2 },
  { c1 => "AAA", c2 => "BBB", c3 => "ccc", rowid => 3 },
  { c1 => "A",   c2 => "B",   c3 => "c",   rowid => 4 },
]
Second Test mods to c1 and c2 WHERE (c3 = 'c') OR (c3 = 'ccc')
Test Table after modifications....
Dumping Current Values in atable
[
  { c1 => "A",   c2 => "B",   c3 => "c",   rowid => 1 },
  { c1 => "aa",  c2 => "bb",  c3 => "cc",  rowid => 2 },
  { c1 => "AAA", c2 => "BBB", c3 => "ccc", rowid => 3 },
  { c1 => "A",   c2 => "B",   c3 => "c",   rowid => 4 },
]
Update: Of course, you aren't really just upper-casing a column, else this standard SQL would work:

UPDATE $tablename SET c1 = upper(c1)

which would update column c1 in all rows. I mention it because there is another option: define a user function that works like upper(). That is possible even with SQLite. However, it is a fair amount of hassle and I haven't actually done it in production - the performance is typically just not worth the trouble. But it deserves a mention.
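For the curious, DBD::SQLite does let you register a Perl sub as a SQL function via sqlite_create_function. This is a minimal sketch, not production code; the function name perl_uc and the sample table are made up:

```perl
use strict;
use warnings;
use DBI;    # assumes DBD::SQLite is available as the driver

my $dbh = DBI->connect( "dbi:SQLite:dbname=:memory:", "", "",
                        { RaiseError => 1, PrintError => 0 } );
$dbh->do("CREATE TABLE t (c1 TEXT)");
$dbh->do("INSERT INTO t (c1) VALUES ('abc')");

# Register a Perl sub as a 1-argument SQL function, then use it
# in plain SQL. Any per-row transform could go in the coderef.
$dbh->sqlite_create_function( 'perl_uc', 1, sub { uc $_[0] } );
$dbh->do("UPDATE t SET c1 = perl_uc(c1)");

my ($val) = $dbh->selectrow_array("SELECT c1 FROM t");
# $val is now 'ABC'
```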

Replies are listed 'Best First'.
Re^2: SQL: Update column(s) value with extra WHERE
by karlgoethebier (Abbot) on Jul 17, 2023 at 08:20 UTC
      Thanks! That is a good compilation of links. I use one application written in Ruby that uses MySQL, and I have considered porting it to Perl and SQLite. I show the .sql file for the Jaro-Winkler algorithm below. Jaro-Winkler is similar to Levenshtein distance, but it is optimized for short strings - think comparing license plates, radio callsigns, or something similar. There is a pure-Perl implementation of the algorithm, but a C implementation built into the DB would run much faster. If I ever get a C program going, I could also make an XS Perl function from it.

      Jaro-Winkler.sql
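      For comparison with the SQL version, here is a textbook pure-Perl sketch of Jaro-Winkler (not taken from the .sql file above; the 0.1 scaling factor and 4-character prefix cap are the standard Winkler parameters):

```perl
use strict;
use warnings;
use List::Util qw(max min);

sub jaro_winkler {
    my ( $s1, $s2 ) = @_;
    my @a = split //, $s1;
    my @b = split //, $s2;
    my ( $len1, $len2 ) = ( scalar @a, scalar @b );
    return 1 if $len1 == 0 && $len2 == 0;
    return 0 if $len1 == 0 || $len2 == 0;

    # characters count as matching if they agree within this window
    my $window = max( int( max( $len1, $len2 ) / 2 ) - 1, 0 );
    my ( @match1, @match2 );
    my $matches = 0;
    for my $i ( 0 .. $len1 - 1 ) {
        for my $j ( max( 0, $i - $window ) .. min( $len2 - 1, $i + $window ) ) {
            next if $match2[$j] or $a[$i] ne $b[$j];
            $match1[$i] = $match2[$j] = 1;
            $matches++;
            last;
        }
    }
    return 0 if $matches == 0;

    # transpositions: matched characters that appear out of order
    my @m2 = grep { $match2[$_] } 0 .. $len2 - 1;
    my ( $k, $transpositions ) = ( 0, 0 );
    for my $i ( 0 .. $len1 - 1 ) {
        next unless $match1[$i];
        $transpositions++ if $a[$i] ne $b[ $m2[ $k ] ];
        $k++;
    }
    $transpositions = int( $transpositions / 2 );

    my $jaro = ( $matches / $len1
               + $matches / $len2
               + ( $matches - $transpositions ) / $matches ) / 3;

    # Winkler bonus: reward a common prefix of up to 4 characters
    my $prefix = 0;
    for my $i ( 0 .. min( 3, $len1 - 1, $len2 - 1 ) ) {
        last if $a[$i] ne $b[$i];
        $prefix++;
    }
    return $jaro + $prefix * 0.1 * ( 1 - $jaro );
}
```

      For example, jaro_winkler("MARTHA", "MARHTA") comes out near 0.961, the usual worked example for this metric.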