radiantmatrix has asked for the wisdom of the Perl Monks concerning the following question:
I am working on a large DB project. Files are loaded, line-by-line, into a "cache" table, and each line is tagged with a Session ID sid corresponding to when it was created.
When the sessions are copied to the "live" tables, more recent data with the same key values must replace older data. If the data doesn't exist (the more frequent case), then it must be inserted.
I've got something that works, but it's not as fast as I'd like it to be, and the code is definately in "ugly hack" status. I'm looking for a better, cleaner, faster way to do what I need.
Update: I have, in fact, found a better way to do this that combines dragonchild's and Roy Johnson's ideas. Thanks for the help, monks!
Currently, I read a "Map" file which defines the keys for the table, among other things, and extract a list of sessions that need updating into @session. Then I use this chunk of code:
Note that I use some custom modules to simplify certain tasks. $db is a custom wrapper for DBI that logs DBI messages to a file (and some other stuff). $log is a custom logger that fits my employer's anal logging requirements.foreach (@session) { $db->{_dbh}->{PrintError}=$main::opt_verbose; #supress errors unle +ss verbose $db->prepare($select) or die("Unable to prepare SELECT"); $db->execute($_) or die("Unable to execute SELECT"); my $session = $kit->session(id=>$_, status=>'locked'); $log->say("Processing $_"); my $sth = $db->{_sth}; my @records = @{$sth->fetchall_arrayref}; SCRUB: my $inserted; my (@to_update, @skipped, @upd_error, @ins_error); $session->update($db); $log->say("Inserting rows..."); $log->indent; $db->prepare($insert) or die("Unable to prepare INSERT"); while (@records) { my @row = @{shift @records}; if ($db->execute(@row)) { print STDERR "."; #if ($main::opt_verbose); $inserted++; } else { print STDERR "o"; #if ($main::opt_verbose); push @to_update, \@row; push @ins_error, $db->errstr; chomp $ins_error[$#ins_error]; } }#-while : done inserting #Process to_update rows print STDERR "\n"; $log->unindent; $log->say("Inserted $inserted rows, now updating $#to_update r +ows."); $log->indent; $db->prepare($update); if (@{$map{options}{key}}) { foreach (@to_update) { #my $row = $_{row}; #never mind the errstr, for now. my @upd_params = @{$_}[0 .. $#{$map{options}{key}}]; if ($db->execute(@{$_}, @upd_params)) { print STDERR "."; # if ($main::opt_verbose); } else { print STDERR "o"; # if ($main::opt_verbose); push @skipped, $_; push @upd_error, $db->errstr; chomp $upd_error[$#upd_error]; } }#-for } else { @skipped = @to_update; for (@ins_error) { push @upd_error, ""; } #define @upd_err +or, to avoid later issues. }#-if $log->unindent; #Clean up if (@skipped) { open SKIPF, '>>', "session.skip" or $log->warn("Unable to +record skipped lines to file"); print SKIPF "\n===> $_\n"; while (@skipped) { print SKIPF join(chr(9), shift @ins_error, shift @upd_ +error, @{shift @skipped})."\n"; } close SKIPF; $log->warn("$#skipped records thrown to skip file"); } else { push @skipped, 'none'; #allows $#skipped to be 0 } $log->say("Finished with $_, $inserted new rows, ".$#to_update +-$#skipped." updated."); #-SCRUB
Obviously, there are items here which only come into play later in the script -- I've left them in out of a concern that I might accidentally trim too much.
Essentially, I select all the records into an array-of-arrays and try to insert them. If that operation fails (DBI raises an error, which causes my custom $db->execute to return undefined), I add that row to the @to_update array.
I then make a pass on the @to_update array, attempting to use the database keys to execute an UPDATE on the correct rows. Any that fail that operation are added to @skipped; those are later recorded to a SKIP file for later analysis.
Obviously, since the data is being loaded by my script, then we try to send it back (sometimes twice), this can be extremely slow. Stored Procedures are explicitly excluded by the project requirements. My supervisor thinks this performs acceptably, but I'm trying to learn to be an efficient coder.
The Wisdom of the Monks is requested to find a Better Way.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Superior way to update SQL tables
by Roy Johnson (Monsignor) on Sep 08, 2004 at 14:29 UTC | |
by radiantmatrix (Parson) on Sep 08, 2004 at 15:21 UTC | |
by Roy Johnson (Monsignor) on Sep 08, 2004 at 16:16 UTC | |
|
Re: Superior way to update SQL tables
by dragonchild (Archbishop) on Sep 08, 2004 at 14:30 UTC | |
by radiantmatrix (Parson) on Sep 08, 2004 at 15:28 UTC | |
by dragonchild (Archbishop) on Sep 08, 2004 at 15:49 UTC |