DBI Inserting Duplicate Records

PerlAcolyte has asked for the wisdom of the Perl Monks concerning the following question:

The following code works fine on my local machine, but it inserts duplicate records on my shared server. I am starting with a CSV file that contains 40,000 records, of which I only want to insert 7,000--my stored procedure decides if a record gets inserted or not. The server is usually inserting about 10,000 records, 3,000 of which are duplicates. I am calling the perl script as a cgi script. This is a mystery to me. (Local machine: Vista/MS SQL 2k5/ODBC DSN, Server: MS Server 2K3/MS SQL 2K5/ODBC DSN). Thanks for your help.

use DBI;
my $dbh = DBI->connect( "$dbServer", "$dbUser", "$dbPass" ) or print "
+Could not make connection to database";

open(DAILY, "$updateDir/$fileName") || print "CAN'T OPEN DAILY";


while(<DAILY>){
    chomp;
    my ($field1,$field2,$field3,$field3,$field4,$field5,$field6,$field
+7,$field8) = split(/,/);
            
    if ($field1 == 33){
                
                my $spQuery = "exec my_sp ?,?,?,?,?,?,?";
                
                $insert33 = $dbh->prepare( $spQuery ) or die ( "Cannot
+ prepare statement: ", $dbh->errstr(), "\n" );
            
                $insert33->execute($field1,$field2,$field3,$field4,$fi
+eld5,$field6,$field7)
                or die( "Cannot execute statement: ", $insert33->errst
+r(), "\n" );
    
                $insert33->finish();
    }
    
}
close(DAILY);
[download]

code for stored proc


    @field1 AS INTEGER,
    @field2 AS DATETIME,
    @field3 AS MONEY,
    @field4 AS MONEY,
    @field5 AS MONEY,
    @field6 AS MONEY,
    @field7 AS INTEGER

AS
BEGIN

    DECLARE @check1 INT
    SET @check1 = 0
    SELECT @check1 = col1 FROM tableX
    WHERE col3 = @field1
    
    IF @check1 > 0
    BEGIN
        INSERT INTO priceHistory
        (col1,col2,col3,col4,col5,col6,col7)
        VALUES
        (@field1,@field2,@field3,@field4,@field5,@field6,@field7)
    END
END
[download]

Comment on DBI Inserting Duplicate Records Select or Download Code

Replies are listed 'Best First'.
Re: DBI Inserting Duplicate Records by graff (Chancellor) on Dec 30, 2007 at 00:38 UTC
First, a slightly off-topic suggestion: put the "prepare()" statement outside (above) the while loop -- one of the advantages of using placeholders is "prepare once, execute often", because this saves a lot of overhead on the DB server side. (This would also involve putting the "finish()" call after the loop.) Getting back on topic, is it possible that the table definition on the remote server is different from the one on your local machine? E.g. maybe your local version has a uniqueness constraint that isn't being applied on the remote server's version? Another issue, since the stored procedure controls insertion by querying for values of "col1 FROM tableX WHERE col3 = @field1", is whether this other table on the remote server contains the same data as your local version. A third thing to check is whether the input data files being used locally vs. remotely really are the same. Could the remote server be getting a "longer" version of the CSV file? (Who knows how it might have gotten "longer"... still it's worth checking to be sure.) I don't see anything in the perl code that would lead to different behaviors depending on local vs. remote (CGI?) runtime environments. The differences are most likely a matter of table definitions or data.	[reply]
Re: DBI Inserting Duplicate Records by bradcathey (Prior) on Dec 29, 2007 at 18:18 UTC
Maybe I'm missing something, but what does your stored procedure look like? —Brad "The important work of moving the world forward does not wait to be done by perfect men." George Eliot	[reply]
Re^2: DBI Inserting Duplicate Records by PerlAcolyte (Initiate) on Dec 29, 2007 at 19:16 UTC
stored proc code posted. thanks again.	[reply]
Re: DBI Inserting Duplicate Records by almut (Canon) on Dec 30, 2007 at 00:25 UTC
(Please take this with a grain of salt — I have virtually no experience with MS SQL ... I'm just replying in the hope that any suggestion might be better than none :) ...my stored procedure decides if a record gets inserted or not. How is `tableX` related to `priceHistory`, i.e. how does the result of the SELECT change, once a record has been inserted into `priceHistory`? In other words, how does `tableX` help to distinguish if inserting a certain record would produce a duplicate? Anyhow, sometimes, mysterious DB behaviour is the result of commits not happening when you think they should... I have no clear idea, though, how that would account for the difference between your local and the server machine, but maybe some different `AutoCommit` default setting of the ODBC driver, or some such. In the absence of any better ideas, I'd try to `commit` myself, or explicitly enable `AutoCommit`. Just a thought.	[reply] [d/l] [select]