with.a.twist has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks!

Code first (code works great, but...):

# $dbh and $sth (the prepared SELECT lookup) are set up earlier in the script
open(SRCFILE, $infile) or die "Could not open $infile: $!\n";
while ($line = <SRCFILE>) {
    ##### prepare line data
    $groupnumber = substr($line, 863, 9);   # extract group number
    $groupnumber =~ s/^\s+//;               # trim leading spaces
    $groupnumber =~ s/\s+$//;               # trim trailing spaces

    ##### execute prepared sql statement with line-data parameter
    $sth->execute($groupnumber)
        or die "Unable to execute: $DBI::errstr\n";

    ##### fetch recordset
    if ( ($resultset) = $sth->fetchrow_array() ) {
        # groupnumber is in the database (move on to the next line)
    }
    else {
        # no record in the database matched this query;
        # we'll place this record into the database now
        print localtime() . " $groupnumber being inserted into database\n";
        $dbh->do('insert into part_group (part_group) values (?)',
                 undef, $groupnumber)
            or die "Could not insert $groupnumber into part_group: $DBI::errstr\n";
    }
}
close(SRCFILE);
I'm checking to see if data exists in a table. If it doesn't, I'm adding it. My SRCFILE has at least 8000 records in it. Is there a faster/better way to do this? Thanks so much.

Replies are listed 'Best First'.
Re: DBI - need for speed
by VSarkiss (Monsignor) on May 13, 2002 at 21:15 UTC

    The more I think about this, the more convinced I become that you'll be better off doing it all (or mostly) in SQL. First prepare a file containing just the group numbers, and load that into a table in your database (I'm presuming you have the permissions to do that). Then delete all the matching rows:

    DELETE FROM my_new_table
     WHERE EXISTS ( SELECT 1
                      FROM part_group
                     WHERE part_group.part_group = my_new_table.part_group )
    Then just insert them in a second statement:
    INSERT INTO part_group (part_group)
    SELECT part_group FROM my_new_table
    I know it's not a Perl solution, but it's probably the fastest way (given many unwarranted assumptions about your data model, amount of data, etc. ;-)
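
    A minimal DBI sketch of those two steps, assuming the group numbers have already been bulk-loaded into the staging table my_new_table; the connect string and credentials are placeholders:

        use DBI;

        # placeholder connect string and credentials
        my $dbh = DBI->connect('dbi:Oracle:orcl', 'user', 'pass',
                               { RaiseError => 1, AutoCommit => 0 });

        # step 1: remove staging rows that already exist in part_group
        $dbh->do(q{
            DELETE FROM my_new_table
             WHERE EXISTS ( SELECT 1
                              FROM part_group
                             WHERE part_group.part_group = my_new_table.part_group )
        });

        # step 2: whatever is left in the staging table is genuinely new
        $dbh->do(q{
            INSERT INTO part_group (part_group)
            SELECT part_group FROM my_new_table
        });

        $dbh->commit;
        $dbh->disconnect;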

Re: DBI - need for speed
by Elian (Parson) on May 13, 2002 at 21:21 UTC
    Do it all in SQL. You'll avoid at least one round-trip conversation with the DB server for each record. Even with a fully local database that's a fair amount of time savings, and if your database is on another machine it can make things enormously faster.

    You also might want to check the utilities that come with your database. Many have tools to do things just like this (as your code's not doing anything complex) and they'll be about as fast as you can possibly get, since they can cheat.
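
    If the group numbers are already sitting in a staging table (my_new_table, borrowed from the reply above), the whole compare-and-insert collapses to a single statement, so there is one round trip no matter how many records there are. A hedged sketch, with placeholder connection details:

        use DBI;

        my $dbh = DBI->connect('dbi:Oracle:orcl', 'user', 'pass',  # placeholders
                               { RaiseError => 1 });

        # insert only those staging rows with no match in part_group
        $dbh->do(q{
            INSERT INTO part_group (part_group)
            SELECT DISTINCT s.part_group
              FROM my_new_table s
             WHERE NOT EXISTS ( SELECT 1
                                  FROM part_group p
                                 WHERE p.part_group = s.part_group )
        });

        $dbh->disconnect;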

Re: DBI - need for speed
by techwolf (Novice) on May 13, 2002 at 20:34 UTC
    You'd probably see a faster result by selecting a unique value for every record in one SQL query up front (if groupnumber is unique per record, select that) and storing each one in a hash:

    $r{$groupnumber} = 1;

    Then test the hash for the value instead of issuing a separate query for each record.

    if ($r{$groupnumber}) {
        next;
    } else {
        ## Add record
    }

    You'll do one query at the start of your program and save a round trip for each record.
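
    A sketch of the full loop under that scheme; the file name, connect string, and credentials are placeholders, while the table, column, and offsets are taken from the original post:

        use DBI;

        my $infile = 'srcfile.dat';    # placeholder
        my $dbh = DBI->connect('dbi:Oracle:orcl', 'user', 'pass',    # placeholders
                               { RaiseError => 1 });

        # one query up front: pull every existing group number into a hash
        my %r;
        my $sth = $dbh->prepare('SELECT part_group FROM part_group');
        $sth->execute;
        while ( my ($g) = $sth->fetchrow_array ) {
            $r{$g} = 1;
        }

        my $ins = $dbh->prepare('INSERT INTO part_group (part_group) VALUES (?)');

        open my $src, '<', $infile or die "Could not open $infile: $!\n";
        while ( my $line = <$src> ) {
            my $groupnumber = substr($line, 863, 9);
            $groupnumber =~ s/^\s+//;
            $groupnumber =~ s/\s+$//;

            next if $r{$groupnumber};      # already in the database, skip
            $ins->execute($groupnumber);   # new group number, insert it
        }
        close $src;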

    -techwolf

      I tried this method, but found a small hindrance. Every time I add a record, I have to "re-populate" my hash, and the possibility of having duplicate group numbers in my SRCFILE is high. That overhead is minimal, though; I can still reduce the number of DB hits significantly by using this method.

      On another note: I worked with one of our DBAs and loaded these records directly (and manually) into the database. We found "sub-second" response times (and that's FAST!).

      Once I tried to automate the DB load with a Korn shell script, it brought me back up to the same performance as my original Perl script. I think I'll stick with the query-hash-evaluate-insert-query-hash method.
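
      (A possible refinement, not tried in the thread: update the hash in memory as records are inserted, so duplicate group numbers in SRCFILE are caught without re-querying. In the insert branch of the loop sketched earlier, that is just one extra line:)

          $ins->execute($groupnumber);   # insert the new record
          $r{$groupnumber} = 1;          # remember it, so a later duplicate
                                         # in SRCFILE is skipped without a query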

      Thanks for everyone's help on this.

Re: DBI - need for speed
by graff (Chancellor) on May 13, 2002 at 21:31 UTC
    If a large percentage of the 8000 or so records in SRCFILE are going to end up as new data to be inserted into the table, then you will save A LOT of time by moving those records into a separate plain text file that can be "bulk-loaded" into the table by whatever means is provided in your particular DBMS (e.g. SQL*Loader ("sqlldr") in Oracle, "mysqlimport" in MySQL, or some such). Single-record-at-a-time inserts with DBI are painfully slow compared to a native DBMS utility that is already optimized for inserting/appending large quantities of data into tables.

    Follow the advice given in the first reply -- do one query at the outset to find out which records are already in the table -- then open SRCFILE for input, open some other file for output, and for each line of input write a suitable data line to the output file if the corresponding record needs to be added to the table. After the output file is fully written, run the bulk-loader utility of choice for your DBMS to add that data to the table (this can be done from within the same Perl script that writes the file).
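
    A sketch of that write-then-bulk-load flow, assuming Oracle's SQL*Loader; the data file name, control file, connect string, and credentials are all placeholders:

        use DBI;

        my $infile = 'srcfile.dat';    # placeholder
        my $dbh = DBI->connect('dbi:Oracle:orcl', 'user', 'pass',    # placeholders
                               { RaiseError => 1 });

        # one query at the outset: which group numbers are already there?
        my %seen;
        my $sth = $dbh->prepare('SELECT part_group FROM part_group');
        $sth->execute;
        while ( my ($g) = $sth->fetchrow_array ) {
            $seen{$g} = 1;
        }
        $dbh->disconnect;

        # write only the missing group numbers to a flat file
        open my $src, '<', $infile          or die "Could not open $infile: $!\n";
        open my $out, '>', 'new_groups.dat' or die "Could not write new_groups.dat: $!\n";
        while ( my $line = <$src> ) {
            my $g = substr($line, 863, 9);
            $g =~ s/^\s+//;
            $g =~ s/\s+$//;
            print {$out} "$g\n" unless $seen{$g}++;   # also skips duplicates in SRCFILE
        }
        close $src;
        close $out;

        # hand the file to the native loader (assumes new_groups.ctl exists)
        system('sqlldr', 'userid=user/pass', 'control=new_groups.ctl') == 0
            or die "sqlldr failed: $?\n";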

Re: DBI - need for speed
by with.a.twist (Novice) on May 14, 2002 at 00:41 UTC
    Excellent ideas!

    I'll dig up one of our Oracle DBAs and see if he's got a trick up his sleeve for the initial run. I failed to mention in my original post that after the first few runs of 8000, the load should drop to about 3000 records and about 100 inserts per day (data maintenance).

    Either way, I'll benchmark this Perl method and see if it fits within our service levels. If it misses the mark, I'll conjure up some SQL scripts and load the data directly.

    Thanks again for your ideas.

Re: DBI - need for speed
by thor (Priest) on May 14, 2002 at 01:25 UTC
    I don't have much experience with Oracle, but in Sybase, I would create a unique index on the table (a primary key would work too, I think), bulk copy the data in, letting it fail where it will. Then, if you don't want the index, drop it. Again, I don't know how much of this can be implemented in Oracle, so YMMV.
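
    A rough DBI sketch of the index bookkeeping around such a load; the index name is made up, and whether a bulk load can reject duplicate-key rows one at a time rather than aborting depends on the DBMS and loader settings:

        use DBI;

        my $dbh = DBI->connect('dbi:Oracle:orcl', 'user', 'pass',    # placeholders
                               { RaiseError => 1 });

        # with a unique index in place, duplicate rows fail on insert
        # instead of silently piling up
        $dbh->do('CREATE UNIQUE INDEX part_group_ux ON part_group (part_group)');

        # ... run the bulk copy here, letting duplicate rows be rejected ...

        # drop the index afterwards if it is not wanted long-term
        $dbh->do('DROP INDEX part_group_ux');

        $dbh->disconnect;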