in reply to Re^2: processing slows way down
in thread processing slows way down

Assuming you have good reasons not to use native sqlload and want to use Perl....

  1. Are you using prepare_cached and bind values?
  2. Are you sure you are not hitting swap because you have run out of RAM? On the surface it sounds like it, as this is a very common cause of slowdown that scales with data size.
  3. Is it indexing time? i.e. should you drop the indexes, load, and then recreate them? Indexing takes time, so it makes sense to do it just once at the end. Of course, if you are doing an UPDATE you want an index on the WHERE field....

Your load code should look roughly like:

my $sth_i = $dbh->prepare_cached( "INSERT INTO foo (bar, baz, qux) VALUES (?,?,?)" );
# or
my $sth_u = $dbh->prepare_cached( "UPDATE foo SET bar = ?, baz = ? WHERE qux = ?" );

# make it easy on perl, ie declare loop vars outside the loop to save create/destroy
# yes, this should get optimised anyway, but it seems to help speed/memory consumption
my @fields;

# drop irrelevant indexes here before we start the load

open my $in, '<', $infile or die "Can't open $infile: $!";
while (<$in>) {
    # fail fast here if you don't want to do anything with this line - even before you chomp
    # lots of iterations, so every little bit of saving helps
    chomp;
    @fields = split /\t/;
    $sth_i->execute( @fields[0..2] );
}
close $in;

# recreate indexes here so you are not wasting time doing partial indexing
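To make the drop/load/recreate pattern concrete, here is a minimal, self-contained sketch. It uses DBD::SQLite with an in-memory database purely for illustration; the table and index names are made up, and on a real server you would use your own DSN and DDL:

```perl
use strict;
use warnings;
use DBI;

# in-memory SQLite DB just for illustration; substitute your real DSN
my $dbh = DBI->connect( "dbi:SQLite:dbname=:memory:", "", "",
    { RaiseError => 1, AutoCommit => 1 } );

$dbh->do("CREATE TABLE foo (bar TEXT, baz TEXT, qux TEXT)");
$dbh->do("CREATE INDEX foo_qux_idx ON foo (qux)");

# 1. drop the index so the load does not pay for incremental indexing
$dbh->do("DROP INDEX foo_qux_idx");

# 2. bulk load with a cached statement handle and bind values
my $sth_i = $dbh->prepare_cached(
    "INSERT INTO foo (bar, baz, qux) VALUES (?,?,?)" );
$sth_i->execute( "bar$_", "baz$_", "qux$_" ) for 1 .. 1000;

# 3. recreate the index once, after the load is complete
$dbh->do("CREATE INDEX foo_qux_idx ON foo (qux)");

my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM foo");
print "loaded $count rows\n";
```

The same three steps apply to MySQL, Oracle, etc. - only the DDL syntax for dropping and recreating the index differs.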

Native C based loaders are always faster in my experience. You will have to experiment with the best commit frequency - possibly just one commit, although memory may become an issue. With big loops it goes without saying that optimising everything within the loop reaps benefits: 5 msec times 200K is a long time. I would strongly recommend posting your code. It may be you are making some very basic errors; at the very least there are probably optimisations to be had. I can load well over a million recs an hour into MySQL using pure Perl, so your load rate sounds atrocious.....
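As a sketch of how you might experiment with commit frequency: turn off AutoCommit and commit every N rows, then tune N. Again this uses an in-memory DBD::SQLite database for illustration, and the batch size of 500 is just a starting point, not a recommendation:

```perl
use strict;
use warnings;
use DBI;

# AutoCommit off so we control transaction boundaries ourselves
my $dbh = DBI->connect( "dbi:SQLite:dbname=:memory:", "", "",
    { RaiseError => 1, AutoCommit => 0 } );

$dbh->do("CREATE TABLE foo (bar TEXT, baz TEXT, qux TEXT)");
my $sth = $dbh->prepare_cached(
    "INSERT INTO foo (bar, baz, qux) VALUES (?,?,?)" );

my $batch = 500;    # tune this: bigger batches mean fewer syncs but more memory/log
my $n     = 0;
for my $i ( 1 .. 2000 ) {
    $sth->execute( "bar$i", "baz$i", "qux$i" );
    $dbh->commit unless ++$n % $batch;    # commit every $batch rows
}
$dbh->commit;    # flush the final partial batch
```

One giant transaction is often fastest but can blow out memory or the rollback log on a 200K+ row load, which is exactly why the frequency is worth benchmarking on your own data.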

cheers

tachyon