in reply to processing slows way down

That sounds like a scaling problem, but munching a sequence of independent records is generally linear in the size of the input, and from your description of the problem I think that is what you were expecting. Two possibilities come to mind: memory usage and database internals.

Are you slurping the file or holding all the records in some global data structure? Driving the machine into swap gives a big performance hit. The solution is to keep records in a small structure which goes out of scope as soon as possible.
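
For example, a minimal sketch of line-at-a-time processing (the filename, field layout, and process_record() are placeholders for whatever your script actually does):

    # Hypothetical sketch: handle one record per iteration so the lexicals
    # go out of scope at the end of each pass instead of piling up in RAM.
    open my $in, '<', 'records.txt' or die "Can't open records.txt: $!";
    while ( my $line = <$in> ) {
        chomp $line;
        my @fields = split /\t/, $line;    # lexical, released each iteration
        next unless @fields;               # skip empty records
        process_record(@fields);           # placeholder for the real work
        # nothing gets pushed onto a global array, so memory stays flat
    }
    close $in;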

Without knowing more details, it's hard to guess which db server operations might be a bottleneck. As a database grows, insertion becomes slower, but your numbers don't seem large enough to make that the issue. It's possible that a store of pending transactions is getting uncomfortably large. If so, try committing your insertions more often.
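
For example, with DBI you can turn AutoCommit off and flush every N rows; this is only a rough sketch, and the DSN, table, columns, get_next_record(), and batch size of 1,000 are invented for illustration:

    use DBI;

    # AutoCommit off so we control how much pending work builds up.
    my $dbh = DBI->connect( 'dbi:mysql:testdb', 'user', 'password',
                            { AutoCommit => 0, RaiseError => 1 } );

    my $sth   = $dbh->prepare('INSERT INTO foo (bar, baz) VALUES (?, ?)');
    my $count = 0;

    while ( my $rec = get_next_record() ) {   # stand-in for your input loop
        $sth->execute( $rec->{bar}, $rec->{baz} );
        $dbh->commit unless ++$count % 1000;  # commit every 1,000 inserts
    }
    $dbh->commit;                             # pick up the last partial batch
    $dbh->disconnect;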

After Compline,
Zaxo

Re^2: processing slows way down
by Anonymous Monk on Jul 10, 2004 at 14:47 UTC
    Thank you for responding.

    Each record from the input file is read one at a time in a while loop until end of file and split into fields. The fields are processed if they contain data.

    Commits are done every 5,000 inserts.

    As I said before, done this way the first 30,000 input records were processed in an hour; the rest got progressively slower.

      Assuming you have good reasons not to use native sqlload and want to use Perl....

      1. Are you using prepare_cached and bind values?
      2. Are you sure you are not hitting swap because you are running out of RAM? It sounds like it on the surface, as this is a very common reason for a scaling-based slowdown.
      3. Is it due to indexing time? i.e. should you delete the indexes, load, and then restore them (see the sketch below)? Indexing takes time, so it makes sense to do it just once at the end. If you are doing an update, of course, you want an index on the field in your WHERE clause....
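
      For point 3, something along these lines wrapped around the load loop (the index and table names are invented, and the DDL shown is MySQL-flavoured; adjust for your database):

          # Drop a non-essential index before the bulk load, rebuild it afterwards.
          $dbh->do('DROP INDEX idx_bar ON foo');           # MySQL-style DDL

          # ... run the load loop shown below ...

          $dbh->do('CREATE INDEX idx_bar ON foo (bar)');   # rebuild once at the end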

      Your load code should look roughly like:

      my $sth_i = $dbh->prepare_cached(
          "INSERT INTO foo (bar, baz, qux) VALUES (?,?,?)" );
      # or
      my $sth_u = $dbh->prepare_cached(
          "UPDATE foo SET bar = ?, baz = ? WHERE qux = ?" );

      # make it easy on perl, ie declare loop vars outside the loop to save
      # create/destroy - yes it should be optimised but it seems to help
      # speed/memory consumption
      my @fields;

      # delete irrelevant indexes here before we start the load

      open IN, $infile or die $!;
      while (<IN>) {
          @fields = split "\t";
          # fail fast if you don't want to do anything - even before you chomp
          # lots of loops so every little bit of saving helps
          $sth_i->execute( @fields[0..2] );
      }
      close IN;

      # recreate indexes so you are not wasting time doing partial indexing

      Native C based loaders are always faster in my experience. You will have to experiment with the best commit frequency, possibly just one commit, but memory may become an issue. With big loops it goes without saying that optimising everything within the loop reaps benefits: 5 msec times 200K is a long time. I would strongly recommend posting your code. It may be you are making some very basic errors; at the very least there are probably optimisations to be had. I can load well over a million recs an hour into MySQL using pure Perl, so your load rate sounds atrocious.....

      cheers

      tachyon