For your specific problem I wouldn't use threads either.
You can imagine the problems this will cause if you try to run two or more threads concurrently accessing the same DB from the same process.
Unless both the client program and the vendor libraries use the same underlying (C runtime) memory management routines--and those routines are thread-safe--memory leaks can occur.
It is easy to see how problems can result: the client program allocates heap memory to hold the data it hands to the DB, and the DB vendor libraries free that memory once they've dispatched it to the DB via a socket or pipe. Unless both the client program and the vendor libraries are built against exactly the same version of the underlying C runtime, that allocate-here/free-there pattern is asking for trouble.
E.g. if the client libraries are statically linked against (say) GCC CRT v2.9x but your client program (perl) is statically linked against GCC CRT v3.x, then problems can arise. The same goes for MSVCRT7 versus MSVCRT8, for example.
Think about what is happening at the DB server when you have multiple clients doing concurrent inserts or updates to the same tables and indexes. Regardless of what mechanisms the DB uses for locking or synchronisation, there is bound to be contention between the work being done by the server threads on behalf of those concurrent clients.
And if you have indexes, foreign keys, etc., those contentions compound rapidly. Add transactions into the mix and things get much slower very fast.
For mass updates, using the vendor's bulk insertion tool from a single process--preferably on the same box as the DB, and via named pipes rather than sockets if that is available--will always win hands down over trying to multiprocess the same updates. Always.
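To illustrate the named-pipe idea, here is a minimal Python sketch (not specific to any vendor; the FIFO path and the stand-in "loader" are my own inventions). The client streams tab-separated rows into a FIFO; in real life the vendor's bulk loader would be the process reading the other end:

```python
import os
import tempfile
import threading

def feed_loader(fifo_path, rows):
    # Plays the role of the client program: stream rows into the pipe.
    # open() for write blocks until a reader (the bulk loader) attaches.
    with open(fifo_path, "w") as f:
        for row in rows:
            f.write("\t".join(row) + "\n")

tmpdir = tempfile.mkdtemp()
fifo = os.path.join(tmpdir, "bulk.fifo")
os.mkfifo(fifo)  # create the named pipe (POSIX only)

rows = [("1", "alice"), ("2", "bob")]
writer = threading.Thread(target=feed_loader, args=(fifo, rows))
writer.start()

# Stand-in for the vendor's bulk loader consuming the pipe.
with open(fifo) as f:
    loaded = [line.rstrip("\n").split("\t") for line in f]
writer.join()
```

The point of the FIFO is that no intermediate file ever hits disk: the loader consumes rows as fast as the client produces them, all on the local box.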
For best speed, lock the table for the duration of the insert. If possible, drop all the indexes, perform the insertion and then re-build them.
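A minimal sketch of the drop/insert/rebuild pattern, using SQLite as a stand-in (table and index names are made up; SQLite takes its write lock implicitly, whereas on a server RDBMS you would issue an explicit LOCK TABLE first):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.execute("CREATE INDEX idx_items_name ON items(name)")

rows = [(i, "name%d" % i) for i in range(10000)]

# Drop the index so each insert skips index maintenance entirely.
conn.execute("DROP INDEX idx_items_name")
with conn:  # one transaction around the whole bulk insert
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
# Rebuild the index once, in a single pass over the finished table.
conn.execute("CREATE INDEX idx_items_name ON items(name)")

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Rebuilding an index once over N finished rows is far cheaper than updating it N times, once per inserted row.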
If dropping the indexes is not possible (as you've mentioned elsewhere), then consider inserting the data into a non-indexed auxiliary table first, and then using an SQL query that runs wholly inside the DB to perform the updates to the main table, based on data from that auxiliary table. Again, lock both tables for the duration.
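A sketch of that staging-table approach, again with SQLite standing in for the real RDBMS (table names and columns are invented for illustration). The raw rows go into the unindexed staging table cheaply; a single set-based UPDATE then applies them to the main table entirely inside the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main (id INTEGER PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE staging (id INTEGER, qty INTEGER)")  # no indexes
conn.executemany("INSERT INTO main VALUES (?, ?)", [(1, 10), (2, 20)])

# Fast path: dump the incoming updates into the unindexed staging table.
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(1, 11), (2, 22)])

# One statement, executed wholly inside the DB; no row-by-row round trips.
conn.execute("""
    UPDATE main
    SET qty = (SELECT s.qty FROM staging s WHERE s.id = main.id)
    WHERE id IN (SELECT id FROM staging)
""")

result = conn.execute("SELECT qty FROM main ORDER BY id").fetchall()
```

The win is that the per-row client/server round trips happen only against the cheap staging table; the expensive indexed table is touched once, by one query the engine can optimise as a whole.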
Finally, bend the ear of, or employ, a good (read: expensive) DBA to set up your bulk insertion and update processing for you. A few days of a good DBA's time spent setting you up properly can save you money over and over. There is no substitute for the skills of a good DBA. Pick one with at least 5 years of experience with the specific RDBMS you are using. More than in most other programming fields, vendor-specific knowledge is of prime importance for a DBA.
In reply to Re^3: When to use forks, when to use threads ...?
by BrowserUk
in thread When to use forks, when to use threads ...?
by Krambambuli