But as I previously mentioned, this appears to be an issue with how the lock state of the database is handled.
SQLite's write performance will be far lower than, say, MySQL's. SQLite needs to gain an exclusive lock on the entire DB file in order to do a write; there is no table-level or row-level locking. So I figure that having multiple writers will bring you nothing but trouble. Since the rows in your transactions are exclusive (they don't overlap with other transactions), a DB that can do row-level locking could yield much higher performance.
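To illustrate the single-writer behavior, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for DBI (the table and connection names are made up for the demo). A second writer gets "database is locked" for as long as the first holds its write transaction:

```python
import os
import sqlite3
import tempfile

# Throwaway database file; isolation_level=None gives manual transaction
# control, and timeout=0 makes a lock conflict fail immediately
# instead of retrying.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
w1 = sqlite3.connect(path, isolation_level=None, timeout=0)
w2 = sqlite3.connect(path, isolation_level=None, timeout=0)
w1.execute("CREATE TABLE t (v TEXT)")

w1.execute("BEGIN IMMEDIATE")              # writer 1 reserves the whole file
w1.execute("INSERT INTO t VALUES ('a')")
err = None
try:
    w2.execute("BEGIN IMMEDIATE")          # writer 2 cannot also reserve it
except sqlite3.OperationalError as e:
    err = str(e)                           # "database is locked"
w1.execute("COMMIT")

w2.execute("BEGIN IMMEDIATE")              # lock released; second writer is fine
w2.execute("INSERT INTO t VALUES ('b')")
w2.execute("COMMIT")
```

The same SQL (`BEGIN IMMEDIATE`, `COMMIT`) works from Perl via `$dbh->do(...)`; only one write transaction can be open at a time, no matter which row each writer touches.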
I am working on a SQLite project and have found the O'Reilly book "Using SQLite" by Jay A. Kreibich to be helpful. It is mainly oriented around the C interface, but there is plenty of great info for Perl users too, including discussions of how and why the DB can be busy - it is locking related.
Update: Oh, I'm not sure how well the DBI does with threads. The implementation may be "safe", but not high performance, i.e. it may just wind up serializing things. With MySQL, you may get higher performance with a process per writer instead of a thread. Benchmarks will tell.
I'm well into Kreibich's book already. While it is a little dated and focused on C, there IS a lot of good stuff in it when it comes to how SQLite works.
Insofar as the locking is concerned, if I read Kreibich correctly, the exclusive lock is not requested until the commit is initiated. So other processes can write to the SQLite cache at the same time, but only one can commit at a time. Am I reading this wrong? Sounds like my reading topic for the bus ride home this afternoon :)
Insofar as whether there is an advantage to multiple threads is concerned, my application does a lot of inserts the first time through and afterwards updates most if not all of the same records. So my thought is that, since there is a fair amount of reading to decide whether to update or insert, there would be an advantage to some degree of parallelization, maybe two threads versus just one.
This spurs my thought of a two-thread write procedure: the first thread would identify which records are updates and which are inserts, and queue them for the second thread so that it is writing continuously. I'll have to give this a test also.
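That pipeline idea can be sketched roughly like this (Python's sqlite3 and threading as a stand-in for Perl threads and DBI; the table `recs` and the `classifier`/`writer` names are invented for illustration). One thread classifies each record as insert-vs-update and queues it; the other is the sole writer draining the queue, so only one code path ever needs the write lock:

```python
import queue
import sqlite3
import threading

# check_same_thread=False lets both threads share one connection;
# SQLite's default serialized threading mode makes that safe.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE recs (k TEXT PRIMARY KEY, v TEXT)")
db.execute("INSERT INTO recs VALUES ('a', 'old')")   # pre-existing record
db.commit()

work = queue.Queue()
incoming = [("a", "new"), ("b", "fresh")]            # one update, one insert

def classifier():
    # Thread 1: read-only lookups to decide insert vs. update.
    for k, v in incoming:
        seen = db.execute("SELECT 1 FROM recs WHERE k = ?", (k,)).fetchone()
        work.put(("update" if seen else "insert", k, v))
    work.put(None)                                   # sentinel: no more work

def writer():
    # Thread 2: the only writer, draining the queue continuously.
    while (item := work.get()) is not None:
        op, k, v = item
        if op == "update":
            db.execute("UPDATE recs SET v = ? WHERE k = ?", (v, k))
        else:
            db.execute("INSERT INTO recs VALUES (?, ?)", (k, v))
    db.commit()

t1 = threading.Thread(target=classifier)
t2 = threading.Thread(target=writer)
t1.start(); t2.start(); t1.join(); t2.join()
```

Whether this actually beats a single thread will depend on how expensive the classification reads are relative to the writes - as noted above, benchmarks will tell.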
Thanks!
lbe
I am also in "learn mode" about the locking details. One important thing: when using a BEGIN DEFERRED transaction (the default), deadlocks are possible. A deadlock is not possible when using a BEGIN IMMEDIATE transaction.
On Page 154, Chapter 7, paragraph 3:
"A BEGIN IMMEDIATE transaction can be started while other connections are reading from the database. Once started, no new writers will be allowed, but read-only connections can continue to access the database up until the point that the immediate transaction is forced to modify the database file. This is normally when the transaction is committed."
There is some more explanation on Page 155, "When Busy becomes Blocked". So a BEGIN IMMEDIATE transaction means: I am saying that this transaction is going to do a write and I want the DB to go into read_only mode. If I don't get a "busy", that's what happens (DB is now read_only until I finish my transaction). My changes are held in the memory cache until I say COMMIT (a cache write is not a "real" write to the disk). When I say COMMIT, first, the database will not allow any new read transactions to start. Then second, the DB will wait for all other transactions to finish (they are all read transactions). Once that happens, my writes can occur because I can have exclusive access to the DB.
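That sequence can be seen in a quick experiment (Python's sqlite3 purely for illustration; the `writer`/`reader` names are mine, and this assumes the default rollback-journal mode - WAL behaves differently for readers). A reader keeps seeing the old data while the IMMEDIATE transaction holds its change in the cache, and only sees the new value after COMMIT takes the exclusive lock:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "imm.db")
writer = sqlite3.connect(path, isolation_level=None, timeout=0)
reader = sqlite3.connect(path, isolation_level=None, timeout=0)
writer.execute("CREATE TABLE t (v TEXT)")
writer.execute("INSERT INTO t VALUES ('old')")

writer.execute("BEGIN IMMEDIATE")          # blocks new writers, not readers
writer.execute("UPDATE t SET v = 'new'")   # change sits in the page cache
during = reader.execute("SELECT v FROM t").fetchone()[0]  # reader still sees 'old'
writer.execute("COMMIT")                   # exclusive lock is taken only here
after = reader.execute("SELECT v FROM t").fetchone()[0]   # now sees 'new'
```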
I don't understand what happens if there is a mix of IMMEDIATE and DEFERRED transactions that want to do writes.
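One way to see the DEFERRED hazard, though (sketched in Python's sqlite3 for illustration; connections `a` and `b` are made up): when two DEFERRED transactions both read and then both try to write, SQLite refuses the second write immediately rather than letting the connections deadlock, and that writer has to roll back before the first can commit:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "dl.db")
a = sqlite3.connect(path, isolation_level=None, timeout=0)
b = sqlite3.connect(path, isolation_level=None, timeout=0)
a.execute("CREATE TABLE t (v INTEGER)")
a.execute("INSERT INTO t VALUES (1)")

a.execute("BEGIN DEFERRED")
a.execute("SELECT * FROM t").fetchall()    # a now holds a SHARED lock
b.execute("BEGIN DEFERRED")
b.execute("SELECT * FROM t").fetchall()    # b holds a SHARED lock too
a.execute("UPDATE t SET v = 2")            # a upgrades to RESERVED, ok
err_b = None
try:
    b.execute("UPDATE t SET v = 3")        # b cannot also get RESERVED
except sqlite3.OperationalError as e:
    err_b = str(e)                         # "database is locked"
b.execute("ROLLBACK")                      # b must give up its SHARED lock
a.execute("COMMIT")                        # only now can a go EXCLUSIVE
```

With BEGIN IMMEDIATE, connection b would have been refused at the BEGIN instead of after it had already read, which is why the deadlock cannot arise.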
One thing to play around with is the cache_size. This can be adjusted dynamically. The default is pretty small, so some tweaking could perhaps gain some performance. When I index my DB, I run it up to 200MB and it cuts the index time by about 60%.
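For example (Python's sqlite3 here, but the PRAGMA is the same from Perl DBI): a negative cache_size is interpreted as kibibytes rather than pages, so -200000 asks for roughly 200MB of page cache:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
default = conn.execute("PRAGMA cache_size").fetchone()[0]  # small default
# Negative value = size in KiB instead of a page count;
# -200000 KiB is roughly the 200MB used for the indexing run above.
conn.execute("PRAGMA cache_size = -200000")
current = conn.execute("PRAGMA cache_size").fetchone()[0]
```

The setting is per-connection and lasts only for the life of the connection, so it is cheap to crank it up just for a bulk load or index build and let it revert afterwards.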
I see no mention of threads or forking in the DBD::SQLite documentation. Grepping the source also finds no mention of threads in the SQLite.xs file, only within sqlite3.c itself. This suggests to me that little thought has been given to making DBD::SQLite and threads/fork play nicely together.
You could check whether the problem is inherent to fork() and/or threads by launching the multiple writers as real, separate processes via exec or system instead of fork/threads. If you still experience the segfaults/crashes, the problem is likely with your version of DBD::SQLite or some other XS code loaded in the separate process. If the problem goes away, then it is likely related to fork/threads, and I see no other way than to change to a different DBD if you want to keep using fork/threads.
You haven't told us your OS so far, but if you are on Windows, Perl's fork() is emulated with threads there anyway, so the two are basically the same thing, and any problem with one will be present in the other as well.