learnedbyerror has asked for the wisdom of the Perl Monks concerning the following question:

Some time ago I submitted a question titled DBIx::Class and transactions while trying to identify the best way to handle transactions when working with DBIC. Moritz pointed me in the direction of txn_do. I have been using this approach, but I have not been totally happy with it when trying to have multiple threads/processes update my SQLite database.

My need/desire/want is to have multiple threads that gather information from various sources and update_or_insert it into my SQLite DB. For the purposes of this question, please consider thread(s) to mean either perl threads or fork, as I get the same error regardless of parallel methodology.

My current methodology is to have a set of worker threads that pick up work from a Thread::Queue, call it q1, and gather the information as defined in the queue entry. The thread then stores the information in a reference and enqueues it on a second Thread::Queue, q2. I then have a single thread that dequeues the reference from q2 and calls the update subroutine via txn_do. The update subroutine iterates through the reference and performs multiple DBIC update_or_insert calls, anywhere from 1 to 10000, inside a single transaction. As long as I run a single update thread, this works fine. However, this single thread is the limiting factor in overall run time.
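For reference, a minimal sketch of that pipeline (My::Schema, the 'Item' resultset, and gather_info() are hypothetical stand-ins, and the resultset-level update_or_create is used here in place of my per-row update_or_insert calls):

    use threads;
    use Thread::Queue;
    use My::Schema;    # hypothetical DBIC schema class

    my $q1 = Thread::Queue->new;    # work items to gather
    my $q2 = Thread::Queue->new;    # gathered data awaiting update

    # worker threads: gather information and pass it downstream
    # (shutdown via undef sentinels omitted for brevity)
    my @workers = map {
        threads->create(sub {
            while (defined(my $item = $q1->dequeue)) {
                $q2->enqueue(gather_info($item));    # hypothetical gatherer
            }
        });
    } 1 .. 4;

    # single updater thread: each dequeued batch becomes one transaction
    my $updater = threads->create(sub {
        my $schema = My::Schema->connect('dbi:SQLite:dbname=app.db');
        while (defined(my $batch = $q2->dequeue)) {
            $schema->txn_do(sub {
                $schema->resultset('Item')->update_or_create($_)
                    for @$batch;    # 1 - 10000 rows per transaction
            });
        }
    });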

NOTE: The database rows affected by a transaction are mutually exclusive from those of any other transaction. My need for transactions is to ensure that all of the rows affected are updated consistently, or that none are and the issue is logged.

My problem is that any time I try to run more than one update thread, all but one of the threads die, and the only error message I get is that the thread terminated abnormally. I have tried to capture the error message; however, txn_do is recorded as having thrown the exception. The associated trace does not contain any detail about where the actual error occurred. In fact, it points only to the line at which txn_do was called, not to the line in the coderef that actually fails. Using logging, I have been able to determine that it fails when I make the DBIC call to update_or_insert.

My current opinion/assessment is that txn_do, in conjunction with the SQLite driver, places the database into an exclusive lock for a relatively long period (2 - 5 secs, as determined by attempting to run a select in the sqlite3 CLP while updates are running) and that there is an insufficient retry count/period to cover this. While I feel confident in my observations, I am much less confident about my conclusion, and even less confident about picking a path forward.
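If my lock-timeout theory is right, one avenue I plan to test (untested sketch; My::Schema is again a stand-in) is raising DBD::SQLite's busy timeout so a writer waits for the lock instead of failing quickly:

    my $schema = My::Schema->connect(
        'dbi:SQLite:dbname=app.db', '', '',
        { RaiseError => 1, sqlite_use_immediate_transaction => 1 },
    );

    # dbh_do hands us the raw $dbh so we can call the DBD::SQLite method
    $schema->storage->dbh_do(sub {
        my ($storage, $dbh) = @_;
        $dbh->sqlite_busy_timeout(10_000);    # wait up to 10s on a locked DB
    });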

So my question(s) is/are:

  1. Have any of you seen this issue before and do you have a recommended solution?
  2. Is it feasible to maintain update concurrency with SQLite using DBIC?
  3. Would I do better to code directly in DBI and use a semaphore to ensure that only one thread is executing a commit at one time?
  4. And lastly, am I stepping all over a limitation of SQLite and should I move this to another DB like MySQL?
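To make question 3 concrete, here is roughly what I have in mind (untested sketch; the table and columns are made up for illustration, and each thread would hold its own $dbh):

    use threads;
    use Thread::Semaphore;
    use DBI;

    my $sem = Thread::Semaphore->new(1);    # create before spawning threads

    sub write_batch {
        my ($dbh, $batch) = @_;             # $dbh is per-thread
        $sem->down;                         # only one writer may commit
        my $ok = eval {
            $dbh->begin_work;
            for my $row (@$batch) {
                $dbh->do('INSERT OR REPLACE INTO items (id, name) VALUES (?, ?)',
                         undef, $row->{id}, $row->{name});
            }
            $dbh->commit;
            1;
        };
        unless ($ok) {
            my $err = $@;
            eval { $dbh->rollback };
            warn "batch failed, rolled back: $err";    # all-or-nothing, logged
        }
        $sem->up;
    }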

Thanks in advance for your assistance O Monks! I am sure that you can help me to see the light!

lbe


Re: DBIx::Class and Parallel SQLite Transactions
by Corion (Patriarch) on Jul 14, 2011 at 12:24 UTC
    perl threads or fork

    Neither of these work with DBI without special precautions.

    SQLite doesn't really cater to multiple writers anyway, and only in recent versions picked up good support for single-writer / multiple-reader configurations. I would think hard about whether you really, really need multiple writers with an SQLite database and can't move to Postgres or some other database.

      Corion, thanks for your response.

      One of the reasons that I selected DBIx::Class as an ORM was that it detects threads/forks and manages the DBI handles appropriately. I really don't want to get into writing and managing code to do this on my own.

      I am thinking hard about my continued use of SQLite. Given what I am doing, there is benefit in using a file-based DB. It gives me flexibility in moving execution from machine to machine without having to manage a database move/replication. However, it does come with many limitations, concurrency and write performance being the main ones.

      This query is probably my last stop on the continued use of SQLite for this project. If I don't get any eye-opening responses on how to accomplish my goals with SQLite, I will be moving to another DB, probably MySQL, since I have other applications that use it.

      Thanks again,

      lbe

        DBIx::Class is easily fork/threadsafe, as it is pure Perl. But its README says

        ... [ DBIx::Class ] is fork- and thread-safe out of the box (although your DBD may not be).

        ... and I don't expect DBD::SQLite to be fork- and/or thread-safe. Likely, the SQLite documentation and/or DBD::SQLite documentation even say so.

Re: DBIx::Class and Parallel SQLite Transactions
by sundialsvc4 (Abbot) on Jul 14, 2011 at 15:40 UTC

    My visceral reaction to this approach is that, since there is one resource being accessed here (and it is being accessed “as a regular file” by software executing in the context of each process), there is not likely to be any performance advantage in asking multiple threads to do the work. The process is fundamentally I/O-bound, and the database file is a shared resource. Realistically speaking, the operations will be serialized. Thus, the advantages of multi-threading are voided.

    Now, when you do use SQLite, remember that it is imperative to do things within transactions, even in a single thread. Otherwise, SQLite will (quite by design...) commit and sync each and every write, each and every time.
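    For example (a generic illustration with a made-up items table, not code from the original poster):

        # Each ->do below is its own implicit, synced transaction...
        $dbh->do('INSERT INTO items (name) VALUES (?)', undef, $_) for @names;

        # ...whereas wrapping the loop commits (and syncs) exactly once.
        $dbh->begin_work;
        $dbh->do('INSERT INTO items (name) VALUES (?)', undef, $_) for @names;
        $dbh->commit;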

      I would agree with your statement if there were only insert or update calls being made during the transaction; however, there are additional cycles spent, and database calls made, to determine whether updates or inserts are needed. Added to this, the transaction begins with one operation that sets an active flag inactive for all affected rows, followed by the update/insert actions. So there are opportunities for parallel execution, at least in theory.

      After mulling it over, I have decided for now to stick with a single writer; however, I am going to refactor the update procedure to utilize a temporary table. My initial plan looks something like this (a rough sketch in code follows the list):

      1. Create database transaction semaphore
      2. Create update thread
      3. Create temporary table with the same structure as the destination table
      4. Copy rows of interest from the destination table to the temporary table
      5. Perform all operations on the temporary table where there will be no concurrency issues
      6. Grab database transaction semaphore
      7. Update destination table from temporary table
      8. Release database transaction semaphore
      9. Flush data in temporary table
      10. Start over for next set of data
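
      The rough sketch I mentioned above (untested; the items table, set_id column, and $sem, the semaphore from step 1, are illustrative, and the raw SQL goes through DBIC's dbh_do since I haven't yet worked out temp tables in pure DBIC):

          $schema->storage->dbh_do(sub {
              my ($storage, $dbh, $set_id) = @_;

              # steps 3-5: private work area (TEMP tables in SQLite are
              # visible only to the connection that created them)
              $dbh->do('CREATE TEMP TABLE items_work AS
                        SELECT * FROM items WHERE set_id = ?', undef, $set_id);
              # ... perform the non-contended work against items_work ...

              # steps 6-8: serialize the short write transaction
              # (error handling elided for brevity)
              $sem->down;
              $dbh->begin_work;
              $dbh->do('UPDATE items SET active = 0 WHERE set_id = ?',
                       undef, $set_id);
              $dbh->do('INSERT OR REPLACE INTO items
                        SELECT * FROM items_work');
              $dbh->commit;
              $sem->up;

              # step 9: flush the work table (dropped here since it is
              # recreated for each batch)
              $dbh->do('DROP TABLE items_work');
          }, $set_id);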

      I believe that this approach, while more complex than my current one, will squeeze most, if not all, of the non-update/insert-specific processing cycles out of the actual transaction, where concurrency is an issue, while ensuring that multiple writers will not be attempting write transactions at the same time.

      I recognize that this essentially duplicates the locking performed by SQLite; however, I cannot figure out how to keep DBIC from throwing an exception, or how to handle the exception properly, because DBIC's exception returns a null error message in $@.
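      For what it's worth, the best I have come up with so far for catching the failure (sketch only; it assumes DBIC throws an exception object that stringifies to the message):

          my $ok = eval { $schema->txn_do($update_coderef); 1 };
          unless ($ok) {
              my $err = $@;    # may be a DBIx::Class::Exception object
              if (ref $err) {
                  warn "txn_do failed: $err";    # objects stringify
              } else {
                  warn 'txn_do failed: ' . (length $err ? $err : 'no message');
              }
          }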

      Additionally, I am having problems identifying how to create the temporary table using DBIC. I have tried to build on some work documented in earlier threads here in the monastery, but have realized little success with it so far. I'll start a new thread on that topic later today and will update this thread with a link.

      Thanks to all of the Monks who have replied with input!