in reply to Using simultaneous threads

Running two threads that concurrently access a database via DBI is problematic. It might work for you, or it might not, depending on the design and implementation of the DBD::* driver and the vendor-supplied API libraries/DLLs it runs on top of. If they are not reentrant, or they use (for example) the process ID of the calling app to coordinate access, then concurrent access from two or more threads of the same process can cause problems. I'm not sure about the reentrancy of the MySQL DBI/DBD/API chain.

However, if you are prepared to split the function above into two, you will be able to do most of what you want. Run the first half of the code, which queries the DBMS for the databases and their tables, in the main thread. Once you have a complete set of information, pass it to the threads (via a queue) and have them do the second half: checking filesystem space and actually running the dump. The basic structure of the app would be as shown in Re: Question: Fast way to validate 600K websites. Substitute:

  1. Reading from the DB for reading from a file.
  2. Instead of pushing the url to the queue, push the DBname and table names preformatted as a single string for direct inclusion into the command.
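The split can be tried in isolation with the following minimal, self-contained sketch (not the code from the referenced post). The hash %tables is a hypothetical stand-in for the results of the DB queries, and the worker only builds the mysqldump command line instead of running it, so the sketch runs without a MySQL server:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $N       = 2;                     # number of worker threads
my $work    = Thread::Queue->new;    # "db table1 table2 ..." strings
my $results = Thread::Queue->new;    # command lines built by the workers

# Second half: runs in each worker thread.
sub worker {
    while ( defined( my $dbinfo = $work->dequeue ) ) {
        my ($dbname) = split ' ', $dbinfo, 2;
        # A real worker would check free space and run this command.
        $results->enqueue("mysqldump -q $dbinfo --result-file=$dbname.dmp");
    }
    return;
}

my @pool = map { threads->create( \&worker ) } 1 .. $N;

# First half: runs in the main thread. Hypothetical data standing in
# for the database and table names read from the server.
my %tables = (
    sales => [qw(orders customers)],
    hr    => [qw(staff)],
);
$work->enqueue( join ' ', $_, @{ $tables{$_} } ) for sort keys %tables;

$work->enqueue( (undef) x $N );    # one undef per worker ends its loop
$_->join for @pool;

# All workers have been joined, so every result is already queued.
my @cmds;
while ( defined( my $cmd = $results->dequeue_nb ) ) { push @cmds, $cmd }
print "$_\n" for sort @cmds;
```

The undef-per-worker terminator is the same shutdown idiom used in the longer example further down the thread.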

If you only want two threads, only start two; but if you're looking to maximise throughput, with IO-bound tasks like these it is often worth having at least two threads per CPU.

A question: Why are you reading the data in from the dump command just to write it straight out again having done nothing to it?

It would be quicker and simpler to just have mysqldump write it directly to a file. If it is just to count the bytes, then it would be easier to just query the size of the file once it has completed.
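A minimal sketch of that approach, assuming the dump command is run in list form (no shell) and writes its own output file. The helper name run_and_size is hypothetical, and a small Perl one-liner stands in for mysqldump so the sketch runs anywhere; with mysqldump you would pass --result-file=$outfile (or -r $outfile) among its arguments:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: run a command that writes its own output file,
# then report that file's size instead of piping the data through Perl.
sub run_and_size {
    my ( $cmd, $outfile ) = @_;
    system(@$cmd) == 0
        or die "command '@$cmd' failed: exit status ", $? >> 8, "\n";
    return -s $outfile;
}

# For the real job the command would look something like
#   [ 'mysqldump', '-q', '--single-transaction',
#     "--result-file=$outfile", $db, @tables ]
# Here a Perl one-liner that writes 10 bytes stands in for mysqldump.
my $dir     = tempdir( CLEANUP => 1 );
my $outfile = "$dir/demo.dmp";
my @cmd = ( $^X, '-e',
    'open my $fh, ">", $ARGV[0] or die $!; print $fh "x" x 10', $outfile );

my $bytes = run_and_size( \@cmd, $outfile );
print "wrote $bytes bytes to $outfile\n";
```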


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: Using simultaneous threads
by mhearse (Chaplain) on May 13, 2008 at 15:27 UTC
    Thanks. The post you referenced gets me started. Right off, I have one question. How do I install signal handlers for each of the threads? A situation may arise where I have to kill everything. Hopefully not the hard way.

    I'm under the impression (possibly falsely) that I need to kill the $pid for mysqldump to ensure that the thread is joinable. I'm not sure how to do that when dealing with multiple threads.

    my @signals = qw(TERM ALRM INT HUP);
    for (@signals) {
        $SIG{$_} = sub {
            my ($sig) = @_;    # a handler receives the signal name; $_ is not set here
            $rt->debug(qq{Caught signal: $sig.\n});
            kill 9, $pid;
            $thread->join();
            exit;
        };
    }
      Note that calling join on a thread waits for it to complete, so to prevent blocking you'll have to find a way to ensure that your thread terminates. Also, you might have to do some experimentation to determine which thread will get the signal.
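      One way to avoid blocking forever in join is to poll the thread first. This is a sketch, assuming a threads module recent enough to provide is_running() and is_joinable(); join_with_timeout is a hypothetical helper, and note that it does not stop a running thread, it only lets the caller decide what to do next:

```perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time sleep);

# Hypothetical helper: wait up to $timeout seconds for $thr to finish.
# Returns an array ref of the thread's return values if it finished and
# was joined, or undef if it is still running (it is NOT stopped).
sub join_with_timeout {
    my ( $thr, $timeout ) = @_;
    my $deadline = time() + $timeout;
    sleep(0.1) while $thr->is_running && time() < $deadline;
    return $thr->is_joinable ? [ $thr->join ] : undef;
}

my $quick = threads->create( sub { return 42 } );
my $slow  = threads->create( sub { sleep 3; return 'late' } );

my $r1 = join_with_timeout( $quick, 2 );
my $r2 = join_with_timeout( $slow,  0.5 );
print defined $r1 ? "quick: @$r1\n" : "quick: still running\n";
print defined $r2 ? "slow: @$r2\n"  : "slow: still running\n";

$slow->join;    # the demo waits for the slow thread before exiting
```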

      I think you are better off using fork in this situation; it's going to be a lot simpler. Just keep track of the pids you create:

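      our @pids;
      ...
      for my $db (keys %{$db_ds}) {
          ...
          my $pid = fork();
          die "fork failed: $!" unless defined $pid;
          if ($pid == 0) {
              exec(...);          # child: becomes the dump process
          }
          else {
              push(@pids, $pid);  # parent: remember the child's pid
          }
      }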
      and then call kill 9, @pids in your signal handler.
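      A runnable version of that pattern, as a sketch: kill_children is a hypothetical helper name, and each child execs a Perl one-liner that just sleeps, standing in for mysqldump. Reaping with waitpid after the kill avoids leaving zombie processes behind:

```perl
use strict;
use warnings;

our @pids;    # pids of the child processes started so far

# Kill every child we started (KILL is signal 9) and reap it.
sub kill_children {
    kill 'KILL', @pids if @pids;
    waitpid( $_, 0 ) for @pids;
    @pids = ();
}

# Signal handlers: clean up the children, then exit.
for my $sig (qw(TERM INT HUP)) {
    $SIG{$sig} = sub { warn "Caught SIG$_[0]\n"; kill_children(); exit 1 };
}

# Stand-in for the per-database loop: a real child would exec mysqldump.
for my $db (qw(db1 db2)) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        exec( $^X, '-e', 'sleep 30', $db ) or die "exec failed: $!";
    }
    push @pids, $pid;
}

my @started = @pids;
print "started children: @started\n";
kill_children();    # what the signal handlers do on TERM/INT/HUP
```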

      Finally, be sure to read perlthrtut -- it mentions some caveats about using signals and threads.

      With recent versions of threads, you can install per-thread signal handlers. Using these in conjunction with a signal handler in your main thread, you can forward process signals to your threads and have each one deal with them as appropriate to its context; in this case, by killing its current child process.

      This is necessarily untested code, but it should serve to demonstrate the idea:

      #! perl -slw
      use strict;
      use threads;
      use Thread::Queue;
      use POSIX qw(strftime);

      our $N ||= 2;    ## worker count; override with -N=4 on the command line
      my $Q = Thread::Queue->new;

      sub dbDump {
          my $pid;
          ## Add per-thread signal handlers closing over the $pid
          @SIG{ qw[TERM ALRM INT HUP] } = ( sub{ kill 9, $pid if $pid } ) x 4;

          my $opts = '-q --single-transaction --complete-insert';
          while( my $dbinfo = $Q->dequeue ) {
              ## $dbinfo is "dbname table1 table2 ..."; the first word is the DB
              my( $dbname ) = split ' ', $dbinfo, 2;
              ## No spaces or colons in the name, so it is safe in the command
              my $outfile = $dbname . strftime( '-%Y%m%d-%H%M%S', localtime ) . '.dmp';
              ## Assign to the outer $pid (no 'my') so the handler can see it
              $pid = open my $cmd, "mysqldump $opts $dbinfo --result-file=$outfile |"
                  or die $!;
              ## mysqldump writes the file itself; close waits for it to finish
              close $cmd or warn "mysqldump failed for $dbname: $?\n";
              undef $pid;
          }
      }

      my @pool = map threads->create( \&dbDump ), 1 .. $N;

      ## Add main thread sig handlers to relay process signals to threads
      @SIG{ qw[TERM ALRM INT HUP] } = ( sub{ $_->kill( 'TERM' ) for @pool } ) x 4;

      ## $dbh, $db_ds, %opts, %queries and debug() come from your existing code
      for my $db (keys %{$db_ds}) {
          if (!@{$db_ds->{$db}} && !$opts{database}) {
              ### Our data structure says there are no native mysql tables,
              ### so we skip this database.
              next;
          }
          $dbh->do(qq{use $db});

          ### The table list in our data structure is empty. The database
          ### option has been passed, so look them up.
          if (!@{$db_ds->{$db}}) {
              $queries{get_table_names}->execute();
              while (my $rec = $queries{get_table_names}->fetchrow_hashref()) {
                  push @{$db_ds->{$db}}, $rec->{Name};
              }
              debug(qq{Got table list for $db\n});
          }

          ### This block makes sure we can actually access the tables.
          my @valid_tables;
          for my $table (@{$db_ds->{$db}}) {
              $queries{verify_table_name}->execute($table);
              if ($queries{verify_table_name}->rows()) {
                  my ($rec) = $queries{verify_table_name}->fetchrow_array();
                  push @valid_tables, $rec;
              }
              else {
                  debug(qq{There was a problem finding database/table $db $table\n});
              }
          }
          $Q->enqueue( join ' ', $db, @valid_tables );
      }

      $Q->enqueue( (undef) x $N );
      $_->join for @pool;
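      The relay idea can be tried on its own with the self-contained sketch below (not part of the original reply). Here the workers only count that they were signalled instead of killing a mysqldump child, the main thread's handler is invoked directly to stand in for an external kill, and $stopped is a name introduced for the demo. One caveat from the threads documentation: a thread acts on a signal only after its current operation completes, so a worker blocked in a long call (for example, waiting on its child) will not run its handler until that call returns.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $stopped :shared = 0;    # how many workers acted on the relayed signal

sub worker {
    # Per-thread handler: runs inside this thread when the main thread
    # calls $thr->kill('TERM'). A dump worker would kill its child here.
    $SIG{TERM} = sub { do { lock($stopped); $stopped++ }; threads->exit() };

    # Stand-in for real work, done in short steps so the thread regularly
    # reaches a point where the pending signal can be acted upon.
    select( undef, undef, undef, 0.05 ) for 1 .. 200;    # about 10 s at most
    return;
}

my @pool = map { threads->create( \&worker ) } 1 .. 2;
sleep 1;    # give the workers time to install their handlers

# Main-thread handler: relay the process signal to every worker.
$SIG{TERM} = sub { $_->kill('TERM') for @pool };

$SIG{TERM}->('TERM');    # stands in for an external "kill -TERM <pid>"
$_->join for @pool;
print "workers that handled TERM: $stopped\n";
```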

        Thanks for the example. That does exactly what I need.