Re: Using simultaneous threads
by BrowserUk (Patriarch) on May 13, 2008 at 03:16 UTC
Running two threads that make concurrent accesses to a DB via DBI is problematic. It might work for you, or it might not, depending upon the design and implementation of the DBD::* driver and the vendor-supplied API libraries/DLLs that it runs on top of. If they are not reentrant, or use (for example) the process id of the calling app to coordinate access, then concurrent access from two or more threads of the same process can cause problems. I'm not sure about the reentrancy of the MySQL DBI/DBD/API chain.
However, if you are prepared to split the function above in two, you will be able to do most of what you want. The split is to run the first half of the code, which queries the DBMS for databases and their tables, in the main thread; once you have a complete set of information, pass it into the threads (via a queue) and have them do the second half: checking filesystem space and actually running the dump.
The basic structure of the app would be as shown in Re: Question: Fast way to validate 600K websites (a minimal sketch follows the list below). Substitute:
- Reading from the DB for reading from a file.
- Instead of pushing the url to the queue, push the DBname and table names preformatted as a single string for direct inclusion into the command.
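A minimal sketch of that structure, assuming two hypothetical placeholders for your own code: buildJobList() (the main-thread DB queries) and dumpWorker() (the space check and dump):

#! perl -slw
use strict;
use threads;
use Thread::Queue;

## Placeholder for the first half: query the DBMS for db/table names.
sub buildJobList { return ( 'db1 table1 table2', 'db2 table3' ) }

## Placeholder for the second half: check space, run the dump.
sub dumpWorker { my $job = shift; print "dumping: $job" }

my $Q = Thread::Queue->new;

## Workers block on the queue until the main thread feeds them jobs.
my @pool = map {
    threads->create( sub {
        while( my $job = $Q->dequeue ) { dumpWorker( $job ) }
    } );
} 1 .. 2;

$Q->enqueue( $_ ) for buildJobList();   ## main thread does the DB work

$Q->enqueue( ( undef ) x 2 );           ## one undef per worker: "done"
$_->join for @pool;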
If you only want two threads, only start two; but if you're looking to maximise throughput, with IO-bound tasks like these it often pays to have at least two threads per CPU.
A question: Why are you reading the data in from the dump command just to write it straight out again having done nothing to it?
It would be quicker and simpler to just have mysqldump write it directly to a file. If it is just to count the bytes, then it would be easier to just query the size of the file once it has completed.
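For instance (a sketch; the option list is illustrative, -r/--result-file is mysqldump's own option for writing the dump file itself, and -s gives the resulting size in bytes):

my $dbname   = 'mydb';               ## placeholder database name
my $dumpfile = "$dbname.dmp";

## Let mysqldump write the file directly...
system( 'mysqldump', '-q', '--single-transaction',
        '-r', $dumpfile, $dbname ) == 0
    or die "mysqldump failed: $?";

## ...then ask the filesystem for the size instead of counting bytes.
my $bytes = -s $dumpfile;
print "$dumpfile: $bytes bytes\n";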
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
my @signals = qw(TERM ALRM INT HUP);
for my $sig (@signals) {        ## a lexical, so each closure captures the
    $SIG{$sig} = sub {          ## right signal name ($_ would not survive
        $rt->debug(qq{Caught signal: $sig.\n});   ## to when the handler runs)
        kill 9, $pid;           ## $rt, $pid and $thread come from the
        $thread->join();        ## surrounding program
        exit;
    };
}
our @pids;
...
for my $db (keys %{$db_ds}) {
    ...
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {       ## child: replace ourselves with the dump command
        exec(...);
    } else {               ## parent: remember the child's pid
        push @pids, $pid;
    }
}
and then call kill 9, @pids in your signal handler.
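Something like this (a sketch, reusing the @pids array from the snippet above):

for my $sig (qw(TERM ALRM INT HUP)) {
    $SIG{$sig} = sub {
        kill 9, @pids;     ## stop every child dump we started
        exit;
    };
}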
Finally, be sure to read perlthrtut -- it mentions some caveats about using signals and threads.
With recent versions of threads, you can install per-thread signal handlers. Using these in conjunction with a signal handler in your main thread, you can forward process signals to your threads and have each one deal with them appropriately for its context: in this case, by killing the current child process.
This is necessarily untested code, but it should serve to demonstrate the idea:
#! perl -slw
use strict;
use threads;
use Thread::Queue;

our $N ||= 2;               ## worker count
my $Q = Thread::Queue->new;
sub dbDump {
    my $pid;
    ## Add per-thread signal handlers closing over the $pid
    @SIG{ qw[TERM ALRM INT HUP] } = ( sub{ kill 9, $pid if $pid } ) x 4;

    my $opts = '-q --single-transaction --complete-insert';
    while( my $dbinfo = $Q->dequeue ) {
        my( $dbname ) = split ' ', $dbinfo;   ## first word is the db name
        ( my $stamp = ''.localtime() ) =~ tr/ :/__/;  ## shell-safe timestamp
        my $outfile = "$dbname$stamp.dmp";
        ## Assign (don't redeclare) $pid, so the handlers above can see it
        $pid = open my $cmd,
            "mysqldump $opts $dbinfo --result-file=$outfile |"
            or die $!;
        close $cmd;    ## wait for this dump to finish before the next
    }
}
my @pool = map { threads->create( \&dbDump ) } 1 .. $N;
## Add main thread sig handlers to relay process signals to threads
@SIG{ qw[TERM ALRM INT HUP] } = ( sub{ $_->kill( 'TERM' ) for @pool } ) x 4;
for my $db (keys %{$db_ds}) {
    if (!@{$db_ds->{$db}} && !$opts{database}) {
        ### Our data structure says there are no native mysql tables,
        ### so we skip this database.
        next;
    }
    $dbh->do(qq{use $db});
    ### The table list in our data structure is empty, but the database
    ### option has been passed, so look the tables up.
    if (!@{$db_ds->{$db}}) {
        $queries{get_table_names}->execute();
        while (my $rec = $queries{get_table_names}->fetchrow_hashref()) {
            push @{$db_ds->{$db}}, $rec->{Name};
        }
        debug(qq{Got table list for $db\n});
    }
    ### This block makes sure we can actually access the tables.
    my @valid_tables;
    for my $table (@{$db_ds->{$db}}) {
        $queries{verify_table_name}->execute($table);
        if ($queries{verify_table_name}->rows()) {
            my ($rec) = $queries{verify_table_name}->fetchrow_array();
            push @valid_tables, $rec;
        } else {
            debug(qq{There was a problem finding database/table $db $table\n});
        }
    }
    $Q->enqueue( join ' ', $db, @valid_tables );
}
$Q->enqueue( (undef) x $N );   ## one undef per worker tells it to exit
$_->join for @pool;
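(The -s switch on the shebang line lets you set the pool size from the command line, e.g. script.pl -N=4.)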
Re: Using simultaneous threads
by pc88mxer (Vicar) on May 13, 2008 at 03:10 UTC
Here's a simple solution that you can use if:
- your perl program doesn't need to process the output of mysqldump as it is being generated, and
- you don't need to wait for the mysqldump commands to finish
If both of these are true, then structure your code like this:
for my $db (keys %{$db_ds}) {
    ...figure out which table to dump, etc...
    system("$cmd > $dumpfile &");   ## '&' backgrounds the dump via the shell
}
The only difference is that $cmd won't end with a pipe (|). Moreover, since you are running mysqldump, you can also use its -r option to direct output to the dump file instead of redirecting standard output. Indeed, this is better, since then you can use the safer, multi-argument version of system (see the sketch after the fork example below).
If you need to wait for the mysqldump commands to finish, then it is just a little bit trickier:
for my $db (keys %{$db_ds}) {
    ...figure out which table to dump, etc...
    my $pid = fork;
    die "unable to fork: $!\n" unless defined $pid;
    if ($pid == 0) {
        ## child: exec only returns if it fails
        exec("$cmd > $dumpfile");
        die "unable to exec for table $table: $!\n";
    }
}
# now wait for all the children to finish
1 while (wait > 0);
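To combine this with the -r suggestion above, replace the child branch with something like this (a sketch; the option list is illustrative, and @tables stands for whatever table list you computed):

## Child: list-form exec avoids the shell entirely, and -r makes
## mysqldump write the dump file itself, so no redirection is needed.
if ($pid == 0) {
    exec('mysqldump', '-q', '--single-transaction',
         '-r', $dumpfile, $db, @tables);
    die "unable to exec for $db: $!\n";
}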
Re: Using simultaneous threads
by jethro (Monsignor) on May 13, 2008 at 03:25 UTC
Is there any reason why you shuffle the output of mysqldump through perl? You don't seem to do anything else with the data. Why not let a shell redirection do the work?
system("$cmd > $dumpfile")==0 or die ...;
Now, to start two of them simultaneously, you could fork (which is clean and easy). Small caveat: you can easily check whether the child process finished, but to get a status/success/failure message you would need a file or some IPC. But don't worry, some helpful monk will probably tell you about a module that already does most of this.
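For instance, a minimal sketch of the fork variant ($cmd and $dumpfile as in your code; waitpid with WNOHANG gives a non-blocking check, and $? carries the exit status; Parallel::ForkManager is one module that wraps this pattern):

use POSIX ':sys_wait_h';    ## for WNOHANG

my $pid = fork;
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    exec("$cmd > $dumpfile");
    die "exec failed: $!";
}

## Parent: poll without blocking; waitpid returns 0 while the child
## is still running, and its pid once it has exited.
while (waitpid($pid, WNOHANG) == 0) {
    sleep 1;                ## do other work here instead
}
my $status = $? >> 8;       ## the dump command's exit status
print $status == 0 ? "dump succeeded\n" : "dump failed: $status\n";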
Just for completeness sake: There is also the possibility of letting the shell fork:
system("$cmd > $dumpfile &")==0 or die ...;
The '&' makes sure the system call returns immediately, so that you can start the second dump directly afterwards. To find out whether the child finished you could check the output of ps, but that is a dirty and unsafe hack in my view.
Re: Using simultaneous threads
by GrandFather (Saint) on May 13, 2008 at 03:37 UTC
What OS? Does the code run in a GUI?
Perl is environmentally friendly - it saves trees
Re: Using simultaneous threads
by mhearse (Chaplain) on May 13, 2008 at 10:24 UTC
Just wanted to address some of the questions. The OS is a current version of Red Hat. X11 is not installed, so no GUI. For a program like this I usually run it from cron on a detached screen. I'm reading the output because I had code which calculated and printed the bytes/second read and the wall time the dump had been running. I decided to use a freeware progress bar instead, so I can probably remove that block. The machine in question has 8 CPUs and plenty of IO bandwidth, so I believe dumping two different databases at once would speed up the backup process (which involves dumping around 3 GB of data).