Re: Threads memory consumption is infinite
by BrowserUk (Patriarch) on Jun 10, 2008 at 15:22 UTC
Based purely upon your description, it sounds like your mysql threads are failing to get cleaned up.
In general, using multiple threads with DBI has been a no-no, because the third-party (vendor) libraries that underlie many DBI/DBD implementations are not thread-safe and allocate resources on a per-process basis.
You can in most cases successfully use DBI from a multi-threaded app, but the safest way to do so is to start a single, long-running thread that conducts all the interactions with the DB. When other threads in your application need to make DB calls, they communicate their requirements to, and retrieve results from, that single DBI thread via queues or other shared-memory constructs.
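A minimal sketch of that single-DB-thread arrangement using Thread::Queue (the queue names and the faked SQL handling are illustrative, not from the OP's code; a real app would hold a DBI handle inside the dedicated thread):

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# One dedicated thread owns the DB connection; workers never touch DBI.
my $requests = Thread::Queue->new;
my $results  = Thread::Queue->new;

my $db_thread = threads->create( sub {
    # In the real app: my $dbh = DBI->connect(...); faked here.
    while ( defined( my $sql = $requests->dequeue ) ) {
        $results->enqueue("result of [$sql]");
    }
} );

# Any other thread submits work through the queue and waits for the answer
my $worker = threads->create( sub {
    $requests->enqueue('SELECT 1');
    return $results->dequeue;
} );

print $worker->join, "\n";
$requests->enqueue(undef);    # undef tells the DB thread to shut down
$db_thread->join;
```

Because only the dedicated thread ever holds the connection, the vendor library's per-process resources are allocated exactly once.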
The best way to test this hypothesis would be to run a copy of your app that does everything it does now, including spawning and ending the "DBI" threads, but with the actual DBI code commented out or removed. If the app runs without accumulating dead threads once they are no longer actually making DBI connections, that's a fairly clear indication that DBI, or the underlying vendor libraries, are failing to clean up and release resources.
Beyond that test, the best approach to solving the problem is to create a vastly cut-down version of your app that 'goes through the motions' without actually doing too much--reducing the code to the bare minimum that demonstrates the problem--and then post that here so that we can advise further. It may also allow a bug report to be raised, so that someone can see and fix the problem.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I commented out the DBI thread creation part as recommended and the problem persists. I also explicitly undef'd 2 rather big hashes when I was done with them.
It may be doing a little better, but not enough that it's a 'solution' per se; it just took longer to die, if there was any change at all.
Re: Threads memory consumption is infinite
by zentara (Cardinal) on Jun 10, 2008 at 16:31 UTC
I'm just brainstorming here, from experience with other thread problems. A thread gets a copy of the parent's data when it gets spawned, and this is the cause of all sorts of thread-safety difficulties.
Now, just glancing at your pseudocode: you are first creating up to 20 parser threads BEFORE you spawn the $mysql_thread, so the $mysql_thread gets a copy of all of them. Possibly you are also getting recursion in parser thread creation? What sequence would add up to 170? Does the second parser thread get a copy of the first, etc.?
Maybe try to spawn your $mysql_thread BEFORE you create your 20 parser threads? Also, can you do
    $mysql_thread->kill('SIGUSR1');
    undef $mysql_thread;
Possibly threads->exit can be used in the thread to ensure it returns, so it can close itself up.
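A small self-contained sketch of that kill-and-join sequence (the sleeping loop stands in for the real mysql work; the thread installs its own USR1 handler as the threads docs require, and the one-second pause in the main thread is just to make sure the handler is in place before the signal is sent):

```perl
use strict;
use warnings;
use threads;

my $mysql_thread = threads->create( sub {
    local $SIG{'USR1'} = sub { threads->exit };    # leave cleanly on signal
    sleep 1 while 1;                               # stand-in for the DB loop
} );

sleep 1;                          # give the thread time to install its handler
$mysql_thread->kill('SIGUSR1');   # ask the thread to stop...
$mysql_thread->join;              # ...and reap it so its memory is released
print "mysql thread reaped\n";
```

Note that thread "signals" in Perl ithreads are simulated and only delivered at safe points, so the thread must reach one (e.g. between loop iterations) before the handler fires.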
But like BrowserUk suggests, your best bet is to simplify it down to a testable example, without mysql involved, and see how it behaves.
Watching Task Manager, I see that the mysql threads do successfully die, and there are never more than 25 threads or so created (20 parsers, 1 main, and 2-4 mysql, which fluctuate depending on performance).
I told it to die after about 15 minutes (all that it can last), and as each thread exits it frees up 300-400MB approximately, which tells me that each parser thread is accumulating the memory, and not the mysql threads or even the main thread that spawns the parsers... hopefully that was a coherent thought.
"hopefully that was a coherent thought" heh heh :-)
Re: Threads memory consumption is infinite
by perrin (Chancellor) on Jun 10, 2008 at 17:31 UTC
Are you using WWW::Mechanize? It keeps the full text of all pages it hits in memory. Instructions for disabling this are in the FAQ.
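For anyone following along, the switch being described is the stack_depth setting, which caps Mechanize's page history (requires WWW::Mechanize from CPAN):

```perl
use WWW::Mechanize;

# stack_depth => 0 stops Mechanize from keeping every fetched page
# in its history, which otherwise grows without bound
my $mech = WWW::Mechanize->new( stack_depth => 0 );

# or on an existing object:
$mech->stack_depth(0);
```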
I'm using HTML::TreeBuilder, which has a delete method that I've implemented.
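For reference, the usual HTML::TreeBuilder cleanup pattern looks like this (the sample markup is illustrative; HTML::TreeBuilder is a CPAN module):

```perl
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new_from_content('<p>hello</p>');
my $text = $tree->as_text;    # ... do the real parsing work here ...
$tree->delete;                # break the tree's circular refs so memory is freed
print $text, "\n";
```

Without the delete call, the parent/child links in the tree are circular, so Perl's reference counting never frees it.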
Re: Threads memory consumption is infinite
by Godsrock37 (Sexton) on Jun 10, 2008 at 18:14 UTC
I think I've deduced that it's in those 20 static threads... but I'm having trouble testing it.
I need to be able to do something along the following lines:
sub main_loop {
    my $parser_thread;
    while (
        schedule_count()
        and $hit_count < $hit_limit    # could be commented out
        and time() < $expiration
        and !$QUIT_NOW
        and scalar threads->list(threads::running) <= 25
    ) {
        #yield();
        $parser_thread = async { process_url( next_scheduled_url() ); };
    }
    $parser_thread->join;
    return;
}
The problem is that it creates the 20 threads and then joins them, and that's the end... I want it to create 20 threads or so and, as one dies off, spawn a new one... any thoughts? I feel like this is really close.
This is starting to ring a bell; I've seen this with Tk. You need to reuse your threads, because the refcounting is so complicated that Perl won't free a thread's memory when it's undefined. What I do in Tk-with-worker-threads is create 3 reusable worker threads (you would want to create 20). My example looks complex because I use a bunch of hash names, but you can simplify it for your purposes.

What I essentially do is store the available threads in an array @ready. I shift off a thread as I need one to work, and when it is done working, I push it back onto the @ready array for next use. This way, only 20 threads ever get created, and only 20 max can run at a time. It conserves memory by reusing the threads. You only need to join the threads once, when exiting the program. So instead of spawning a new thread as an old one dies off, push the dying one onto the @ready array and shift one off for the next job. Believe me, it works and is solid, and avoids the refcount problem.

(To avoid confusion: when you get a worker in my example, a little popup appears that spawns an xterm; just close the xterm to start the thread running. On win32, change the cmd to something that works. I did this to give a visual indication that the thread was actually doing something.)
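A stripped-down sketch of that reuse idea, using a shared job queue rather than the @ready array (process_url here is a stand-in for the OP's real routine, and the pool size of 3 would be 20 in his case). The key point is the same: the workers are created once and never destroyed, so the refcount problem never arises:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

sub process_url { return length $_[0] }    # stand-in for the real worker routine

my $jobs = Thread::Queue->new;
my $done = Thread::Queue->new;

# Create the workers once; each loops forever pulling jobs off the queue,
# so no thread is ever destroyed and respawned.
my @pool = map {
    threads->create( sub {
        while ( defined( my $url = $jobs->dequeue ) ) {
            process_url($url);
            $done->enqueue($url);    # report completion
        }
    } );
} 1 .. 3;

$jobs->enqueue($_) for qw( http://example.com/a http://example.com/b );
$jobs->enqueue(undef) for @pool;    # one undef per worker = shutdown
$_->join for @pool;                 # join once, when everything is finished
print $done->pending, " urls processed\n";
```

The queue does the scheduling that shifting and pushing on @ready does by hand: an idle worker simply blocks in dequeue until the next job arrives.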
Quick question... how do I start/stop each thread?
In other words, the thread finishes its job and I will immediately have another job for it. I dunno, I like the @ready strategy and reusing the threads; that seems like exactly what I need, but I don't know how to implement it.
I looked at your code but, like you said, it's a little more than I need. I don't need to share any data between threads except the queue, which already works.
Can I send you a PM or an email or something? I love PerlMonks, but it's a little bit of a low-bandwidth form of communication.
Re: Threads memory consumption is infinite
by Godsrock37 (Sexton) on Jun 10, 2008 at 17:06 UTC
Unfortunately I can't post all of the code because of company policies on the code, etc.
In the future I may be able to post a slimmed-down version, though I'm not sure how much it would help; I've supplied everything having to do with threading except for the queue, which is pretty straightforward. I think this is an applied-theory issue where I just misunderstood something about threading or how to manage memory.
Re: Threads memory consumption is infinite
by Godsrock37 (Sexton) on Jun 12, 2008 at 18:33 UTC
sub main_loop {
    my $parser_thread;
    while (
        still_running()
        #and $hit_count < $hit_limit    # could be commented out
        and time() < $expiration
        and !$QUIT_NOW
    ) {
        # if we have fewer than 23 threads (including mysql), make a new one
        if ( scalar threads->list(threads::running) < 23 and schedule_count() ) {
            $parser_thread = async { process_url( next_scheduled_url() ); };
        }
        # clean up dead threads
        if ( threads->list(threads::joinable) ) {
            foreach $parser_thread ( threads->list(threads::joinable) ) {
                $parser_thread->join;
                undef $parser_thread;
            }
        }
    }
    return;
}

sub still_running {
    if    ( schedule_count() )                       { return 1; }
    elsif ( scalar threads->list(threads::running) ) { return 1; }
    else                                             { return 0; }
}
The leak is either non-existent or drastically reduced. Thanks for the help, everyone.