Re^4: release threads resources?

The issue was that my application, before I rewrote it, was chewing up about 80% of a gig of DDR memory. It is an app that connects to over 500 servers, and for each server it creates one thread and three socket connections (one to the server and two to the MySQL database). Each thread also builds fairly extensive hash tables (sometimes well over a few thousand keys). I am aware that Perl's hashes are memory hogs, but their constant-time, O(1) lookups are needed for speed of execution.

I also run a front end for this app via a web page hosted on my Apache web server on the same machine, so I was worried about the resources left over to handle HTTP requests. In the end I rewrote the app to implement some load balancing: it now respawns itself after every ((scalar(@servers)/5)+1) servers and then starts from where it left off. At first I tried system(), but of course that did not work, because system() blocks waiting for the child to return while the parent process keeps all of its memory. So I used exec() instead, which replaces the current process (same PID) with a fresh one and hands the resources back to the OS. It seems to be working out nicely, and each iteration now uses only about 20% of the memory.
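A minimal sketch of the respawn-via-exec() idea, assuming the resume offset is passed to the script as its first command-line argument (a hypothetical convention; the post does not say how the app remembers where it left off). batch_range() and run() are hypothetical names, and int() is added so that (scalar(@servers)/5)+1 becomes a whole number of servers.

```perl
use strict;
use warnings;

# Hypothetical helper: for $total servers and a starting index, return the
# inclusive range this process should handle and the index the next process
# should start from (undef when every server has been handled).
sub batch_range {
    my ($total, $start) = @_;
    my $size = int($total / 5) + 1;    # the post's (scalar(@servers)/5)+1, truncated
    my $end  = $start + $size - 1;
    $end = $total - 1 if $end > $total - 1;
    my $next = $end + 1 < $total ? $end + 1 : undef;
    return ($start, $end, $next);
}

# Hypothetical driver: handle one batch, then exec() a fresh copy of this
# script for the next one. On success exec() never returns: the running perl
# is replaced (same PID), so the memory used by this batch goes back to the OS.
sub run {
    my @servers = @_;
    my $start   = @ARGV ? $ARGV[0] : 0;
    my ($from, $to, $next) = batch_range(scalar @servers, $start);
    for my $server (@servers[$from .. $to]) {
        # connect, build the per-server hashes, talk to MySQL (omitted)
    }
    if (defined $next) {
        exec($^X, $0, $next) or die "exec failed: $!";
    }
    return;
}

# Demo only: print the ranges a 500-server run would walk through, one per
# process. This loop does not exec anything.
my $pos = 0;
while (defined $pos) {
    my ($from, $to, $next) = batch_range(500, $pos);
    print "process would handle servers $from..$to\n";
    $pos = $next;
}
```

In the real script, run(@servers) would be called at the top level; each exec'd copy handles one batch and execs the next until the list is exhausted. For 500 servers the demo prints five ranges, which lines up with each iteration using roughly a fifth of the original memory.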

www.perlskripts.com




Re^5: release threads resources?
by BrowserUk (Patriarch) on Oct 14, 2005 at 15:38 UTC

An interesting application and it sounds like you have a working solution.

I assume that you are connecting a subset of the 500 servers at any given time?

I've generally found it better to have a few long-running threads rather than lots of short-running ones.

Basically, each thread is a loop that (in your case) would connect to a server, do what it needs to, disconnect, then loop back and connect to the next server. To ensure each thread does as much of the work as it is capable of, you load a shared queue with the server information, and each time around the loop, each thread picks off the next server to connect to. Any results can be fed back to the main thread via one or more return queues.
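A minimal sketch of that arrangement using the core threads and Thread::Queue modules, assuming a perl built with ithreads and a Thread::Queue recent enough to provide end() (version 3.01 or later). The server names, the worker count and the per-thread %lookup table are hypothetical placeholders, and the actual connect/query/disconnect work is left as a comment.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-ins for the real server list and pool size.
my @servers  = map { "server$_" } 1 .. 20;
my $nWorkers = 4;    # start with 1, then raise to what the box and bandwidth can take

my $work    = Thread::Queue->new(@servers);   # shared queue of servers still to visit
$work->end;                                   # nothing more will be added
my $results = Thread::Queue->new;             # return queue to the main thread

sub worker {
    my $tid = threads->tid;

    # Hypothetical per-thread table: built once per persistent thread,
    # rather than once per server as in the thread-per-server design.
    my %lookup = map { $_ => length $_ } @servers;

    # Keep taking the next server until the queue is drained; dequeue()
    # returns undef once the queue is empty and has been ended.
    while (defined(my $server = $work->dequeue)) {
        # connect to $server, do the work, disconnect (omitted)
        $results->enqueue("$server handled by thread $tid ($lookup{$server})");
    }
    return;
}

my @pool = map { threads->create(\&worker) } 1 .. $nWorkers;
$_->join for @pool;

# Every worker has finished, so drain the return queue without blocking.
my @done;
while (defined(my $r = $results->dequeue_nb)) {
    push @done, $r;
}
print scalar(@done), " servers handled by $nWorkers threads\n";
```

Scaling up is then a matter of changing $nWorkers; the number of threads created, and so the number of copies of any per-thread table, stays at $nWorkers no matter how many servers are queued.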

The nice things about this arrangement are:

It scales nicely. Once you have it working (slowly) with a single comms thread, you just start as many more identical threads as your system and bandwidth can handle.
It minimises the impact of any memory leaks that might occur, because each thread persists until all the work is done. With the start-a-new-thread-per-server approach, you multiply any leaks by the number of servers.
The downside for your application is that any large data tables you need within your comms threads will be replicated per thread. But it would appear you are doing that anyway, and by replicating them only for a few persistent threads, instead of every time you start a new one, you will save time.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?

"Science is about questioning the status quo. Questioning authority".

The "good enough" may be good enough for now, and perfection may be unobtainable, but that should not preclude us from striving for perfection when time, circumstance or desire allow.
