Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am in the process of writing a statistics-gathering daemon for a number of systems, and I have some concerns about my approach. Basically, each daemon starts, connects to a central PostgreSQL database and grabs info about what data it is supposed to gather and at what interval, then creates one thread per datasource. Each of the threads gathers its data, sleeps for its particular duration, gathers data, sleeps and so on. The main reason for this was to be able to gather data from sources at completely different intervals: some every 2 seconds, some every 20 min, etc.

My concerns stem from the fact that with each thread, the memory usage increases by ~2MB, so for 7 datasource threads plus the main thread, I use up around 16MB of memory. I don't consider this to be wrong (is it?), but it does seem a bit inefficient. I originally hoped that each datasource thread would be called, then finish, rather than sitting on the memory perpetually. However, I could not get multiple "timers" to run independently of one another without either forking or threading them off from the main. Of course there may be a way, but I don't know it. :)

So, the questions are: Is there a way to run independent timers without one persistent thread for each? Even if there were, would the memory they use just stay allocated to the main process and not get released back to the OS anyway? Is there a better way, generally speaking, of handling a threading situation like this?

Thanks very much in advance. Hope it wasn't all too vague.

mike

Re: multiple threads with different "timers"
by BrowserUk (Patriarch) on Nov 21, 2002 at 17:46 UTC

    2MB per thread? Ouch. So much for 'lightweight processes'.

    You could have one thread run on a timer set to the minimum granularity for your needs and then two or three "worker" threads. The timer thread would have a table identifying the tasks to be performed. When a task needs doing, it dispatches the information required to do the task to one of the worker threads and then retrieves and maintains the information gathered. The number of worker threads would be determined by your needs for timeliness of dispatch and the likelihood of task timings clashing. You could set this up so that you start with a single worker and spawn another when the need arises. You should fairly rapidly see how many you actually need.

    Generally it's better to pool threads this way than to have them sitting around idle most of the time. It's also inefficient to destroy threads if you later need to re-create them.

    Caveat: I haven't made any use of Perl's threads yet. These are general guidelines from other threaded systems I have used.


Re: multiple threads with different "timers"
by traveler (Parson) on Nov 21, 2002 at 17:54 UTC
    Is there a way to run independent timers without one persistent thread for each?

    What I do in a somewhat similar situation is to have a timer interrupt every 1 or 2 seconds. The handler for that event (or signal) processes a list looking for work to do. In your case the list might be a hash with period and function entries where period would be how often to start the thread (when time % period is equal to 0, possibly taking into account the potential for missed events) and where function is the function to start the thread. Presumably you could have more entries in the hash including args to the function.

    In my application, in addition to period, I store the absolute time to start the function and use that instead of the period directly. Then, if I miss a tick, the start time is earlier than the clock, so the routine needs to be started. I update the start time entry (by adding the period to the start time or current time) before actually calling the function.
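
    A minimal sketch of the absolute-start-time scheme described above, not the poster's actual code. The task names, the %tasks layout and the helpers init_schedule and run_due are all hypothetical. The clock is passed in as an argument so the logic can be exercised without waiting, and missed ticks are skipped by stepping the start time forward in whole periods, which is one of the two update options mentioned.

```perl
use strict;
use warnings;

# Hypothetical task table: name => { period in seconds, code to run }.
my %tasks = (
    load => { period => 2,    code => sub { "load sampled" } },
    disk => { period => 1200, code => sub { "disk sampled" } },
);

# Give every task an absolute first start time, one period from $now.
sub init_schedule {
    my ($tasks, $now) = @_;
    $_->{next_start} = $now + $_->{period} for values %$tasks;
    return;
}

# Called on every tick: runs each task whose start time has passed
# and returns the names of the tasks that ran, in sorted order.
sub run_due {
    my ($tasks, $now) = @_;
    my @ran;
    for my $name (sort keys %$tasks) {
        my $t = $tasks->{$name};
        next if $t->{next_start} > $now;
        # Reschedule before calling, stepping past any missed ticks,
        # so a slow task cannot leave a stale start time behind.
        $t->{next_start} += $t->{period} while $t->{next_start} <= $now;
        $t->{code}->();
        push @ran, $name;
    }
    return @ran;
}
```

    In a daemon, the tick would be a loop that calls init_schedule(\%tasks, time) once and then run_due(\%tasks, time) followed by sleep 1, or the same call made from a periodic signal handler.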

    HTH, --traveler

Re: multiple threads with different "timers"
by pg (Canon) on Nov 21, 2002 at 19:22 UTC
    Maybe we have to look at the overall architecture. I don't know whether you have influence on the other side of the application. If you do, it would probably be better to let the other side inform you when it is time to query. This can be done through UDP, or something similar. Then, on this side, you only need two threads: one to monitor the socket (through fcntl, so the system tells you when there are packets to read, rather than sleeping and waking), and the other to actually do the collection.

    It is never a good idea to use threads together with sleep. Performance goes down quite a lot.

    Unfortunately, alarm is not an option: the system does not stack alarms, and only the last one takes effect. Internally, the system has a single itimer, and it is reset every time you call alarm. Also, the signal will probably be delivered to the process, not to the thread.
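
    Because only one alarm can be pending, several logical timers have to share it: keep a table of deadlines and always arm the alarm for the earliest one. The sketch below shows only that bookkeeping. The deadline table layout and the helpers seconds_until_next and expired are hypothetical names, and this is not how any particular CPAN module implements it.

```perl
use strict;
use warnings;
use List::Util qw(min);

# $deadlines is a hypothetical table: timer name => absolute epoch second.
# Returns the delay to pass to alarm() so it fires at the earliest deadline,
# or nothing when the table is empty.
sub seconds_until_next {
    my ($deadlines, $now) = @_;
    return unless %$deadlines;
    my $delay = min(values %$deadlines) - $now;
    # Never return 0: alarm(0) cancels the pending alarm instead of firing.
    return $delay < 1 ? 1 : $delay;
}

# Names of all timers whose deadline has been reached, in sorted order.
sub expired {
    my ($deadlines, $now) = @_;
    my @due = sort grep { $deadlines->{$_} <= $now } keys %$deadlines;
    return @due;
}
```

    A SIGALRM handler built on this would call expired(\%deadlines, time), push each expired deadline forward by its period, and then re-arm with alarm(seconds_until_next(\%deadlines, time)).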

Re: multiple threads with different "timers"
by petral (Curate) on Nov 22, 2002 at 01:37 UTC
          I could not get multiple "timers" to run independently of one another

    Sys::AlarmCall is supposed to do that.  It's old but it's basically just doing a lot of bookkeeping for you.

      p

Re: multiple threads with different "timers"
by adrianh (Chancellor) on Nov 22, 2002 at 00:08 UTC

    How long does the actual fetch of data take?

    If it's short compared to the wait times then it might be a better idea to use an event-based model rather than threads.

    For example, you could use POE to fire off each timer as it comes due.

Re: multiple threads with different "timers"
by Anonymous Monk on Nov 22, 2002 at 03:24 UTC
    It is I, the poster. It's been a while since I visited perlmonks, and I must have forgotten. I have never gotten a set of more helpful, relevant and informative replies to a posting anywhere. MEGA kudos to all of you.

    Anyway, I rewrote the core to work based on one big timer, with only a few threads. Basically, I build a hash of unique keys representing each "plugin", with the value set to the next execution time. I run a time checking subroutine to increment the first execution time by 1+ seconds as necessary to prevent hitting two execution calls in any one second. This might slow the first execution by a few seconds, but the intervals are proper after that. The execution time for each "plugin" (I really need to decide on some words here, heh) increments each time it is run.
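
    A minimal sketch of the first-run staggering described above, under the assumption that every plugin would otherwise start at the same second. The helper name stagger_first_runs and the plugin names in the usage note are hypothetical, and only the first execution time is adjusted. Later collisions between different intervals are not handled here.

```perl
use strict;
use warnings;

# Assign each plugin a distinct first-run second. Starting from $now,
# bump the candidate time by one second while that slot is already taken.
# Returns a hash of plugin name => first execution time.
sub stagger_first_runs {
    my ($now, @names) = @_;
    my (%next_run, %taken);
    for my $name (@names) {
        my $t = $now;
        $t++ while $taken{$t};
        $taken{$t}       = 1;
        $next_run{$name} = $t;
    }
    return %next_run;
}
```

    For example, stagger_first_runs(time, qw(cpu mem disk)) gives the three plugins consecutive seconds, so the first runs are delayed by at most two seconds and each plugin's own interval applies after that.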

    Next, I found that having several calls scheduled only a few seconds apart made it quite difficult to avoid simultaneous calls despite the check, so I added a "freshness" check, split off two worker threads, divided the tasks into organizational classes and divided the load between the threads. Everything is working perfectly.

    My CPU time is actually lower than before and I run steady at about 4-6MB of mem regardless of how many scheduled tasks I have running. This way I can also kill off a thread if the data gathering hangs or anything gets weird, without losing the whole daemon.

    I love the idea of the UDP triggers, but I was really trying to avoid having anything listening on the network; I'll probably use that later. I'm also researching POE and Sys::AlarmCall; they seem very relevant. I need to memorize CPAN, heh.

    Thanks again for all the great info, you guys rock :)

    mike