PerlMonks  

Help designing a threaded service

by Tommy (Chaplain)
on Jan 23, 2014 at 23:52 UTC (#1071853)

Tommy has asked for the wisdom of the Perl Monks concerning the following question:

I've envisioned a design for a listening service at $work that I'd like to implement, but I'm not sure how to do it right. I've iterated over it in my head, but I'm not sure if my ideas are the best approaches. I'm asking for some feedback.

First let's start with what I'd like to accomplish:

  • I'd like a Perl daemon to listen for instructions on a Unix socket or sockets (concurrency is needed)
  • The instructions will come as pairs of hostnames and system commands. After a hostname/command are passed in, the client receives the ID of its new task, and disconnects.
  • The listening service will establish a remote connection to the hostname provided, and run the command.
  • The commands are anticipated to take a long time to run, producing a steady stream of output (such as `iostat -dxk 3`)
  • Line by line, the service will stuff the output of the running command into a Thread::Queue or a file (haven't decided yet; memory could be a factor)
  • Every 3 to 4 seconds the client reconnects to the listening service and supplies the ID of its task, asking for whatever output has accumulated since it last connected
  • The service sends whatever is in its queue back to the client
  • At any time the service can terminate the long-running command based on one of two criteria:
    • Nobody has checked on the status of task ID N for $timeout seconds
    • A client connects to the service and issues a kill command while providing a task ID; the service then kills that task and ends its queue
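
The request flow in the list above can be sketched without any threading at all. This is only an illustration under assumptions: the `RUN`/`POLL`/`KILL` line format, the `%tasks` layout, and every sub name are hypothetical and not part of the original design, and the remote execution itself is left out.

```perl
use strict;
use warnings;

# Hypothetical wire format (an assumption, not from the original post):
#   RUN <hostname> <command...>  -> "ID <n>"
#   POLL <id>                    -> output accumulated since the last poll
#   KILL <id>                    -> "KILLED <n>"
my %tasks;        # task id => { host, cmd, output => [...], last_seen }
my $next_id = 1;

# Called by whatever runs the remote command, once per output line.
sub append_output {
    my ($id, @lines) = @_;
    push @{ $tasks{$id}{output} }, @lines if $tasks{$id};
    return;
}

sub handle_request {
    my ($line, $now) = @_;
    if ($line =~ /^RUN\s+(\S+)\s+(.+)$/) {
        my $id = $next_id++;
        $tasks{$id} = { host => $1, cmd => $2, output => [], last_seen => $now };
        return "ID $id";
    }
    if ($line =~ /^POLL\s+(\d+)$/) {
        my $task = $tasks{$1} or return "ERR no such task";
        $task->{last_seen} = $now;                  # the client checked in
        my @lines = splice @{ $task->{output} };    # drain the buffer
        return join '', map { $_ . "\n" } @lines;
    }
    if ($line =~ /^KILL\s+(\d+)$/) {
        my $id = $1;
        return "ERR no such task" unless delete $tasks{$id};
        return "KILLED $id";
    }
    return "ERR bad request";
}

# Drop every task nobody has polled for more than $timeout seconds.
sub expire_tasks {
    my ($timeout, $now) = @_;
    my @dead = sort { $a <=> $b }
               grep { $now - $tasks{$_}{last_seen} > $timeout } keys %tasks;
    delete $tasks{$_} for @dead;
    return @dead;
}
```

Calling `expire_tasks($timeout, time)` periodically would cover the first termination criterion, and `KILL` covers the second.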

The threading comes in now: I need the service to be able to process at least 20 connections in parallel, without making clients wait for a turn to run their command. I need a supervisor thread to monitor run times for all tasks and kill them off if they haven't been checked on in $timeout seconds.

The classic supervisor-worker thread model might not work here, which is where I'm stumped. I'd have to have three types of threads, not two: 1) the supervisor, 2) the task runners, 3) the listeners. Why not forks? I don't want to use forks, because I need each listening thread to be able to know about all running tasks via threads::shared in-memory variables (I'm not going to be using a database to keep track of running tasks).
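
A minimal sketch of how the supervisor could share state with the other threads through threads::shared, assuming a perl built with ithreads. The names `%last_seen`, `touch_task`, `reap_stale`, and `supervisor` are hypothetical, and actually killing the remote command is left as a comment.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %last_seen :shared;        # task id => epoch of the last client check-in
my $shutdown  :shared = 0;    # set to 1 to stop the supervisor

# Listener threads call this whenever a client asks about a task.
sub touch_task {
    my ($id, $now) = @_;
    lock(%last_seen);
    $last_seen{$id} = defined $now ? $now : time;
    return;
}

# Remove and return the ids of tasks idle for more than $timeout seconds.
sub reap_stale {
    my ($timeout, $now) = @_;
    $now = time unless defined $now;
    lock(%last_seen);
    my @stale = sort { $a <=> $b }
                grep { $now - $last_seen{$_} > $timeout } keys %last_seen;
    delete $last_seen{$_} for @stale;
    return @stale;
}

# Body of the supervisor thread: sweep, then sleep, until told to stop.
sub supervisor {
    my ($timeout, $interval) = @_;
    while (1) {
        for my $id (reap_stale($timeout)) {
            # A real service would tell the task's runner thread to stop here.
            warn "task $id timed out\n";
        }
        last if $shutdown;
        sleep $interval;
    }
    return;
}
```

The supervisor would be started with something like `threads->create(\&supervisor, $timeout, 1)` and stopped by setting `$shutdown` to 1 and joining it.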

I've recently gained a new respect for threading in Perl since my success with it in the recent DFW.pm hackathon, and I'd like to use those lessons learned in order to achieve success in this next endeavor.

What do you think?

 

Tommy
A mistake can be valuable or costly, depending on how faithfully you pursue correction

Replies are listed 'Best First'.
Re: Help designing a threaded service
by BrowserUk (Patriarch) on Jan 24, 2014 at 00:29 UTC

    What you've described sounds eminently doable. Though I'm going to question a couple of things and I'd probably make a few adjustments.

    • Why does the client need to reconnect every 3 or 4 seconds?

      Constantly re-making connections is a costly affair. That's why HTTP (web) servers went over to persistent connections.

      Why not let the client continue to listen and just send it data as it arrives from the remote system?

      Or perhaps better would be for the client to open a local UDP port and, once it has communicated the hostname/command pair and port number to the service, just sit back and monitor the UDP port to retrieve the output.

    • Why accumulate the data in a queue, rather than just a (non-shared) array or even a scalar?

      I guess this is because you envisage the client reconnecting to the server's primary thread to retrieve the output, and so you need to accumulate the data in a shared structure so the primary thread can get access to it.

      I think this is a bad design. You are making it so that your primary thread has to be involved in all data retrieval by every client, thus setting yourself up with a bottleneck. It also complicates the processing by requiring shared data structures, and it uses more memory because shared data is duplicated.
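
The UDP idea from the first point might look roughly like this on loopback, using IO::Socket::INET. The sub names are hypothetical, and the sketch assumes client and service run on the same host. Keep in mind that UDP does not guarantee delivery or ordering, so lines of output could be lost.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Client side: bind an ephemeral UDP port and report its number, which
# would be sent to the service along with the hostname/command pair.
sub open_output_port {
    my $sock = IO::Socket::INET->new(
        Proto     => 'udp',
        LocalAddr => '127.0.0.1',
        LocalPort => 0,              # let the OS pick a free port
    ) or die "cannot bind UDP socket: $!";
    return ($sock, $sock->sockport);
}

# Service side: push one line of command output to the client's port.
# (A real service would keep one socket per task instead of making a new one.)
sub send_output {
    my ($port, $line) = @_;
    my $sock = IO::Socket::INET->new(
        Proto    => 'udp',
        PeerAddr => '127.0.0.1',
        PeerPort => $port,
    ) or die "cannot create UDP socket: $!";
    return $sock->send($line);
}

# Client side: wait up to $timeout seconds for the next datagram.
sub read_output {
    my ($sock, $timeout) = @_;
    my $rin = '';
    vec($rin, fileno($sock), 1) = 1;
    return undef unless select($rin, undef, undef, $timeout);
    $sock->recv(my $buf, 65535);
    return $buf;
}
```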

    I think -- without being fully aware of your requirements -- that a better design would be to have the primary (listening) thread start dedicated thread(s) (one or two depending upon the answer to the above) for each client/command and accumulate the buffered output entirely local to that (those) threads.
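
As a rough sketch of that dedicated-thread design: each client gets a sub like the one below running in its own thread, and nothing is shared. `serve_client` is a hypothetical name, and the sketch assumes a Unix-like system where the list form of a piped `open` is available.

```perl
use strict;
use warnings;
use threads;

# Runs one command for one client and forwards its output line by line.
# $client is any writable handle (a connected socket in the real service).
# Returns the number of lines forwarded.
sub serve_client {
    my ($client, @cmd) = @_;
    open my $out, '-|', @cmd
        or do { print {$client} "ERR cannot run command: $!\n"; return 0 };
    my $count = 0;
    while (my $line = <$out>) {
        print {$client} $line or last;    # stop if the client has gone away
        $count++;
    }
    close $out;    # waits for the command to exit
    return $count;
}
```

The listening thread would hand each accepted connection to something like `threads->create(\&serve_client, $client, 'ssh', $host, $command)->detach`. A production version would also need to deal with SIGPIPE, which is raised when writing to a socket the client has closed.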


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      *Grins widely* ...

      Why you ask would I design it this way? Because I'm planning something evil: IE6 support. And here's the convoluted setup which I wanted to just keep segmented for simplicity's sake, but here we go:

      I have a Catalyst web app serving a number of people. The app polls the server using jQuery XMLHTTP (AJAX) every N seconds (usually 2) with a task ID. I have scrolling text panes to show the output of the tasks.

      The problem which I must solve is how to get this streaming data from remote server C, while Catalyst server B answers AJAX requests from client A. It's a polling system, where the browser sends a request to the Catalyst server, which then contacts the listening service either to initiate long-running SSH commands or to fetch content updates.

      WEBSOCKETS WOULD BE BETTER, but I'm supporting IE6 and 7 here. And I also can't have users in browsers hitting the more private daemon that's doing the remote SSH commands. In this situation, the Catalyst app acts as a firewall for it and only sends "trusted" commands based on sanitized user input.

      Now to clarify one thing I may have miscommunicated: each client connection does NOT go to the master thread. The sole responsibility of the master thread is to enforce timeouts on running tasks. "Listener" threads will be responsible for taking connections, looking up the running tasks in a shared hashref, opening the queue for task ID foo, spewing the update via ->dequeue() to the client, and hanging up. Then Catalyst sends the updated data back to the web client in a well-formed AJAX response which gets populated into the scrolling faux terminal windows in the browser.
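
A sketch of the lookup-and-drain step a listener thread would perform, assuming a perl built with ithreads and one Thread::Queue per task. `%queue_for`, `register_task`, and `drain_task` are hypothetical names. The sketch uses the non-blocking `dequeue_nb` rather than `dequeue`, on the assumption that a poll should return immediately when no new output has arrived.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my %queue_for :shared;    # task id => Thread::Queue holding pending output

# Called when a task starts; the task runner enqueues lines on the result.
sub register_task {
    my ($id) = @_;
    lock(%queue_for);
    $queue_for{$id} = Thread::Queue->new;
    return $queue_for{$id};
}

# Called by a listener thread on each poll. Returns an array ref with the
# lines accumulated since the last poll, or undef for an unknown task id.
sub drain_task {
    my ($id) = @_;
    my $q;
    {
        lock(%queue_for);
        $q = $queue_for{$id};
    }
    return undef unless $q;
    my @lines;
    while (defined(my $line = $q->dequeue_nb)) {
        push @lines, $line;
    }
    return \@lines;
}
```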

      The idea is that the browser window is divided into panes which form a dashboard. Each pane scrolls different server health/activity statistics (IO/CPU/Network/Memory). Above all that will be a moving line graph using Google Charts or the like. It's a grand scheme, but the hard part is the back-end plumbing that I'm trying to create. The front-end stuff is easy.

      Tommy
        "Listener" threads will be responsible for taking connections,

        "Listener threads". Plural? On different ports?


Re: Help designing a threaded service
by zentara (Archbishop) on Jan 24, 2014 at 18:03 UTC

Node Type: perlquestion [id://1071853]
Approved by boftx