targetsmart has asked for the wisdom of the Perl Monks concerning the following question:

I think that I have read enough about threads, both in general and specific to Perl, and I now want to try threads for a load-testing situation.
My program has to read data from a set of files and pump it to the connected clients (sockets). Basically a simulator program.
But the condition is that reading from the files and pumping to the client have to be completely asynchronous and run in parallel. (Let me give an example.)
I thought of using fork, but if I have 100 files I would have to fork 100 times, with each process reading one file at a time and pumping its data to a single (the same) client. When a second client connects, there would be a mass fork of another 100 processes.
I wish to use threads for this purpose (the boss/worker model), but since it uses the COW method I am just reluctant to use it (IMO).
I have just read about the Coro module, which gives this parallel running mechanism with a shared address space.
I need your guidance here: can I use Coro for the above purpose, or is there another such module available that is close to pthreads (POSIX threads)? If one is available, please give me directions.

Vivek
-- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.

Replies are listed 'Best First'.
Re: choosing threads
by zwon (Abbot) on Feb 19, 2009 at 11:43 UTC

    Threads are not the most efficient way to do parallel IO. I'd use something like Event for such a task.

    Update: though actually I've looked at Coro and it seems it can do the job.
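
    For the curious, a minimal Coro sketch of the idea (the "files" here are made-up in-memory stand-ins, and the socket write is replaced by a push so the interleaving is visible):

    ```perl
    use strict;
    use warnings;
    use Coro;   # cooperative threads: one OS process, genuinely shared data

    # Made-up stand-ins for the data files.
    my %files = (
        file_a => [ "a1\n", "a2\n" ],
        file_b => [ "b1\n", "b2\n" ],
    );

    my @sent;   # stand-in for the client socket
    my @coros;
    for my $name (sort keys %files) {
        push @coros, async {
            for my $packet (@{ $files{$name} }) {
                push @sent, $packet;   # real code would syswrite to the socket
                cede;                  # yield so the other coros get a turn
            }
        };
    }
    $_->join for @coros;
    print @sent;   # packets from the two "files" interleave
    ```

    Because the coros share one address space, there is none of the per-thread data copying the OP is worried about; the cost is that each coro must cede (or block in a Coro-aware call) so the others can run.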

Re: choosing threads
by holli (Abbot) on Feb 19, 2009 at 11:37 UTC
    Did you consider using a client/server architecture with a simple webserver and database? It's parallel per se, you can identify each client using cookies/sessions and, as an extra benefit, the app runs over the net, making remote clients easily possible.


    holli

    When you're up to your ass in alligators, it's difficult to remember that your original purpose was to drain the swamp.
Re: choosing threads
by zentara (Cardinal) on Feb 19, 2009 at 12:57 UTC
      If I understand this correctly, he intends to send the same data to every thread. Thus it is not really the traditional boss/worker pattern, as every worker will perform the exact same (not merely similar) task.
      I have had a similar requirement with realtime data, and I did not manage to get it working, as unfortunately there is no way to have a shared array of queues.
      The only way would be to store the read data in memory before starting the threads, but that means a lot of memory consumption, which will already be high with 100 threads. Update: I would be interested if there is a decent way to achieve a shared array of queues.
        No, the data sent to each thread at the beginning of a run is different, shifted off an array or something. The key is efficiency... STOP the multiple spawning and/or forking... and reuse the threads: just reset them and refill them with fresh data to be processed. We may be misunderstanding the fine points of what he is attempting, but reusing threads is best for efficiency.
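
        That reuse pattern can be sketched with ithreads and Thread::Queue. As a possible answer to the "shared array of queues" question above: a plain Perl array can hold the Thread::Queue objects as long as it is populated before the threads spawn, since each queue is internally shared even when the array holding them is not. Worker count and job names here are arbitrary:

        ```perl
        use strict;
        use warnings;
        use threads;
        use Thread::Queue;

        my $workers = 3;                                 # arbitrary for the sketch
        my @queues  = map { Thread::Queue->new } 1 .. $workers;
        my $results = Thread::Queue->new;

        # Spawn once; each worker blocks on its own queue and can be
        # refilled with fresh data instead of being respawned.
        my @threads = map {
            my $id = $_;    # copy $_: it is a global, not safely closed over
            threads->create(sub {
                while (defined(my $item = $queues[$id]->dequeue)) {
                    $results->enqueue("worker $id did $item");
                }
            });
        } 0 .. $workers - 1;

        # "Reset and refill": hand the same threads a new batch of work.
        $queues[$_ % $workers]->enqueue("job$_") for 1 .. 6;
        $_->enqueue(undef) for @queues;                  # undef = no more work
        $_->join for @threads;

        $results->enqueue(undef);
        my @done;
        while (defined(my $r = $results->dequeue)) { push @done, $r }
        print "$_\n" for sort @done;
        ```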

        I'm not really a human, but I play one on earth My Petition to the Great Cosmic Consciousness
Re: choosing threads
by BrowserUk (Patriarch) on Feb 19, 2009 at 18:58 UTC
    I thought of using fork. but in case I have 100 files, I must have to fork 100 times, where one process reads one file at a time and pumping data to a single(same) client. When the second client connects it would be a mass fork of again 100 processes.

    Maybe I'm the only one who has trouble understanding this paragraph, as others have answered without questioning it, but it's not at all clear to me why you feel you need to "fork 100 times" in order to have "one process read one file at a time and pump data to a single (same) client"?

    Essentially, it is not clear from your post what you are trying to do, but reading between the lines, it may be something like:

    1. Open a listening socket:
    2. When a client connects, select one of the 100 files and send it to that client; disconnect.
    3. Repeat.

    If that's an accurate picture of the requirement, then it can easily be satisfied with either fork or threads, and without the 'mass spawnings' that you seem to think are involved.

    If that is an inaccurate description, then you need to clarify the actual requirements before you will get useful answers.
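
    To make that two-step picture concrete, here is a sketch of the fork version: one fork per connecting client, not one per file, so there are no mass spawnings. The port number and the round-robin file choice are placeholders:

    ```perl
    use strict;
    use warnings;
    use IO::Socket::INET;

    my $port  = 9000;                 # placeholder port
    my @files = @ARGV;                # e.g. the 100 data files

    sub run_server {
        my $server = IO::Socket::INET->new(
            LocalPort => $port, Listen => 5, ReuseAddr => 1,
        ) or die "listen: $!";
        $SIG{CHLD} = 'IGNORE';        # auto-reap finished children
        my $next = 0;
        while (my $client = $server->accept) {
            my $file = $files[$next++ % @files];   # pick one of the files
            defined(my $pid = fork) or die "fork: $!";
            if ($pid == 0) {          # child: serve this one client, then exit
                open my $fh, '<', $file or exit 1;
                print {$client} $_ while <$fh>;
                exit 0;
            }
            close $client;            # parent: straight back to accept()
        }
    }
    run_server() if @files;
    ```

    The thread version has the same shape, with threads->create in place of fork.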


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      It has to be
      1. Open a listening socket:
      2. When a client connects, I need to pump the content of all 100 files to the client and disconnect.
      NOTE: when reading from the files, I will read a packet (a collection of events delimited by \r\n) from file 1 and pump it into the socket, then read a packet from file 2 and pump it into the socket, and so on across all 100 files as long as there are packets left in them.
      It won't always be file1 then file2; it can be file1, file100, file10, ... (asynchronous).
      This pumping should occur in parallel.
      Actually, I am trying to simulate a C program that solved the above problem effectively using threads.
      3. Repeat.
      I hope that I have given some clarity about my problem.

      Vivek
      -- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.
        I hope that I have given some clarity about my problem.

        Clearer, but this "It won't be always file1 and file2 , it can be file1, file100, file10, ...(asynchronous)." still needs clarification.

        Why would the order you read from the files vary? (What do you mean by asynchronous? (I know what the word means:))

        Also, are you likely to have concurrently connected clients, or will it be a one client at a time affair?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: choosing threads
by gone2015 (Deacon) on Feb 19, 2009 at 18:55 UTC

    Threads will give you two things: (a) the ability to use multiple processors -- but in this case I guess you're pretty much I/O bound; (b) a way of running complicated processing separately in each thread -- with the state of the processing held implicitly in where you are in the code.

    For simply reading files and throwing them into sockets, it may be more straightforward to use non-blocking I/O and a select loop. I'd maintain a (large-ish) buffer for each socket, and each time I could write something I'd do a non-blocking file read to top up the buffer -- the assumption being that file I/O will easily outrun socket I/O (assuming they aren't intra-machine sockets).
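
    A sketch of that buffer-and-top-up loop with IO::Select (the bookkeeping uses stringified socket keys; the buffer size and hash names are arbitrary):

    ```perl
    use strict;
    use warnings;
    use IO::Select;

    my $sel = IO::Select->new;   # add each client socket with $sel->add($sock)
    my %buf;                     # stringified socket => bytes waiting to go out
    my %src;                     # stringified socket => that socket's input file

    # One pass: for every writable socket, top up its buffer from its file
    # (file I/O should easily outrun socket I/O) and push out what we can.
    sub pump_once {
        for my $sock ($sel->can_write(0.1)) {
            my $fh = $src{$sock};
            if ($fh and length($buf{$sock} // '') < 8192) {
                my $n = read $fh, my $chunk, 8192;
                $buf{$sock} .= $chunk if $n;
            }
            next unless length($buf{$sock} // '');
            my $sent = syswrite $sock, $buf{$sock};
            substr $buf{$sock}, 0, $sent, '' if $sent;   # keep the unsent tail
        }
    }
    ```

    The real loop would also select on readability for new connections and drop sockets whose buffer and file are both exhausted; this shows only the write side.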

    Mind you, I believe select only works on sockets under Windows -- but I'm damned if I can find a reference for that. So there you'd have to use a time-out or something if one of the socket buffers were ever to empty.

    Also, aren't 100 input files and 100 output sockets rather a lot of file handles?

    I suppose you could compromise and fork 4 or 8 processes...