in reply to Re^3: unix loops in perl
in thread unix loops in perl

If each process uses very few resources, then the overhead of creating them will make the whole thing slower.

If the bottleneck is network or memory, then extra processes won't help.

If the bottleneck is CPU, then the processes will let you use more cores, so that will help to a point. You probably don't have 30 cores, so a smaller number of them doing more work each would be better.
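A cheap way to cap the worker count at the number of cores, rather than forking all 30 at once, is `xargs -P`. This is only a sketch: `echo` stands in for whatever each job actually does, and the fallback core count of 4 is an assumption.

```shell
#!/bin/sh
# Run 30 jobs, but only as many at a time as there are cores,
# instead of forking all 30 at once.  `echo` is a stand-in for
# the real per-job command.
NCPU=$(nproc 2>/dev/null || echo 4)
seq 1 30 | xargs -P "$NCPU" -I{} echo "job {} done"
```

The jobs finish in whatever order the scheduler runs them, but `xargs` never has more than `$NCPU` of them alive at once.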

If you've got a healthy mix of resources, then a second process could help, allowing one to crunch numbers while the other waits for the disk and then vice versa.

Like so many things in life, one is good, a few more are convenient, but too many will kill you.

Re^5: unix loops in perl
by i5513 (Pilgrim) on Oct 26, 2011 at 22:21 UTC

    I really do save time with pdsh on many common tasks in day-to-day administration.

    - When you need to connect to many machines (300 machines?), run a command on each of them, and collect the results, you win by using pdsh.

    - If you need to download with wget from many websites (again, 300), you win using pdsh versus a loop.
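    pdsh needs a real cluster to demonstrate, so this is only a sketch of the idea: in a serial loop each connection blocks the next one, while backgrounding each job (which is roughly what a real invocation like `pdsh -w host[1-3] uptime` does for you) lets them all run at once. The host names and the `ssh` command here are placeholders.

```shell
#!/bin/sh
# Serial: each "connection" blocks the next one.
for host in host1 host2 host3; do
    echo "uptime on $host"        # stand-in for: ssh "$host" uptime
done

# Parallel fan-out (the idea behind pdsh): background each
# connection, then wait for all of them to finish.
for host in host1 host2 host3; do
    echo "uptime on $host" &      # stand-in for: ssh "$host" uptime &
done
wait
```

    With 300 slow hosts the serial loop pays 300 round-trip times in a row; the fan-out pays roughly one.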

    And citing you:

    "If each process uses very few resources, then the overhead of creating them will make the whole thing slower."

    Each process will be created whether you use a parallel tool or a loop to create it. So there will be no extra overhead if you have enough resources.

    Of course, as I said before:

    If you don't have enough CPU / memory / network bandwidth, parallel computation will not do better than a loop ... it may even do worse ...

    Regards,

      You can certainly pull from multiple places, but once you've saturated your network, more connections won't help.

      If you do the work in a loop, you only use one process. If you instead create new processes and have each one do a portion of the work, then you pay for creating those processes. I don't see where that gets confusing.


      When I say 'each process takes up very little', I mean when it is easy for your process to just do the job itself. You've seen the classic extreme case of "let's fork off a thread to print one character" loops, right?
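      For the record, that pathological extreme looks something like this (a deliberately bad sketch): a whole process is forked to print a single character, so the creation cost dwarfs the work.

```shell
#!/bin/sh
# Anti-pattern: fork one subshell per character.  The fork/exit
# cost is orders of magnitude more than printing the character.
for c in h e l l o; do
    ( printf '%s' "$c" ) &    # a whole new process just for one char
    wait $!                   # and we immediately wait for it anyway
done
echo
```

      The loop body could have just been `printf '%s' "$c"` with no fork at all, which is the whole point.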

      It is starting to sound like we are violently agreeing.

        "If you do the work in a loop, you only use one process. If you instead create new processes and have each one do a portion of the work, then you pay for creating those processes. I don't see where that gets confusing."

        OK, the original post was:
        for x in x y z ; do perl -bla 'bla bla' ; done
        That is what I was talking about ;-)
        Sometimes parallelization saves time, sometimes it does not; that much is clear. Yes, we agree, I think ..
        Regards,
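        For reference, the minimal parallel version of that loop just backgrounds each iteration and waits at the end. A sketch only: `echo` stands in for the thread's perl one-liner, which is itself a placeholder.

```shell
#!/bin/sh
# Serial, as in the original post:
#   for x in x y z ; do perl -bla 'bla bla' ; done
#
# Minimal parallel version: background each iteration, then wait.
for x in x y z; do
    echo "processing $x" &    # was: perl -bla 'bla bla'
done
wait
echo "all done"
```

        The `wait` matters: without it the script can exit (and, in a pipeline, close its output) before the backgrounded children have finished.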