vit has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
Is there a good solution in ActiveState Perl for the following process?
I am reading and processing a huge file and recording the results to another file, which takes hundreds of hours. I want to run this task using multiple threads.

Re: Using fork for reading and processing a large file in ActiveState perl
by almut (Canon) on Feb 24, 2010 at 18:03 UTC

    When trying to speed up a program, the first step is usually to figure out what exactly is slow by profiling it (e.g. with Devel::NYTProf). Only then can you take appropriate measures. For example, if the bottleneck is mainly IO (reading/writing files), multithreading is unlikely to be of much help.
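    A minimal way to do that, assuming your script is invoked as process_file.pl (a hypothetical name): run it once under the profiler, then render the report.

        perl -d:NYTProf process_file.pl   # writes ./nytprof.out
        nytprofhtml                       # renders an HTML report into ./nytprof/

    The report breaks the time down per subroutine and per line, so it will show directly whether the hours go to IO, to the HTTP requests, or to the HTML processing.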

      IO is very small compared to the other processing, which includes an HTTP request and processing the web page's HTML.
Re: Using fork for reading and processing a large file in ActiveState perl
by BrowserUk (Patriarch) on Feb 24, 2010 at 18:14 UTC

    How big is "huge"?

    What are you doing between reading the data in and writing it out?

    (I assume you must be doing something complex, because on my very ordinary system with a so-so disk, Perl can read and write 3 GB/minute. At that rate, a hundred hours of pure IO would make your file at least 18 terabytes: 3 GB/min × 60 min/h × 100 h.)


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      OK, let's put it this way.
      I want to read a small file into an array once. Then, in a loop, I do HTTP requests, process the results, and write a portion to the output file. Each portion is small.

        So why call it a huge file? And what is stopping you?
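        As a starting point, here is a minimal sketch of the workflow you describe: a few worker threads pull URLs from a Thread::Queue, the slow network-bound fetches happen in parallel, and only the parent writes the output. The file names and process_page() are hypothetical stand-ins. Note that on ActiveState/Windows Perl, fork itself is emulated with interpreter threads, so threads is the natural primitive here.

            use strict;
            use warnings;
            use threads;
            use Thread::Queue;
            use LWP::Simple qw(get);

            # Hypothetical stand-in for the OP's real HTML processing.
            sub process_page {
                my ($url, $html) = @_;
                return sprintf "%s: %d bytes", $url, length($html // '');
            }

            my $queue    = Thread::Queue->new;
            my $nworkers = 4;

            # Workers: each pulls a URL, fetches it, processes the page,
            # and returns its accumulated results once the queue drains.
            my @workers = map {
                threads->create(sub {
                    my @results;
                    while (defined(my $url = $queue->dequeue)) {
                        my $html = get($url);   # the slow, network-bound step
                        push @results, process_page($url, $html);
                    }
                    return @results;
                });
            } 1 .. $nworkers;

            # Read the small input file into the queue once.
            open my $in, '<', 'urls.txt' or die "urls.txt: $!";
            while (my $line = <$in>) {
                chomp $line;
                $queue->enqueue($line);
            }
            $queue->enqueue(undef) for 1 .. $nworkers;   # one 'done' marker per worker

            # Only the parent touches the output file, so no locking is needed.
            open my $out, '>', 'results.txt' or die "results.txt: $!";
            print {$out} "$_\n" for map { $_->join } @workers;

        Four workers is an arbitrary starting point; since the work is network-bound, the right number is found by measuring, not guessing.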

Re: Using fork for reading and processing a large file in ActiveState perl
by zentara (Cardinal) on Feb 25, 2010 at 14:04 UTC
      Dude, read the other posts before replying; your answer is completely wrong for the OP's problem.

        Same is true for 6 of his last 10 posts.

        Dude, read the other posts before replying,

        I did. No one mentioned to him how to bring in his huge file and split it effectively in order to hand the pieces off to his threads for the parallel processing. The OP asked: "I am reading and processing a huge file and recording the results to another file, which takes hundreds of hours. I want to run this task using multiple threads."

        How is it wrong to show how to get his input file split into bite-sized chunks for his threads? I question whether you understand what needs to be done in an actual program. Maybe you didn't actually look at the link I provided? I showed him the various ways to achieve the first step needed for his code. See How to break up a long running process for some parallel-processing usage.
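        For that chunking step, a minimal sketch (input.txt and the per-line work are hypothetical stand-ins): slurp the lines once, split the array into one slice per thread, and let each thread work on its own private copy, so nothing needs locking until the results are merged.

            use strict;
            use warnings;
            use threads;

            # Hypothetical input; for a file too big to slurp, hand each
            # thread a line range or byte offset instead of a copied slice.
            open my $in, '<', 'input.txt' or die "input.txt: $!";
            chomp(my @lines = <$in>);

            my $nthreads = 4;
            my $size     = int(@lines / $nthreads) + 1;   # slice size, rounded up

            my @threads;
            while (my @slice = splice @lines, 0, $size) {
                # Scalar context at creation, so join() returns a scalar.
                my $thr = threads->create(sub {
                    # Stand-in work: count lines matching a pattern.
                    return scalar grep { /foo/ } @_;
                }, @slice);
                push @threads, $thr;
            }

            # Merge the per-chunk results in the parent.
            my $total = 0;
            $total += $_->join for @threads;
            print "matched $total lines\n";

        Because ithreads share nothing by default, each thread receives a copy of its slice at creation time, which is exactly what makes the lock-free approach work.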


        I'm not really a human, but I play one on earth.
        Old Perl Programmer Haiku