shanu_040 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
Let me first explain the requirement I need your help with. I have to screen scrape the contents of more than 20 web sites in parallel for a specific query term. I am using Parallel::ForkManager, and it works as expected. My concern with Parallel::ForkManager is that I have to wait for all forked processes to finish, which causes a big delay before the fetched contents are displayed. I want to display the fetched contents as soon as they are available from whichever site is scraped first by any process. In a nutshell, I don't want to wait for all processes to finish before displaying the fetched content. Please suggest any other module which could be helpful. It would be nice to have your ideas on it.

Replies are listed 'Best First'.
Re: How to do Screen Scraping in parallel
by tilly (Archbishop) on May 22, 2009 at 20:59 UTC
    There are many solutions to this problem, but which ones are best will depend on how you are displaying the information.

    If you're doing a web page, what you should do is fork off processes and background them, then return a web page right away and have it poll for the data with ajax requests. (It is up to you to figure out a way to communicate between the forked processes and the web page.)
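    One possible way to do that communication, as a minimal sketch only: each forked worker writes its result to a small JSON file in a per-request directory, and the handler behind the ajax poll returns whatever files exist so far. The names scrape_site, start_scrapers and collect_finished, the example site names, and the one-file-per-site layout are all assumptions made for illustration. They are not part of Parallel::ForkManager or any other module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use JSON::PP qw(encode_json decode_json);
use POSIX ();

# Hypothetical stand-in for a real scraper; a real one would fetch and parse $site.
sub scrape_site {
    my ($site, $query) = @_;
    return { site => $site, hits => ["$query result from $site"] };
}

# Fork one background worker per site. Each child writes to a temp file and
# then renames it to "<n>.json", so a poller never sees a half-written file.
sub start_scrapers {
    my ($dir, $query, @sites) = @_;
    my @pids;
    for my $i (0 .. $#sites) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {    # child
            my $ok = eval {
                my $result = scrape_site($sites[$i], $query);
                open my $fh, '>', "$dir/$i.tmp" or die "open: $!";
                print {$fh} encode_json($result);
                close $fh or die "close: $!";
                rename "$dir/$i.tmp", "$dir/$i.json" or die "rename: $!";
                1;
            };
            POSIX::_exit($ok ? 0 : 1);    # skip the parent's END blocks
        }
        push @pids, $pid;
    }
    return @pids;
}

# What the ajax polling handler would call: return everything finished so far.
sub collect_finished {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir: $!";
    my @files = sort grep { /\.json\z/ } readdir $dh;
    closedir $dh;
    my @results;
    for my $file (@files) {
        open my $fh, '<', "$dir/$file" or next;
        local $/;    # slurp the whole file
        push @results, decode_json(<$fh>);
    }
    return \@results;
}

# Demo. A real page handler would return at once instead of waiting here,
# and would need some other way to reap its children.
my $dir  = tempdir(CLEANUP => 1);
my @pids = start_scrapers($dir, 'perl', 'site-a.example', 'site-b.example');
waitpid $_, 0 for @pids;
print encode_json(collect_finished($dir)), "\n";
```

    The rename step is the design choice that matters here: the poller only ever looks at *.json files, and a file gets that name only after it is completely written.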

    If you're doing a GUI you can use a similar approach with polling for data. Alternately you can explore the world of asynchronous programming, which will lead you to modules like POE. Or you can explore the complications of multi-threaded programming (which in this case is going to be similar to the fork approach, except that you've got threads rather than processes).
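    For the fork approach outside a web page, here is a rough sketch of showing each result as soon as it is ready, using only core modules (pipe, fork and IO::Select) rather than POE or threads. The names fake_scrape and scrape_as_ready, the site names, and the sleep delays that stand in for slow sites are assumptions made for illustration.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX ();

# Hypothetical stand-in for a real scraper: sleeps to simulate a slow site.
sub fake_scrape {
    my ($site, $delay) = @_;
    select undef, undef, undef, $delay;    # fractional-second sleep
    return "content from $site";
}

# Fork one child per site, each with its own pipe, and call $on_result
# for a site as soon as its child has finished writing.
sub scrape_as_ready {
    my ($sites, $on_result) = @_;    # $sites: { name => delay_in_seconds }
    my $select = IO::Select->new;
    my (%site_for, %buffer, @pids);

    for my $site (sort keys %$sites) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {             # child: scrape, write, exit
            close $reader;
            my $text = eval { fake_scrape($site, $sites->{$site}) } // '';
            syswrite $writer, $text;
            POSIX::_exit(0);
        }
        close $writer;               # parent keeps only the read end
        push @pids, $pid;
        $select->add($reader);
        $site_for{ fileno $reader } = $site;
        $buffer{ fileno $reader }   = '';
    }

    while ($select->count) {
        for my $fh ($select->can_read) {
            my $fd = fileno $fh;
            my $n  = sysread $fh, $buffer{$fd}, 65536, length $buffer{$fd};
            next if $n;              # got data; keep reading until EOF
            $select->remove($fh);    # EOF or error: this child is done
            close $fh;
            $on_result->($site_for{$fd}, $buffer{$fd});
        }
    }
    waitpid $_, 0 for @pids;         # reap the children
    return;
}

# Demo: the faster site should be reported first.
$| = 1;
scrape_as_ready(
    { 'slow.example' => 0.6, 'fast.example' => 0.1 },
    sub { my ($site, $text) = @_; print "$site: $text\n" },
);
```

    The parent never waits for all children before showing anything; it only blocks until at least one pipe has something to read, which is the behaviour the original question asks for.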

    If you've got a command line or a batch program, treat it like the GUI.

      Hi,
      I am using a web page. The program successfully does parallel screen scraping using Parallel::ForkManager, but the thing causing a problem for me is wait_all_children in Parallel::ForkManager. My program collects the data and puts it into a hash, but I have no idea how I can use an ajax request to display the content from that very hash at the same time. Any ideas would be helpful.
      Shanu