in reply to Re: Re: Parallel Downloads using Parallel::ForkManager or whatever works!!!
in thread Parallel Downloads using Parallel::ForkManager or whatever works!!!

If you have never used either, then please consider this an accidental introduction to a slightly different culture.

When you ask a question online, you are asking other people to volunteer their time and energy for your sake. They are willing to do so, else they would not do it, but they tend to think that their time has value, and prefer to have it treated that way. Among other things, this makes it rather irritating to find out that someone asked exactly the same question in 10 different places without waiting to find out whether the first place would answer it. It means that 9 out of 10 sets of volunteers have been asked to do useless work when they could have been answering someone else's question.

While it might feel great for you to have all of these experts hastening to provide you with answers, it isn't very nice for the experts involved, and it isn't very nice for the people whose questions get passed over as a result. And if that becomes common, then it becomes harder and harder to find experts who are willing to volunteer time and energy to produce those answers, which is really not very nice for anyone.

As for people thinking they have the only answer, that had nothing to do with what merlyn said or why he said it. You are actually (ironically) likely to get the best variety of answers if you ask one good question in one place: people can see the answers that have already been given, and can choose whether to offer one you don't already have. TIMTOWTDI, but when people think independently, they surprisingly often come up with the same answers. (Yet more evidence that asking multiple groups the same question results in useless duplication of work.)


Re: Re (tilly) 3: Parallel Downloads using Parallel::ForkManager or whatever works!!!
by jamesluc (Novice) on Jan 09, 2002 at 00:33 UTC
    So help me out here ... Are you saying that I should not ask the same question in more than one place? Or can I ask the question in multiple places, with the WARNING "I asked this question yesterday at abc@xyz" ... ??? re:merlyn

    I'm still young (in Perl years), so please bear with me. Assuming that I only ask a particular question in one place at a time, what's an appropriate time to wait for an answer before asking the question elsewhere?

    I still don't get this. This only makes sense if all the people capable of answering my question monitor both locations and someone happens to forget that they already answered the question in the other forum. Why have two (these are the only two forums I'm familiar with, I'm sure there are more) forums if this is the (unwritten) rule?

    But I feel that I am wasting your valuable time with my confusion (not obstinacy :-)

    Because of your obvious online experience, I will follow your guidance in this case. Look, I don't want to be blacklisted or anything!

    Yes, I am "studying" merlyn's column on a forking parallel link checker (http://www.stonehenge.com/merlyn/LinuxMag/col16.html). It seems very complicated for a newbie. I'm hoping that there is a simpler answer to my original, poorly worded question!

    So far merlyn's is the only response. Has anyone used Parallel::ForkManager and made it work successfully with multiple downloads?

    I better get back to it. Cover me, I'm going in ;-)

    Your humble servant

    P.S. Thanks for adding the code tags.

      Pick a forum, ask, and then follow up responses with questions. For instance, if merlyn's response was too much, say, "I am trying to work my way through the link that you gave, but I am getting bogged down. Is there a more minimal intro that anyone can refer me to?" If you get no responses for a day or two, then I might suggest asking in another forum, saying that you didn't get any responses there, so you are asking here. Alternately, you can look at your question and try to figure out why you didn't get responses. (One common situation: if people see that you asked, were answered, and said no more, they assume that the answer helped. If it didn't, nobody will know without feedback!)

      In short, treat your message as the start of an attempted conversation. Assume that between your initial asking and your understanding there may be a couple of rounds of discussion.

      In this case I cannot check this code because I don't have the necessary module. But try the following example based on the documentation:

      use strict;
      use LWP::Simple;
      use Parallel::ForkManager;

      # I assume you have reasons for a hash later..?
      my %urls = (
          'drudge' => 'http://www.drudgereport.com',
          'rush'   => 'http://www.rushlimbaugh.com/home/today.guest.html',
          'yahoo'  => 'http://www.yahoo.com',
          'cds'    => 'http://www.cdsllc.com/',
      );

      my $pm = Parallel::ForkManager->new(30);
      foreach my $file (sort keys %urls) {
          $pm->start and next; # do the fork

          # Start child code.
          my $ret = getstore($urls{$file}, $file);
          print "File '$file' stored url '$urls{$file}' with response code '$ret'\n";
          # End child code.

          $pm->finish();
      }
      $pm->wait_all_children;
      Note the use of strict.pm and a consistent indentation style. Those are both excellent habits that will save you a lot of grief in the long run...
        Thanks for the guidance. I understand the ongoing dialogue concept. Cool!

        The indentation style got lost in my unfamiliarity with the posting methods. Good advice that I don't always follow, but will endeavor to follow from now on. The strict.pm thing I will study; I'm not too familiar with it. Man, I've got a lot to learn.

        Anyway, I have read the documentation for Parallel::ForkManager; it's pretty straightforward. I had a version similar to what you suggested working with the LWP::Simple getstore, just to see if it worked. It did. However, what I'm trying to do is incorporate the ForkManager into a working metacrawler. The LWP::Simple getstore is O.K., but not preferred.

        The metacrawler, which I call "MEGA-Metacrawler", retrieves pages from various search engines based on keyword lists, stores the web pages into a hash, processes them for secondary patterns, and then spits out the processed results. I am using parts of LWP::Simple, LWP::UserAgent, HTTP::Status, HTTP::Request, and Digest::Perl::MD5. It's effective but dog sloooooowwwww against the hundreds/thousands of pages that must be reviewed and processed.

        I have had our resident Perl guru look at the code for efficiency, duplication, etc. It seems to be O.K. in that department. The problem seems to be two-fold: the speed, or lack thereof, of downloading one web page at a time, and of processing one downloaded page at a time in the pattern-matching routine. Thus the need for parallel processes.
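        To make that concrete, here is a rough sketch of what I have in mind (untested, and process_page is just a stand-in for the real pattern-matching routine): each forked child fetches and processes its own page, then leaves the result in a file, since a forked child can't write back into the parent's hash.

        use strict;
        use LWP::Simple qw(get);
        use Parallel::ForkManager;

        # Stand-in for the real secondary-pattern processing.
        sub process_page {
            my ($html) = @_;
            return join "\n", $html =~ /(placeholder_pattern)/g;
        }

        my %urls = ( 'yahoo' => 'http://www.yahoo.com' );

        my $pm = Parallel::ForkManager->new(10);
        foreach my $name (keys %urls) {
            $pm->start and next;   # fork; parent moves on to the next page

            # Child: fetch AND pattern-match, so both slow steps
            # run in parallel across pages.
            my $html = get($urls{$name});
            if (defined $html) {
                # A forked child can't write into the parent's hash,
                # so leave the results in a file for the parent.
                open my $fh, '>', "$name.results" or die "open: $!";
                print $fh process_page($html);
                close $fh;
            }

            $pm->finish;
        }
        $pm->wait_all_children;
        # Parent: read the *.results files back into a hash here.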

        BTW, in the code I posted, I initially left out the ");" in the following line:

        $res = $ua->request($req, "$name.html"

        (Yeah I just realized, I need to use line numbers next time!)

        Nevertheless, if I comment out all of the "$pm->" lines of code, it will download the pages in the hash. This also works in the MEGA-Metacrawler. But again, it's fast for 3 or 4 URLs and slow for 4000-5000. Finally, I believe that I need to use a GET request through LWP::UserAgent instead of getstore. The code I have provided is a nearly exact excerpt from the metacrawler (which is nearly 1000 lines and thus not provided). I intend to plug the parallel routine into the MEGA-Metacrawler once the final version works properly.
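        For what it's worth, here is the shape I think the LWP::UserAgent version takes (again untested; it is just tilly's example with the $ua->request($req, "$name.html") line from above dropped into the child):

        use strict;
        use LWP::UserAgent;
        use HTTP::Request;
        use Parallel::ForkManager;

        my %urls = (
            'yahoo' => 'http://www.yahoo.com',
            'cds'   => 'http://www.cdsllc.com/',
        );

        my $ua = LWP::UserAgent->new;
        $ua->timeout(30);

        my $pm = Parallel::ForkManager->new(10);
        foreach my $name (sort keys %urls) {
            $pm->start and next;    # fork; parent moves on to the next URL

            # Child: issue the GET and save the body to "$name.html".
            my $req = HTTP::Request->new(GET => $urls{$name});
            my $res = $ua->request($req, "$name.html");
            print "$name: ", $res->status_line, "\n";

            $pm->finish;            # child exits here
        }
        $pm->wait_all_children;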

        I will spend the next day(s) continuing to study merlyn's article referenced above and re-reading the Parallel::ForkManager documentation to gain a better understanding of it.

        Thanks again

        Your Humble Perl Initiate

        Thanks,

        Yes. But I will reread the documentation.

        My code is somewhat different from their code. Please refer to my recent post in reply to tilly.

        Thanks Again