This is just a simple outline of a web gizmo that could hit a lot of pages simultaneously. I am sure that the one-second response time per page is due to the web, not the Perl parsing of the data.

I haven't written a multi-process LWP web app, as the sites that I normally run LWP clients against would probably be upset if I whacked 'em more than a few times per second, and I try to be polite. There are also limits on how many connections you can have open at once. I've never come close to them in Windows, so I don't know what they are.

The poster asked for pseudocode, so here is one attempt, with no error handling or timeout escapes. Note that fork() in Windows is weird: Perl emulates it with a thread inside the same process instead of creating a separate process.

---- client (single program in this case):
maintains a list of requests that it wants answers to (replies to those requests are required). Maybe this is just a hash table of URLs?

put first request(s) onto request queue, then talk_2_server;

talk_2_server {
    while (a request hasn't been sent or some request hasn't been answered...) {
        if (server has a reply to a previous request) {
            take it off the outstanding queue and deal with it..
            this action may generate additional new requests that go
            onto the queue...maybe requests for 20 sub-pages
            ..be careful .. you might overload the site you are talking to!
        }
        while (I have a new request in queue) {
            send it to server
        };   # might want to think about a "throttle" if hitting
             # the same website
    }
}

maybe I'm done, or I need to loop and stuff more things onto the request queue and talk_2_server again...

--- server:
I see a new request, fork a child to deal with it (like maybe get the info from URL X). I then wait for the next request.

--- child:
I've got some answer, so I want to send the result to the client. I cooperate with the other children, via some kind of locking mechanism, so that I can send an "atomic" response on the pipe back to the client. Then my job's done, I die. The message format could be as simple as: first line is the URL you requested...followed by some html response.
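For what it's worth, here is a rough sketch of the server/child part of the outline above in Perl. Treat it as one possible reading, not a finished design. Several things are my own assumptions: fetch_page is a hypothetical stub that returns canned HTML so the sketch runs without touching the network (for real pages you would call something like LWP::UserAgent in its place), fetch_all is a made-up name, the "URL LENGTH" header line is just one way to frame a reply so the parent knows where it ends, and the "locking mechanism" is an flock on a scratch file. It assumes a Unix-like fork, byte-string bodies, and URLs without spaces, and it still has no throttle and no timeouts.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Hypothetical stand-in for the real fetch (assumption, not in the outline).
sub fetch_page {
    my ($url) = @_;
    return "<html><body>content of $url</body></html>";
}

# Fork one child per URL; each child sends "URL LENGTH\n" plus the body
# back on a shared pipe. Returns a hash ref of url => body.
sub fetch_all {
    my (@urls) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";

    # Scratch file used only as a lock between the children.
    my ($tmp_fh, $lock_path) = tempfile(UNLINK => 0);
    close $tmp_fh;

    for my $url (@urls) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        next if $pid;                      # parent: go fork the next child

        # --- child ---
        close $reader;
        my $body = fetch_page($url);
        my $msg  = $url . ' ' . length($body) . "\n" . $body;

        # Each child opens the lock file itself so the locks really exclude.
        open(my $lock, '>>', $lock_path) or die "cannot open lock file: $!";
        flock($lock, LOCK_EX) or die "flock failed: $!";
        print {$writer} $msg;
        close $writer;                     # flushes the whole message
        close $lock;                       # releases the lock
        exit 0;                            # job's done, child dies
    }

    # --- parent ---
    close $writer;                         # so the reader sees EOF at the end
    my %reply;
    while (defined(my $header = <$reader>)) {
        chomp $header;
        my ($url, $len) = $header =~ /^(\S+) (\d+)$/
            or die "bad header: $header";
        my $body = '';
        my $got  = read($reader, $body, $len);
        die "short read for $url" unless defined $got && $got == $len;
        $reply{$url} = $body;
    }
    close $reader;
    1 while wait() != -1;                  # reap all the children
    unlink $lock_path;
    return \%reply;
}

my $replies = fetch_all(map { "http://example.com/page$_" } 1 .. 3);
print "$_ => ", length($replies->{$_}), " bytes\n" for sort keys %$replies;
```

The client loop from the outline would sit around fetch_all: deal with each reply, push any new sub-page requests onto the queue, and call it again, ideally with some limit on how many URLs per site go out at once.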

In reply to Re: How to speed up my Html parsing program? (Concurrently run Subroutines?) by Marshall
in thread How to speed up my Html parsing program? (Concurrently run Subroutines?) by BobFishel
