Hi, I'm trying to write a script that will download data from a weblog service. There are over 4k blogs there. I have already written the parser. I'm using HTTP::Lite to perform the GETs. However, when I start my script, it fails after a little over 200 requests. I analyzed the traffic with Wireshark: I see a lot of duplicate ACKs and retransmissions, and then my client sends FIN, ACK and closes the connection.
Update: it's important to add that I can't just rerun the script. If I do, the server doesn't respond at all: my client sends 6 TCP SYN segments and gets no response.
My guess is that the server or a proxy blocks me because of too many requests. I'm performing the GETs in a loop, so there are no parallel GETs. I suppose I need to write some "sleeping" code (that will adapt to changing network performance and, most importantly, to the server's limits) or something like that, but before I do, I'd like to ask what you think about it. Besides, there are over 4K blogs, each having around 20-30 posts, so if I used a 10-second sleep between requests, downloading the whole site would take something like 300-400 hours, and I need the results in about 3 days :)
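A minimal sketch of what such adaptive "sleeping" code could look like, assuming a multiplicative back-off: the pause doubles after a failed GET and shrinks by 10% after a successful one. The helper name `next_delay`, the 1-second floor and the 60-second ceiling are all assumptions for illustration, not part of HTTP::Lite and not known limits of the server. The last lines work out the time budget from the figures above, taking 25 posts per blog as a rough midpoint.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTTP::Lite): given the current pause
# and whether the last GET succeeded, return the pause to use next.
sub next_delay {
    my ($delay, $ok, %opt) = @_;
    my $min = $opt{min} // 1;     # assumed floor, in seconds
    my $max = $opt{max} // 60;    # assumed ceiling, in seconds

    # Speed up gently after a success, back off hard after a failure.
    my $next = $ok ? $delay * 0.9 : $delay * 2;

    $next = $min if $next < $min;
    $next = $max if $next > $max;
    return $next;
}

# Time budget: about 4000 blogs x 25 posts fetched within 3 days.
my $requests = 4000 * 25;
my $budget   = 3 * 24 * 3600 / $requests;
printf "budget per request: %.2f s\n", $budget;   # prints 2.59 s
```

In the download loop, the pause would be updated after every request, for example `$delay = next_delay($delay, $ok); sleep $delay;` (with `Time::HiRes` imported if fractional sleeps are wanted). Whether an average near 2.6 seconds per request stays under the server's limit is something only testing against that server can show.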
sub get_blog {
    #..
    #..
    #..some code
    # The header value must not repeat the header name or end in "\r".
    $ua->add_req_header("User-Agent", "Mozilla/5.0 (X11; U; Linux i686; pl-PL; rv:1.9) Gecko/2008061015 Firefox/3.0");
    $ua->request($_[0]) or die "unable to get " . $_[0];
    my $content = $ua->body();
    $ua->reset();
    # IF this is the first page of the blog -> parse constant elements of the blog END
    # parse posts and comments
    # recursion -> find address & get the next page
}
There is no additional error handling concerning the GET.
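As a rough illustration of what extra error handling around the GET might look like, here is a sketch of a retry wrapper that waits longer after each failure instead of dying on the first one. Everything in it is an assumption: `get_with_retry` is a hypothetical name, `$fetch` stands for any code ref that returns the page body or undef, and the defaults of 5 tries and a 5-second initial pause are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical wrapper: $fetch is a code ref that returns the page body
# on success and undef (or dies) on failure.
sub get_with_retry {
    my ($fetch, $url, %opt) = @_;
    my $tries = $opt{tries} // 5;    # assumed retry limit
    my $delay = $opt{delay} // 5;    # assumed initial pause, in seconds

    for my $attempt (1 .. $tries) {
        # eval turns an exception inside $fetch into an undef result.
        my $body = eval { $fetch->($url) };
        return $body if defined $body;

        warn "attempt $attempt for $url failed\n";
        last if $attempt == $tries;

        sleep $delay;
        $delay *= 2;                 # wait twice as long next time
    }
    return;                          # every attempt failed
}
```

With HTTP::Lite, `$fetch` could be a closure that calls `$ua->request($url)`, treats an undefined or non-200 return value as a failure, and otherwise returns `$ua->body()`. The caller can then log and skip a blog that keeps failing rather than stopping the whole run.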
Thanks

In reply to HTTP::Lite GET - too many requests? by mhnatiuk
