
Re: Mass downloads. (N+5)

by tye (Sage)
on May 20, 2005 at 19:40 UTC ( #459104=note )


in reply to Mass downloads.

Time how long a fetch takes, add at least a few seconds to that, and then wait at least that long before starting the next fetch. That should do a pretty good job of preventing server overload for batches of requests that only get run rarely.

Note that I mean for you to time each fetch. If the server gets bogged down, then your script should immediately notice that the previous fetch took longer and automatically compensate by waiting longer before trying the next fetch.
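A minimal sketch of that self-adjusting delay, in Python for illustration. The function name `throttled_fetch_all`, the 3-second `extra_delay` default, and the injectable `fetch`, `clock`, and `sleep` parameters are all assumptions made for the example, not anything the site prescribes.

```python
import time
from typing import Callable, Iterable, List


def throttled_fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    extra_delay: float = 3.0,  # "a few seconds"; the exact value is an assumption
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Fetch each URL in turn, waiting between fetches for as long as the
    previous fetch took plus `extra_delay` seconds."""
    results: List[bytes] = []
    wait = None  # no pause is needed before the very first fetch
    for url in urls:
        if wait is not None:
            sleep(wait)
        started = clock()
        results.append(fetch(url))
        # Time every fetch: a slow response suggests a busy server,
        # so the next pause grows along with it.
        wait = (clock() - started) + extra_delay
    return results
```

Passing `fetch`, `clock`, and `sleep` in as parameters keeps the sketch independent of any particular HTTP library and lets the timing logic be exercised without real network traffic.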

Thanks.

- tye        

Replies are listed 'Best First'.
Re^2: Mass downloads. (N+5)
by gaal (Parson) on May 20, 2005 at 21:46 UTC
    Hey, cool, that's like reflexive tit for tat.
      It's client-side throttling, effective when both sides play nice. If perlmonks wanted to add complexity and security, it'd have to be done on the server side. You lose some flexibility, such as getting 10 nodes really fast regardless of the day, but at the end of the day, it was only 10 requests. Not a horrible thing, but mine is only an opinion. :)

      ----
      Give me strength for today.. I will not talk it away..
      Just for a moment.. It will burn through the clouds.. and shine down on me.

Re^2: Mass downloads. (N+5)
by BrowserUk (Patriarch) on May 21, 2005 at 17:02 UTC

    I'm not really sure how many posts I would be pulling--it is dependent upon the contents of those I pull--, but it would probably be on the order of tens of thousands.

    At the rate of 1 every 5 or more seconds, 10,000 would take nearly 14 hours or more, which given the 2 hour cutoff on my dialup account is somewhat impractical. I was hoping to get authority to run at a rather faster rate at times of low system load.
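A quick check of the arithmetic behind that estimate, sketched in Python. It assumes exactly 5 seconds per fetch (the lower bound given above) and ignores the time lost reconnecting between sessions; the constant names are made up for the example.

```python
import math

SECONDS_PER_FETCH = 5      # lower bound: "1 every 5 or more seconds"
POSTS = 10_000
SESSION_LIMIT_HOURS = 2    # dialup cutoff

total_hours = POSTS * SECONDS_PER_FETCH / 3600
sessions = math.ceil(total_hours / SESSION_LIMIT_HOURS)
posts_per_session = SESSION_LIMIT_HOURS * 3600 // SECONDS_PER_FETCH

print(f"{total_hours:.1f} hours in total")               # 13.9 hours in total
print(f"{sessions} sessions of {SESSION_LIMIT_HOURS} hours")  # 7 sessions of 2 hours
print(f"{posts_per_session} posts per session at most")  # 1440 posts per session at most
```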

    If that is not permissible, I may have to abort the idea.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

      Why don't you ask jcwren to give you an account on perlmonk.org? There you will have all the time you need.

      HTH, Valerio

      Can't whatever program you run download them in batches and keep a marker of which node was last successfully downloaded and saved? Heck, when it can't reach the server, have it sleep for 5 or 10 minutes. When you connect again, it'll just pick up where it left off.
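A rough sketch of that marker idea, in Python for illustration. The function `resume_downloads`, the marker file format (a single integer count), the `.html` file naming, and the retry limit are all assumptions for the example; `fetch` stands in for whatever download routine is actually used and is expected to raise `OSError` when the server cannot be reached.

```python
import time
from pathlib import Path
from typing import Callable, Sequence


def resume_downloads(
    node_ids: Sequence[int],
    fetch: Callable[[int], bytes],
    out_dir: Path,
    marker: Path,
    retry_wait: float = 300.0,  # 5 minutes between attempts
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Download nodes in order, recording progress in `marker` so that a
    later run continues after the last node that was saved.
    Returns how many entries of `node_ids` have been saved so far."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # The marker holds the count of leading node_ids already saved.
    done = int(marker.read_text()) if marker.exists() else 0
    for index in range(done, len(node_ids)):
        node_id = node_ids[index]
        for attempt in range(max_retries + 1):
            try:
                body = fetch(node_id)
                break
            except OSError:
                if attempt == max_retries:
                    return index  # give up for now; the next run resumes here
                sleep(retry_wait)
        (out_dir / f"{node_id}.html").write_bytes(body)
        # Update the marker only after the node is safely on disk.
        marker.write_text(str(index + 1))
    return len(node_ids)
```

Writing the marker after each saved node means a dropped connection costs at most one repeated fetch on the next run.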


        Part of what I was hoping to do was test what throughput I could get via my limited bandwidth using multiple threads and overlapped IO.

        My intention was to download a high proportion of PM and then produce a fully inverted index of it locally.

        With the concern that there was never a time here when I could run the process without having an undue impact, I got permission to do my throughput testing at the weekend at another site where their peak usage is weekdays.


