in reply to Re: Need to Improve Scraping Speed
in thread Need to Improve Scraping Speed

The code will be bundled with other scripts and sent to other machines for running from scratch. I'm looking for coding improvements that might speed it up a bit.

Thanks

Re^3: Need to Improve Scraping Speed
by gmargo (Hermit) on Dec 03, 2009 at 20:10 UTC

    Probably 99% or more of those 8 hours is spent waiting on the server, so you can't speed it up by fiddling with the client. Of course you could parallelize some fetches, but that puts extra load on the provider who is serving you this data out of goodwill.

    Probably your best bet would be to package up the already-downloaded text file (compress the heck out of it) and ship that off with your code. Then a bare machine will load that file first before updating from the server.
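
As a rough illustration of the "load the shipped file first, then update" idea, here is a minimal Python sketch. Everything in it is an assumption made for the example, not the original poster's code: the snapshot is a gzip-compressed JSON file keyed by record id, and `fetch` is a hypothetical caller-supplied function that retrieves one record from the server.

```python
import gzip
import json
from pathlib import Path


def load_snapshot(path):
    """Return records from a gzip-compressed JSON snapshot, or {} if it is absent."""
    p = Path(path)
    if not p.exists():
        return {}
    with gzip.open(p, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def save_snapshot(path, records):
    """Write all records back out as gzip-compressed JSON."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(records, fh)


def update(records, wanted_ids, fetch):
    """Fetch only the ids missing from records; return how many were fetched.

    `fetch` is a hypothetical function (id -> record) supplied by the caller.
    """
    fetched = 0
    for rid in wanted_ids:
        if rid not in records:
            records[rid] = fetch(rid)
            fetched += 1
    return fetched
```

A bare machine that starts from the bundled snapshot then only hits the server for records that are not in the file yet, which is where nearly all of the waiting time goes.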

Re^3: Need to Improve Scraping Speed
by Anonymous Monk on Dec 04, 2009 at 02:25 UTC
    I already gave you that answer, after I gave you a program to scrape the data once and only once.

    The solution is to scrape the data once and only once, repackage it, compress it, and host it as a few compressed files on an ftp server.
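
A minimal Python sketch of the repackaging step, under stated assumptions: the scraped data is a list of text lines, and the file names, the `prefix`, and the chunk count are made up for illustration. Uploading the resulting files to an FTP server is left out.

```python
import gzip
from pathlib import Path


def write_chunks(lines, out_dir, n_chunks=3, prefix="data"):
    """Split lines into at most n_chunks gzip files; return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    lines = list(lines)
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    paths = []
    for start in range(0, len(lines), size):
        path = out / f"{prefix}-{start // size:03d}.txt.gz"
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            fh.writelines(line + "\n" for line in lines[start:start + size])
        paths.append(path)
    return paths


def read_chunks(paths):
    """Yield the original lines back from the compressed chunk files, in order."""
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                yield line.rstrip("\n")
```

Clients then download a handful of static compressed files instead of replaying the whole scrape against the server.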

    CGI is the wrong way to distribute this data, and you shouldn't distribute this scraper program; it's like distributing a denial-of-service tool.