
The fastest that script will ever be is bounded by the time it takes to download each file, one after another. If each image takes three seconds, then even in a world with no server latency or network bottlenecks you cannot finish in under about 5.9 days (13 images for each of 13000 listings, at 3s per image, is 507000 seconds). This is because you are doing blocking requests: your script waits for wget to finish (in order to retrieve its output, which you never use) before moving on to the next request.

However, if you can process several images at a time, say all thirteen from one listing before moving on to the next, you will be constrained more by network bandwidth and less by the time each individual file takes to arrive. I cobbled together an example of parallel non-blocking requests in this response: Re: use LWP::Simple slows script down..

Apply those principles to your project, and you will reduce the time needed considerably. Let's say you have sufficient bandwidth to handle 13 incoming files at a time, and that instead of 3 seconds per file it now takes 6 because you've increased the load on the remote server. Because you are requesting batches of 13 and waiting for each batch to finish before moving on, instead of 3*13*13000 seconds (507000) you are now looking at 6*13000 seconds (78000), or less than a day to complete.
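To make that arithmetic easy to rerun with your own numbers, here is a small sketch. The listing and image counts come from the figures above; the 3 and 6 second timings are the same guesses used in the text, not measurements, and the variable names are just ones I made up.

```perl
use strict;
use warnings;

# Counts from the discussion above; the per-image timings are guesses.
my $listings           = 13_000;
my $images_per_listing = 13;

# One blocking request at a time, 3 seconds per image.
my $sequential_secs = 3 * $images_per_listing * $listings;

# Thirteen requests in flight per listing, each batch taking 6 seconds.
my $batched_secs = 6 * $listings;

printf "sequential: %d s (%.2f days)\n", $sequential_secs, $sequential_secs / 86_400;
printf "batched:    %d s (%.2f days)\n", $batched_secs,    $batched_secs / 86_400;
```

With those assumptions it reports 507000 s (5.87 days) against 78000 s (0.90 days).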

Even more efficient would be to cap the number of simultaneous requests at whatever your bandwidth and the remote server can handle, and not be concerned with finishing an entire listing before moving on to the next.
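One way that cap might look, as a rough sketch using only core modules (fork plus HTTP::Tiny) rather than the approach in the post linked above. The names run_limited and fetch_image, and the [url, file] job layout, are hypothetical; I have assumed you can build a flat list of image URLs and target filenames from your listings.

```perl
use strict;
use warnings;
use POSIX ();
use HTTP::Tiny;

# Run $worker->($job) for every job in a child process, keeping at most
# $max_children alive at once. Returns the number of jobs whose child
# exited with a non-zero status.
sub run_limited {
    my ($max_children, $worker, @jobs) = @_;
    my ($running, $failed) = (0, 0);

    for my $job (@jobs) {
        # Pool is full: block until one child exits, then reuse its slot.
        if ($running >= $max_children) {
            wait();
            $failed++ if $? != 0;
            $running--;
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: do one job and report success through the exit status.
            # _exit skips END blocks and buffers inherited from the parent.
            POSIX::_exit($worker->($job) ? 0 : 1);
        }
        $running++;
    }

    # Drain the children still running after the last job was started.
    while ($running-- > 0) {
        wait();
        $failed++ if $? != 0;
    }
    return $failed;
}

# Hypothetical worker: a job is a [url, local_file] pair.
sub fetch_image {
    my ($job) = @_;
    my ($url, $file) = @$job;
    my $response = HTTP::Tiny->new(timeout => 30)->mirror($url, $file);
    return $response->{success};
}
```

You would call it as something like run_limited(10, \&fetch_image, @jobs), then tune the 10 up or down until either your bandwidth or the remote server starts to complain. Because a new download starts as soon as any slot frees up, one slow image no longer holds up the other twelve from its listing.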

Dave

In reply to Re: Quicker way to batch grab images? by davido
in thread Quicker way to batch grab images? by ultranerds
