
If you want to try it out for yourself, ...

That might be of interest to me, but before I go hammering that site to death--within the bounds of my limited bandwidth--how will the owners feel about it?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^8: Parallel downloading under Win32?
by Xenofur (Monk) on Apr 30, 2009 at 12:06 UTC
    The owner's stance on that is documented here: http://eve-central.com/home/develop.html#xml

    In short, as long as you don't go completely overboard, it can take a hammering. Also, the data for all these IDs is only 60-80 MB anyhow. Personally I WOULD prefer to use the market dumps, but I haven't yet found a way to get that data from CSV into SQL fast enough, given the restrictions above. (Although I was less knowledgeable when I last tried.)
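The post doesn't describe the dump format or the target database, so the following is only a minimal sketch. It assumes a CSV file with a header row and uses Python's built-in `sqlite3` as a stand-in for whatever SQL database is actually in use; the table name and sample columns are hypothetical. The main idea is to insert all rows with `executemany` inside a single transaction, since committing row by row is a common reason CSV imports crawl.

```python
import csv
import io
import sqlite3

def load_csv(conn: sqlite3.Connection, csv_text: str, table: str) -> int:
    """Bulk-load CSV text (with a header row) into `table`; return the row count."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    # Untyped columns: SQLite accepts them, and values are stored as given (text here).
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    # One transaction for the whole file; `with conn` commits on success
    # and rolls back if any row fails.
    with conn:
        conn.executemany(
            f'INSERT INTO "{table}" ({cols}) VALUES ({placeholders})', reader
        )
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

if __name__ == "__main__":
    # Hypothetical sample data; the real dump layout is not given in the thread.
    db = sqlite3.connect(":memory:")
    print(load_csv(db, "typeid,price\n34,4.5\n35,9.1\n", "orders"))
```

For a file on disk, passing an open file object to `csv.reader` instead of a `StringIO` streams the rows without reading everything into memory first.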

    Something I forgot to mention: my line is 6 Mbit, so it takes a bit more to saturate than a 2 Mbit one.
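To put the two line speeds in concrete terms, here is a small conversion from a nominal line rate to its raw byte ceiling. It assumes the rates are decimal megabits (10**6 bits) and ignores TCP/IP and HTTP overhead, so measured payload rates will normally land somewhat below these numbers.

```python
def line_ceiling_bytes_per_s(mbit_per_s: float) -> float:
    """Raw ceiling for a line rated in Mbit/s (10**6 bits), ignoring protocol overhead."""
    return mbit_per_s * 1_000_000 / 8

def as_kib_per_s(bytes_per_s: float) -> float:
    """Express a byte rate in KiB/s (1 KiB = 1024 bytes)."""
    return bytes_per_s / 1024

for rate in (2, 6):
    ceiling = line_ceiling_bytes_per_s(rate)
    print(f"{rate} Mbit/s -> {ceiling:,.0f} bytes/s = {as_kib_per_s(ceiling):.0f} KiB/s")
# 2 Mbit/s -> 250,000 bytes/s = 244 KiB/s
# 6 Mbit/s -> 750,000 bytes/s = 732 KiB/s
```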

      Okay, I did three runs using the list of IDs you provided (63.6 MB):

      • -T=4: 6:20 - 171 KB/s.
      • -T=8: 3:54 - 276 KB/s.

        (This is absolutely in line with my maximum throughput expectations for my connection.)

      • -T=16: 4:47 - 226 KB/s
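The KB/s figures in the list are the 63.6 MB total divided by the elapsed time. A quick helper for that arithmetic, assuming 1 MB = 1024 KB and truncating to whole KB/s (the convention that reproduces the T=4 and T=16 figures):

```python
def kb_per_s(size_mb: float, elapsed: str) -> float:
    """Average throughput in KB/s for `size_mb` megabytes over an 'm:ss' duration."""
    minutes, seconds = elapsed.split(":")
    total_seconds = int(minutes) * 60 + int(seconds)
    # 1 MB taken as 1024 KB.
    return size_mb * 1024 / total_seconds

print(int(kb_per_s(63.6, "6:20")))  # 171
print(int(kb_per_s(63.6, "4:47")))  # 226
```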

      By no means definitive, but sufficient to give me no reason to change my view that 2 threads per core will usually give the best throughput. You might consider lowering the number of threads you run to see whether that improves your throughput as well.
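For comparison, here is the "2 threads per core" rule of thumb expressed as a bounded worker pool. This is a Python sketch, not the Perl threads code being discussed in the thread; `fetch` is a hypothetical callable that downloads one ID (for example by wrapping `urllib.request.urlopen`), and the `workers` argument plays the role of the `-T` thread count, so it can be varied to repeat the benchmark.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, List, Optional

def default_workers(per_core: int = 2) -> int:
    """Rule-of-thumb pool size: `per_core` threads for each CPU core."""
    return per_core * (os.cpu_count() or 1)

def fetch_all(ids: List[Hashable], fetch: Callable,
              workers: Optional[int] = None) -> Dict[Hashable, object]:
    """Call fetch(id) for every id on a fixed-size thread pool; return {id: result}."""
    pool_size = workers or default_workers()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() yields results in input order, so zip pairs each id with its result.
        return dict(zip(ids, pool.map(fetch, ids)))
```

Usage would look like `fetch_all(type_ids, download_one, workers=8)`, where `download_one` is your own (hypothetical) download function. A fixed-size pool also caps the number of simultaneous connections, which matters given the site owner's request not to go overboard.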

      One aside: if you have contact with the webmaster, you might suggest that he return a non-200 status code for unfound IDs instead of returning 200 and a file containing: "Can't find that type". He explicitly asks people not to continually request non-existent data. That goal would be far easier to achieve if he did his bit by returning meaningful status codes.
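Until the server returns a proper status, a client can treat both cases as "not found" and remember those IDs so it doesn't keep re-requesting them. A minimal sketch; the exact sentinel comparison and the skip-set approach are assumptions based on the description in the post:

```python
NOT_FOUND_TEXT = "Can't find that type"  # sentinel body described in the post

def is_missing(status: int, body: str) -> bool:
    """True if a response means there is no data for the requested id."""
    # Preferred signal: any non-200 status, e.g. 404.
    # Fallback for the current behaviour: 200 with only the sentinel text.
    return status != 200 or body.strip() == NOT_FOUND_TEXT

def record(type_id, status: int, body: str, missing: set) -> bool:
    """Return True for real data; otherwise add type_id to `missing` and return False."""
    if is_missing(status, body):
        missing.add(type_id)
        return False
    return True

def ids_to_request(ids, missing: set) -> list:
    """Drop ids already known not to exist, so they are not requested again."""
    return [i for i in ids if i not in missing]
```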


        Sorry for the delay; I didn't find any time yesterday to do this.

        Either way, I can't reproduce your results. Here are the benchmark results on my end:
        $T=4  : 264 wallclock secs ( 3.80 usr + 2.09 sys =  5.89 CPU) -> 0.24 MByte/sec
        $T=8  : 151 wallclock secs ( 5.31 usr + 1.89 sys =  7.20 CPU) -> 0.43 MByte/sec
        $T=12 : 117 wallclock secs ( 6.55 usr + 2.41 sys =  8.95 CPU) -> 0.55 MByte/sec
        $T=16 : 104 wallclock secs ( 7.73 usr + 2.83 sys = 10.56 CPU) -> 0.62 MByte/sec
        $T=20 : 102 wallclock secs ( 8.36 usr + 2.80 sys = 11.16 CPU) -> 0.63 MByte/sec
        $T=24 : 100 wallclock secs (10.11 usr + 3.00 sys = 13.11 CPU) -> 0.65 MByte/sec
        $T=28 : 103 wallclock secs (11.89 usr + 3.02 sys = 14.91 CPU) -> 0.63 MByte/sec
        $T=50 : 103 wallclock secs (18.89 usr + 4.30 sys = 23.19 CPU) -> 0.63 MByte/sec
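One way to read these numbers is to take the smallest thread count whose throughput is within some tolerance of the best run. The figures below are copied from the results above; the 5% tolerance is an arbitrary choice for illustration:

```python
# (threads, MByte/sec) pairs from the benchmark results above.
RESULTS = [(4, 0.24), (8, 0.43), (12, 0.55), (16, 0.62),
           (20, 0.63), (24, 0.65), (28, 0.63), (50, 0.63)]

def smallest_near_peak(results, tolerance: float = 0.05) -> int:
    """Smallest thread count whose throughput is within `tolerance` of the best run."""
    peak = max(rate for _, rate in results)
    return min(t for t, rate in results if rate >= peak * (1 - tolerance))

print(smallest_near_peak(RESULTS))  # 16
```

Past that point, extra threads mostly add CPU time (the usr + sys column keeps climbing) without moving the wallclock figure.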
        Right now I'm thinking the difference between us comes down to either the power of my laptop, your router, or simply your internet connection.

        Also, thanks for the suggestion, I'll forward it. :)