I've been running 40 instances of wget at a time, as opposed to 20 threads with your solution, and monitoring network activity with this: http://www.hageltech.com/dumeter/
If you want to try it out for yourself, I'm loading from this URL: http://api.eve-central.com/api/quicklook?typeid=24312 , with the typeid parameter cycling through these values:
30222, 29266, 28260, 25948, 25861, 25713, 25709, 25606, 25605, 25604, 25603, 25601, 25600, 25599, 25598, 25597, 25596, 25595, 25594, 25593, 25592, 25591, 25590, 25589, 25561, 24700, 24357, 24348, 24289, 24285, 24123, 23933, 23893, 23680, 23604, 23602, 23561, 23559, 23533, 23527, 23525, 23421, 23272, 23234, 22938, 22932, 22930, 22918, 22900, 22542, 22317, 22249, 22234, 22231, 22214, 22211, 22205, 22203, 22194, 22177, 22175, 22139, 22099, 22049, 21810, 21809, 21808, 21640, 21626, 21625, 21615, 21567, 21522, 21520, 21517, 21465, 21460, 21256, 20478, 20477, 20476, 20250, 20238, 20236, 20226, 20214, 20138, 19929, 19923, 19810, 19660, 19500, 18655, 18639, 18581, 17366, 17232, 17219, 17218, 17217, 17215, 17214, 17213, 17204, 17203, 17202, 17200, 16525, 16479, 16449, 16443, 16439, 16437, 16383, 16381, 16379, 16375, 16367, 16359, 16303, 16299, 16297, 15508, 15331, 14292, 14284, 14274, 13320, 13286, 13237, 12709, 12597, 12551, 12548, 12544, 12538, 12537, 12533, 12532, 12449, 12372, 12371, 12354, 12344, 12257, 12225, 12217, 12068, 12066, 12058, 12056, 12054, 11741, 11740, 11739, 11738, 11737, 11736, 11735, 11733, 11732, 11648, 11646, 11644, 11642, 11489, 11486, 11484, 11483, 11482, 11481, 11478, 11475, 11399, 11357, 11355, 11349, 11347, 11345, 11337, 11331, 11325, 11323, 11315, 11307, 11305, 11303, 11301, 11299, 11297, 11295, 11293, 11291, 11289, 11287, 11285, 11283, 11215, 11132, 11129, 11101, 10998, 10886, 10876, 10840, 10836, 10688, 10678, 10629, 10246, 9944, 9850, 9808, 9784, 9668, 9660, 9646, 9632, 9608, 9580, 8905, 8787, 8749, 8531, 8489, 8481, 8477, 8335, 8291, 8173, 8171, 8131, 8103, 8025, 8023, 7707, 7667, 7585, 7579, 7539, 7373, 7371, 7369, 7293, 7251, 7219, 7167, 6999, 6719, 6715, 6633, 6569, 6527, 6525, 6491, 6489, 6487, 6441, 6328, 6296, 6268, 6244, 6212, 6175, 6173, 6159, 6131, 6073, 5869, 5867, 5865, 5846, 5747, 5745, 5723, 5643, 5601, 5599, 5527, 5441, 5439, 5401, 5341, 5339, 5321, 5243, 5241, 5221, 5217, 5175, 5093, 5089, 5053, 5049, 4871, 4791, 4787, 4609, 
4535, 4477, 4475, 4473, 4435, 4031, 4029, 4025, 4013, 3987, 3977, 3953, 3941, 3937, 3887, 3829, 3826, 3812, 3810, 3808, 3806, 3766, 3723, 3717, 3715, 3709, 3699, 3697, 3663, 3653, 3651, 3645, 3643, 3640, 3606, 3596, 3586, 3576, 3566, 3554, 3552, 3540, 3530, 3528, 3496, 3467, 3465, 3293, 3242, 2603, 2587, 2583, 2579, 2545, 2537, 2529, 2516, 2514, 2512, 2510, 2508, 2506, 2488, 2486, 2476, 2464, 2454, 2444, 2363, 2295, 2293, 2291, 2289, 2203, 2193, 2183, 2173, 2108, 2103, 2048, 2046, 2032, 2018, 2005, 2004, 2003, 2002, 1998, 1986, 1977, 1973, 1968, 1963, 1959, 1957, 1956, 1955, 1947, 1875, 1832, 1830, 1826, 1824, 1822, 1820, 1818, 1816, 1814, 1810, 1557, 1551, 1547, 1447, 1445, 1403, 1319, 1317, 1294, 1264, 1254, 1246, 1244, 1201, 1195, 1185, 672, 655, 645, 641, 606, 581, 580, 578, 577, 573, 570, 569, 568, 567, 565, 564, 563, 561, 533, 530, 529, 526, 524, 523, 520, 503, 501, 499, 498, 497, 496, 488, 487, 486, 485, 483, 482, 464, 463, 459, 454, 453, 452, 451, 450, 447, 444, 443, 442, 439, 434, 399, 393, 377, 269, 267, 265, 264, 263, 262, 254, 252, 251, 249, 248, 247, 246, 245, 244, 243, 241, 240, 239, 238, 237, 236, 235, 234, 233, 232, 231, 230, 229, 228, 227, 226, 225, 224, 223, 222, 220, 219, 218, 217, 216, 215, 213, 212, 211, 210, 209, 208, 207, 206, 205, 204, 202, 201, 200, 199, 198, 197, 196, 195, 193, 192, 191, 190, 189, 188, 187, 186, 185, 184, 183, 182, 180, 179, 178, 45, 42, 41, 40, 39, 38, 37, 36, 35, 34
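This isn't my actual code, but a minimal Python sketch of the same pattern: pre-build the quicklook URLs for a list of typeids, then fetch them with a bounded pool of workers (the function names are mine, and `fetch_all` assumes network access when actually run):

```python
# Sketch only: bounded-concurrency download of eve-central quicklook pages.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

BASE = "http://api.eve-central.com/api/quicklook?typeid={}"

def build_urls(typeids):
    """Pre-build the full URL list up front (cheap, even for ~10,000 ids)."""
    return [BASE.format(t) for t in typeids]

def fetch(url):
    """Download one quicklook document; requires network access."""
    with urlopen(url) as resp:
        return resp.read()

def fetch_all(typeids, workers=40):
    """Fetch every page with at most `workers` concurrent requests."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, build_urls(typeids)))
```

With `workers=40` this mirrors the 40-wget setup; dropping it to 20 mirrors the threaded version.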
Regarding the preloading of URLs: the maximum number of URLs I'll need to load is ~10,000. From what I can tell, the overhead of pre-building the list is negligible compared to the actual downloading itself. Plus, as it is, it makes the code easier for me to read. :)
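A quick back-of-the-envelope check supports that: in this Python sketch (the typeids here are stand-ins, not the real list), ~10,000 pre-built URL strings come to roughly a megabyte, which is nothing next to the downloads themselves:

```python
import sys

# Stand-in for the real typeid list: ~10,000 entries, the stated worst case.
typeids = range(10_000)
urls = [f"http://api.eve-central.com/api/quicklook?typeid={t}" for t in typeids]

# Shallow size of the list plus the size of each string object.
total_bytes = sys.getsizeof(urls) + sum(sys.getsizeof(u) for u in urls)
print(f"{len(urls)} URLs take roughly {total_bytes / 1_000_000:.1f} MB")
```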
Memory use itself is not THAT much of an issue. I'm fine with taking up half a GB; what I was not fine with were other solutions that would quickly balloon to 1.5 GB. I know the best way to handle threads is to create them at the start of the app in a BEGIN block, but that isn't really an option here: it's a CGI::App web application, and there isn't really a way to know whether it will actually do the downloading without loading the CGI::App stuff as well.
Thanks for the information and advice in any case; I'll keep it in mind. :)