in reply to Re^2: Processing large file using threads
in thread Processing large file using threads

I still don't see how this would take more than a couple of minutes (OK, maybe a couple of hours), provided your blacklist is held in memory.

Checking 21 million URLs shouldn't really take all that much time, and reading them all from a file shouldn't take long either; that's only about a gigabyte of data. I am assuming you have significantly fewer hosts than URLs, or you'd possibly need a lot (i.e. more than 4 GB) of system memory with the algorithm you've outlined above.
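A minimal sketch of what I mean by keeping the blacklist in memory (the filenames, the host-extraction regex, and the sub names are my own illustration, not the OP's code; a real script might use the URI module to pull out the host):

```perl
use strict;
use warnings;

# Build a hash of blacklisted hosts once; after that, each URL costs
# one O(1) hash lookup, so 21 million checks stay cheap.
my %blacklisted = map { $_ => 1 } qw(bad.example.com evil.example.net);

# Crude host extraction: strip an optional scheme, take everything up
# to the first '/', ':', '?' or '#'.
sub host_of {
    my ($url) = @_;
    my ($host) = $url =~ m{^(?:\w+://)?([^/:?#]+)};
    return lc( $host // '' );
}

sub is_blacklisted {
    my ($url) = @_;
    return exists $blacklisted{ host_of($url) };
}

# Streaming the big file line by line keeps memory flat:
#   while ( my $url = <$fh> ) { chomp $url; ... if is_blacklisted($url); }
print is_blacklisted('http://bad.example.com/page') ? "hit\n" : "miss\n";
```

The point is that only the hosts live in memory; the 21 million URLs are streamed from disk one line at a time.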

PS: why isn't all this data in a database? Provided you've already split the hosts out from the URLs, you can do this kind of query in a single line of SQL, and it'll probably be pretty fast too.
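For instance, assuming hypothetical `urls(url, host)` and `blacklist(host)` tables (names are mine, purely illustrative), the whole check collapses to one join; with an index on `blacklist.host` the database does the hash/index lookup for you:

```sql
-- All URLs whose host appears in the blacklist
SELECT u.url
FROM   urls u
JOIN   blacklist b ON b.host = u.host;
```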
