Re^2: Working with large amount of data

I think I read about this big file some time ago (I can't find the link right now). It was a project by two guys: they started a crawler that saved every GET response to a log file, trying to collect the default page of every domain (with HTTP headers). As I recall, they talked about a 1 TB file... But that file mostly contains page content, not IP address info.

Update: I found the link: http://www.dotnetdotcom.org/ but it is only a 16 GB file :-( So I was wrong, it's not the Internet Index file.
