in reply to Working with large amount of data

Does this question strike anyone else as a bit odd? I mean, an organization dealing with more unique IPs than the entire world has in use, yet with only a single machine with 1GB of memory to count them?

So I'm wondering which of those details isn't accurate. Do they actually have Oracle sitting around, and the poster doesn't know how to get to it? Or is it more like a million IPs? Or is this homework?

It just doesn't fit. It doesn't.
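
(For what it's worth, the counting itself is not the hard part. Here's a minimal sketch (assuming IPv4 addresses, one per line of a hypothetical access.log, and a 64-bit perl) that tracks every possible address in a 512MB bit vector, which fits comfortably in the 1GB quoted.)

    use strict;
    use warnings;

    # One bit per possible IPv4 address: 2**32 bits == 512MB.
    my $seen  = "\0" x ( 2**32 / 8 );
    my $count = 0;

    open my $fh, '<', 'access.log' or die "open access.log: $!";
    while (<$fh>) {
        next unless /(\d+)\.(\d+)\.(\d+)\.(\d+)/;
        my $ip = ( $1 << 24 ) | ( $2 << 16 ) | ( $3 << 8 ) | $4;
        unless ( vec( $seen, $ip, 1 ) ) {
            vec( $seen, $ip, 1 ) = 1;
            ++$count;
        }
    }
    close $fh;
    print "unique IPs: $count\n";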

-- Randal L. Schwartz, Perl hacker

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Re^2: Working with large amount of data
by BrowserUk (Patriarch) on Sep 20, 2009 at 23:58 UTC

    Leaving your point about the organisation's machine limitations undiminished, doesn't this quote from your own reference suggest that there are ~1.6 billion IPs in use?

    Of the 2063.60 million addresses delegated to the five Regional Internet Registries, 1685.69 million have been delegated to end-users or ISPs by the RIRs,

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Working with large amount of data
by Gangabass (Vicar) on Sep 21, 2009 at 04:53 UTC

    I think I read about this big file some time ago (I can't find the link right now). It was a project by two guys: they started a crawler which saved each GET response to a log file, trying to collect every domain's default page (with HTTP headers). As I recall, they talked about a 1TB file... But that file mostly contained page content, not IP address info.

    Update: I found the link: http://www.dotnetdotcom.org/ but it's only a 16GB file :-( So I was wrong; it's not the Internet Index file.

Re^2: Working with large amount of data
by Marshall (Canon) on Sep 20, 2009 at 23:32 UTC
    I agree with this! I have a relatively old Win XP machine, and on this "wimp machine" a user process can use 2GB of memory (aside from what WinXP uses).

    I would also ask what file system is being used to read this TB monster file? Update: this limit is not right: (My NTFS is limited to 2GB file size). This part is right: what really blows my mind is that this is a log file! A log file of WHAT?

      (My NTFS is limited to 2GB file size).

      Huh? NTFS has been able to handle individual files up to a gnat's todger under 16TB forever. (At least since NT4 days; and I believe since 3.51). Even FAT16 & FAT32 can handle 4GB.
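
      (If the real concern is whether perl, rather than the filesystem, copes with a file that size, a quick sanity check is to look at the large-file flag in Config and stat the file. This is just a sketch, with 'huge.log' as a stand-in name:)

          use Config;
          # 'define' when perl was built with large-file support.
          print "uselargefiles: ", ( $Config{uselargefiles} || 'undef' ), "\n";

          # -s reports sizes past 4GB correctly on such a build.
          my $size = -s 'huge.log';
          printf "huge.log is %.2f GB\n", $size / 2**30 if defined $size;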


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        At the "end of the day", if the file size (pointers) in the OS's file system are limited to 32 bits, then you can only do what 32 bits can do. 2**32-1=4,294,967,296.

        Anyway, 2GB vs 4GB is WAY off from 1TB! Update: Windows NTFS file pointers are NOT limited to 32 bits, even on 32-bit systems. So the above is "right", but still "wrong".

Re^2: Working with large amount of data
by dsheroh (Monsignor) on Sep 21, 2009 at 11:15 UTC
    I'd suspect it's an interview question. A couple of years back, Google and I talked about some open jobs, and the (IIRC) second-round tech interview included a couple of questions very similar to this one.