in reply to Working with large amount of data

Do you need to count the number of distinct IPs, or also how often each of these IPs occurred?

If it's the former, you can create a bit vector in which each bit corresponds to one IP address. An IPv4 address has 4 bytes, or 32 bits, so you need 2**32/8 = 2**29 bytes (512 MiB). See vec for a function that manipulates bit vectors in Perl.
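A minimal sketch of the bit-vector idea, assuming a 64-bit perl and dotted-quad input. The helper names ip_to_int and count_distinct_ips are made up for illustration, and the vector grows on demand rather than being preallocated:

```perl
use strict;
use warnings;

# Convert a dotted-quad IPv4 string to its 32-bit integer value.
# Returns undef for anything that does not look like an IPv4 address.
sub ip_to_int {
    my ($ip) = @_;
    my @octets = $ip =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return undef;
    return undef if grep { $_ > 255 } @octets;
    return ($octets[0] << 24) | ($octets[1] << 16) | ($octets[2] << 8) | $octets[3];
}

# Count distinct addresses using one bit per possible IPv4 address.
sub count_distinct_ips {
    my (@ips) = @_;
    my $seen  = '';    # bit vector; vec() extends it as needed, up to 2**29 bytes
    my $count = 0;
    for my $ip (@ips) {
        my $n = ip_to_int($ip);
        next unless defined $n;         # skip malformed lines
        next if vec($seen, $n, 1);      # bit already set: seen before
        vec($seen, $n, 1) = 1;          # mark this address as seen
        $count++;
    }
    return $count;
}
```

In a real run you would feed it lines from the log file one at a time instead of a list, so only the bit vector has to stay in memory.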

You can also use on-disk storage such as GDBM_File, but that will probably slow down your program.
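A rough sketch of the GDBM_File variant, assuming the module is available in your perl build. The sub name count_ips_on_disk and the database path are hypothetical; the tie call follows the form shown in the GDBM_File documentation:

```perl
use strict;
use warnings;
use GDBM_File;

# Count occurrences per IP in a hash that lives on disk.
# $db_file is a path chosen by the caller (hypothetical here).
sub count_ips_on_disk {
    my ($db_file, @ips) = @_;
    tie my %count, 'GDBM_File', $db_file, GDBM_WRCREAT, 0640
        or die "Cannot open $db_file: $!";
    $count{$_}++ for @ips;    # every increment is a read plus a write to the file
    untie %count;
    return;
}
```

Each increment touches the database file, which is where the slowdown comes from.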


Re^2: Working with large amount of data
by tilly (Archbishop) on Sep 21, 2009 at 06:50 UTC
    s/probably/definitely/

    If you need to count how often each IP occurred, I'd strongly recommend implementing a variant of merge sort in which, during the merge step, records with duplicate keys are combined into one record and their counts summed. I've done that before for this kind of problem and got quite reasonable performance.
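A minimal in-memory sketch of that merge step, not the original implementation. The names merge_counts and sort_and_count are made up, records are [key, count] pairs, and keys are compared as plain strings. For data that does not fit in memory you would presumably apply the same merge to sorted chunk files read record by record:

```perl
use strict;
use warnings;

# Merge two lists of [key, count] records, each sorted by key with unique
# keys, into one sorted list. Records with equal keys are combined by
# summing their counts.
sub merge_counts {
    my ($left, $right) = @_;
    my ($i, $j) = (0, 0);
    my @out;
    while ($i < @$left && $j < @$right) {
        my $cmp = $left->[$i][0] cmp $right->[$j][0];
        if ($cmp < 0) {
            push @out, [ @{ $left->[$i++] } ];
        }
        elsif ($cmp > 0) {
            push @out, [ @{ $right->[$j++] } ];
        }
        else {
            # Duplicate key: emit one record with the summed count.
            push @out, [ $left->[$i][0], $left->[$i][1] + $right->[$j][1] ];
            $i++;
            $j++;
        }
    }
    # One side is exhausted; copy whatever remains of the other.
    push @out, map { [@$_] } @{$left}[ $i .. $#$left ], @{$right}[ $j .. $#$right ];
    return \@out;
}

# Merge sort over raw keys: split, recurse, merge with count summing.
sub sort_and_count {
    my (@keys) = @_;
    return [] unless @keys;
    return [ [ $keys[0], 1 ] ] if @keys == 1;
    my $mid = int(@keys / 2);
    return merge_counts(
        sort_and_count(@keys[ 0 .. $mid - 1 ]),
        sort_and_count(@keys[ $mid .. $#keys ]),
    );
}
```

Because duplicates collapse at every merge level, the intermediate lists never hold more records than there are distinct keys in that part of the input.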