At least those can be compressed well. Let's see... something like:
my %occurrences; while(<>) { # Extract the IP addresses my ($from_ip, $to_ip) = /...some regex.../ or next; # Losslessly compress them into an 8-byte key my $key = pack("C8", split(/\./, "$from_ip.$to_ip")); $occurrences{$key}++; } # Now convert the counts to small, easily sorted things my @data; while (my ($key, $count) = each %occurrences) { # Use network byte-order (big-endian) because it # will sort alphabetically in numeric order push @data, pack("N", $count) . $key; # To save space, shrink the hash. Note that it is # normally unsafe to modify the set of entries in # a hash while iterating over them, but deleting # the current entry returned by 'each' is explicitly # allowed (perldoc -f each). delete $occurrences{$key}; } # Now we have a simple array of (packed) counts and IP # pairs. The easiest thing to implement would then be: return (reverse sort @data)[0..$N-1]; # A trickier implementation that would avoid the superlinear # expense of sorting would be to grab a random element and # partition the array into a group of elements that are # greater than that element, and a group of those that are # smaller. As soon as the size of the group that is greater # exceeds $N, throw out the other group and just finish # creating the group of larger elements. If you're unlucky # and your "larger" group never gets up to $N elements, # start over again with a smaller pivot element. # # Repeat until you are left with a more manageable size, # and then just do the sort. while (@data > 10000) { # Assuming $N < 10000 my (@big, @small); # The first element may be "random" enough, since it's # in perl's hash order. But you really probably ought # to pick a better one, just in case. my $pivot = shift(@data); while (@data) { my $elt = shift(@data); ($elt gt $pivot) ? push(@big,$elt) : push(@small,$elt); if (@big >= $N) { while (@data) { $elt = shift(@data); push(@big, $elt) if $elt gt $pivot; } last; } } if (@big >= $N) { @data = @big; } else { # Put the pivot at the end so we don't reuse it! push @data, @big, @small, $pivot; } }
Unpacking the result is left as an exercise for the reader. I didn't test this all that much, but it seems to more or less work, and the idea is the main thing.

In reply to Re: Re: Re: *Fastest* way to print a hash sorted by value by sfink
in thread *Fastest* way to print a hash sorted by value by smellysocks

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.