If I've understood your exposition right, I've done something similar: counting transactions between pairs of parties. My approach used a Berkeley DB, with the key being the pair of parties and the value being a pair of count and amount. That was slower than the all-RAM solution but had the nice advantage that it worked. I also think that this should be faster than multiple passes through your large dataset, but I don't know for sure.

If you're doing this in Perl, I'd go for DB_File and store strings in it.
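Roughly what I mean, as a minimal sketch rather than my actual code: the file name, the record layout and the "count,amount" string format are all assumptions made up for illustration, and the key keeps the two parties in the order they appear in the record.

```perl
use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDWR O_CREAT);
use File::Temp qw(tempdir);

# Hypothetical location for the on-disk hash; use a real path in practice.
my $dir     = tempdir( CLEANUP => 1 );
my $db_path = "$dir/pairs.db";

# Tie a Perl hash to a Berkeley DB hash file, so the data lives on disk.
tie my %pairs, 'DB_File', $db_path, O_RDWR | O_CREAT, 0666, $DB_HASH
    or die "Cannot tie $db_path: $!";

# Assumed record layout: [ party_a, party_b, amount ]
my @records = (
    [ 'alice', 'bob',   10 ],
    [ 'alice', 'bob',   5 ],
    [ 'bob',   'carol', 7 ],
);

# Update the "count,amount" string stored for one pair of parties.
sub add_record {
    my ( $db, $from, $to, $amount ) = @_;
    my $key = join "\0", $from, $to;    # both parties form the key
    my ( $count, $sum ) = split /,/, ( $db->{$key} // '0,0' );
    $db->{$key} = join ',', $count + 1, $sum + $amount;
    return;
}

add_record( \%pairs, @$_ ) for @records;

# Read the stored string for one pair back out.
my ( $count, $sum ) = split /,/, $pairs{"alice\0bob"};
print "alice/bob: $count transactions, total $sum\n";
```

If the direction of a transaction doesn't matter for you, sort the two parties before joining them so that both directions land on the same key.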

As the order of votes doesn't seem to matter in your data, you can scale your processing time linearly by splitting up your data across machines, at least if you can reduce the results you want to collect to something that's associative and commutative, like the count of items and the sum of items. Of course you will have to be careful when merging your collected data back together. You should code sanity checks that verify that the number of records in each chunk is equal to the sum of the counts reported for that chunk, and that the sum over all chunks is equal to the total number of records.
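The merge step with those sanity checks could look something like the sketch below. The sub names (summarise_chunk, merge_chunks) and the shape of the per-chunk summary are invented for this example; the point is only that counts and sums can be added up in any order, and that the record counts are checked along the way.

```perl
use strict;
use warnings;

# Summarise one chunk: per-key [count, sum], plus the number of records read.
# Assumed record layout: [ key, amount ]
sub summarise_chunk {
    my ($records) = @_;
    my %stats;
    my $seen = 0;
    for my $r (@$records) {
        my ( $key, $amount ) = @$r;
        $stats{$key} ||= [ 0, 0 ];
        $stats{$key}[0]++;
        $stats{$key}[1] += $amount;
        $seen++;
    }
    return { stats => \%stats, records => $seen };
}

# Merge chunk summaries; die if the record counts do not add up.
sub merge_chunks {
    my ( $expected_total, @chunks ) = @_;
    my %merged;
    my $total = 0;
    for my $chunk (@chunks) {
        my $chunk_count = 0;
        for my $key ( keys %{ $chunk->{stats} } ) {
            my ( $count, $sum ) = @{ $chunk->{stats}{$key} };
            $merged{$key} ||= [ 0, 0 ];
            $merged{$key}[0] += $count;
            $merged{$key}[1] += $sum;
            $chunk_count     += $count;
        }
        # Check 1: reported counts in this chunk match its record count.
        die "Chunk reports $chunk_count items but read $chunk->{records} records\n"
            unless $chunk_count == $chunk->{records};
        $total += $chunk->{records};
    }
    # Check 2: all chunks together cover the whole dataset.
    die "Chunks cover $total records, expected $expected_total\n"
        unless $total == $expected_total;
    return \%merged;
}

# Two hypothetical chunks, as if processed on two machines.
my $chunk_a = summarise_chunk( [ [ 'x', 3 ], [ 'y', 4 ] ] );
my $chunk_b = summarise_chunk( [ [ 'x', 5 ] ] );
my $merged  = merge_chunks( 3, $chunk_a, $chunk_b );
print "$_: @{ $merged->{$_} }\n" for sort keys %$merged;
```

Because addition is associative and commutative, merging the chunks in a different order gives the same totals.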


In reply to Re: Catching Cheaters and Saving Memory by Corion
in thread Catching Cheaters and Saving Memory by hgolden
