but I can't see how to reduce this memory requirement to only 4 megabytes.

I said ~ 8 megabytes not 4.

In Scaling Hash Limits the OP said: "my simple hash of scalars (for duplicate detection) hits a size of over 55 million"

55 million / 8 / 1024**2 = 6.55651092529296875 MB.
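For anyone who hasn't met the technique, this is roughly what one bit per id looks like, using Perl's built-in vec(). It is only a minimal sketch: the sample ids are made up, and it assumes the ids are non-negative integers drawn from a reasonably dense range, which the OP never confirmed.

```perl
use strict;
use warnings;

# Hypothetical sample data; the OP's real ids are unknown.
my @ids = ( 7, 42, 7, 1_000_000, 42, 3 );

my $seen = '';    # bit string; vec() extends it as higher bits are set
my @unique;

for my $id ( @ids ) {
    next if vec( $seen, $id, 1 );    # bit already set: a duplicate
    vec( $seen, $id, 1 ) = 1;        # mark this id as seen
    push @unique, $id;
}

print "@unique\n";
printf "%d bytes used for the bit string\n", length $seen;
```

The memory used depends on the largest id seen, not on how many ids there are, which is why the range of the ids matters so much in what follows.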

He also mentions 180 million: 180e6 / 8 / 1024**2 = 21.457672119140625 MB. But that's before de-duping, which is the purpose of the exercise, though it's possible his list contains no duplicates.

Of course, looking around I see that twitter uses 64-bit numbers for their user ids, and that is 20 digits, not 12. Then again, they are only just now claiming 500 million users, which is: 500e6 / 8 / 1024**2 = 59.604644775390625 MB, which should be handleable by any modern machine with ease.

Of course, it is also possible that they do not use sequential numbers for their IDs, but rather the 64-bit number is a hash of some aspect of the account -- the name or similar -- in which case the idea won't work because 2**64 / 8 / 1024**3 ~= 2 billion GB.
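The figures above can be reproduced with a few lines of Perl. Again just a sketch: bitvec_bytes is a name I've made up for illustration, and it assumes one bit per possible id across the whole range.

```perl
use strict;
use warnings;

# One bit per possible id: bytes needed is the size of the id range / 8.
# (bitvec_bytes is a hypothetical helper, not anything from the thread.)
sub bitvec_bytes {
    my( $range ) = @_;
    return $range / 8;
}

# The dense ranges discussed above, reported in megabytes.
for my $range ( 55e6, 180e6, 500e6 ) {
    printf "%11d ids: %6.2f MB\n", $range, bitvec_bytes( $range ) / 1024**2;
}

# A full 64-bit id space, reported in gigabytes.
printf "2**64 ids: %.0f GB\n", bitvec_bytes( 2**64 ) / 1024**3;
```

The point is simply that the cost scales with the range of possible ids: trivial for sequential ids, hopeless for a sparse 64-bit hash.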

But if that were the case, the OP probably wouldn't be talking about "12-digit numbers".

Of course, the OP also doesn't explicitly mention 'user' ids, just "ids", and given the number -- 180 million; roughly the number of tweets per day currently -- these could be message ids, which are probably allocated sequentially?

Had the OP deigned to answer a couple of questions, much of the uncertainty could have been resolved.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^5: Scaling Hash Limits by BrowserUk
in thread Scaling Hash Limits by Endless
