That would indeed take a very long time, but reading random lines (or, heaven forbid, single words!) from a 4 GB file would most likely take even longer.

And you don't have to give each record a unique random number; some collisions are acceptable and do not harm the randomness. Say you have 150 million items: random numbers drawn from a range of at most 10 million would give, on average, 15 items sharing each number (150 million / 10 million = 15). But those 15 "same"-numbered items would come from random places in your database, so that does no harm, and there is no need to check whether a number is already in use.
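A minimal in-memory sketch of the idea, assuming the usual "tag with a random key, sort on the key, strip the key" approach discussed in this thread. The function name `shuffle_lines` and its `key_range` and `seed` parameters are hypothetical, chosen for illustration; duplicate keys are deliberately allowed, as described above.

```python
import random


def shuffle_lines(lines, key_range=10_000_000, seed=None):
    """Shuffle lines by tagging each with a random key (duplicates allowed),
    sorting on that key, and then dropping the key again."""
    rng = random.Random(seed)
    # Decorate: pair every line with a random integer in [0, key_range).
    # No check is made for keys already in use; collisions are acceptable.
    tagged = [(rng.randrange(key_range), line) for line in lines]
    # Sort on the key only. Python's sort is stable, so lines that happen
    # to share a key keep their original relative order.
    tagged.sort(key=lambda pair: pair[0])
    # Undecorate: throw the keys away and keep the lines.
    return [line for _, line in tagged]


if __name__ == "__main__":
    data = [f"record {i}" for i in range(20)]
    print(shuffle_lines(data, key_range=5))
```

For a 4 GB file that does not fit in memory, the same three steps would typically be done with an external sort instead: prefix each line with its key and a tab, run `sort -n -k1,1`, and strip the prefix again with `cut -f2-`.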

Keeping the positions of all your 150 million items in an array would, at 24 bytes of overhead per item plus the bytes needed to store each value, swamp the memory of all but the largest computers and slow your machine down to a crawl.
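For scale, taking the 24-bytes-per-item figure above at face value, the per-item overhead alone, before counting the bytes for the stored values, works out to:

```latex
150 \times 10^{6}\ \text{items} \times 24\ \tfrac{\text{bytes}}{\text{item}}
  = 3.6 \times 10^{9}\ \text{bytes} \approx 3.6\ \text{GB}
```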

The concept of "slow" is relative: even something "slow" can be fast if all other options are even slower!

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law


In reply to Re^3: Randomizing Big Files by CountZero
in thread Randomizing Big Files by Anonymous Monk
