in reply to Randomizing Big Files
1) make sure you have several Gbs of space free in /tmp (that's where sort keeps its temporary merge files), then prefix each line with a random numeric key, sort on the key, and strip it off again:

$ perl -pe '$_ = int(rand(100000000)) . " $_"' bigfile \
    | sort -k1,1n | perl -pe 's/^\d+ //' > bigfile_sorted

(The int() matters: a fractional key like 123.45 wouldn't be stripped by s/^\d+ //.)
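In case the decorate/sort/undecorate dance isn't obvious: the first perl tacks a random numeric key onto each line, sort orders on the key (i.e. randomly), and the second perl throws the key away. For a file small enough to hold in memory you can do the same thing in a single process with a Schwartzian transform — just a sketch, and the smallfile names are made up for illustration:

$ perl -e 'print map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { [ rand, $_ ] } <>' smallfile > smallfile_shuffled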
Sort is generally quite efficient at sorting large files. But if you're using an OS whose sort utility can't handle a 4Gb+ file, then:

2) randomly distribute the lines into a number of smaller files, individually randomise them, then concatenate them:
$ perl -ne 'BEGIN { for (1..16) { open my $fh, ">", "tmp$_" or die $!; push @f, $fh } }
            print { $f[rand 16] } $_' bigfile

then randomise the files tmp1 .. tmp16 individually (a sketch for that step is below), then

$ cat tmp* > bigfile_sorted
$ rm tmp*
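For the "randomise each tmp file" step (it goes between the distribute and the cat above), something like this should do — a minimal sketch, assuming each chunk fits comfortably in memory; List::Util's shuffle is my choice here, not part of the original recipe:

$ perl -MList::Util=shuffle -e '
    for my $f (map { "tmp$_" } 1 .. 16) {
        open my $in, "<", $f or die "$f: $!";
        my @lines = <$in>;                    # slurp one chunk
        close $in;
        open my $out, ">", $f or die "$f: $!";
        print $out shuffle(@lines);           # rewrite it in random order
        close $out;
    }'

With 16 chunks, a 4Gb file works out to roughly 256Mb per chunk; if that's still too much memory, bump the 16 up in both one-liners.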
Dave.