in reply to Strategy for randomizing large files via sysseek

sort -u input.txt |\
perl -pe '$r = substr(rand(), 2); $_ = "$r\t$_"' |\
sort -n | cut -f 2- > output.txt
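sort is optimized to work well with really large files: when the input does not fit in memory it spills sorted runs to temporary files and merges them. With GNU sort you may want to use -S to set the size of the in-memory buffer and -T to choose the directory for those temporary files.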

Re^2: Strategy for randomizing large files via sysseek
by Anonymous Monk on May 11, 2008 at 15:36 UTC
    Works well, thank you much!