in reply to Strategy for randomizing large files via sysseek
sort is optimized to work well with really large files. You may want to use the -S (buffer size) and -T (temporary directory) parameters on sort.

    sort -u input.txt |\
    perl -pe '$r = substr(rand(), 2); $_ = "$r\t$_"' |\
    sort -n | cut -f 2- > output.txt
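For what it's worth, here is a rough sketch of the same pipeline with -S and -T filled in. It assumes GNU sort (where -S takes a size such as 64M and -T names an existing directory). The function name shuffle_lines, its argument order, and the default values are made up for illustration and are not part of the original one-liner.

```shell
#!/bin/sh
# shuffle_lines INPUT OUTPUT [BUFFER_SIZE] [TEMP_DIR]
# Hypothetical wrapper around the pipeline above; assumes GNU sort and perl.
shuffle_lines() {
    in=$1
    out=$2
    buf=${3:-64M}              # assumed default for sort's in-memory buffer (-S)
    tmp=${4:-${TMPDIR:-/tmp}}  # assumed default directory for sort's temp files (-T)

    # 1. sort -u : sort the input and drop duplicate lines
    # 2. perl    : prefix every line with a random numeric key and a tab
    # 3. sort -n : order the lines by that random key
    # 4. cut     : strip the key, leaving the original line
    sort -u -S "$buf" -T "$tmp" "$in" |
    perl -pe '$r = substr(rand(), 2); $_ = "$r\t$_"' |
    sort -n -S "$buf" -T "$tmp" |
    cut -f 2- > "$out"
}
```

You would call it as something like shuffle_lines input.txt output.txt 2G /mnt/scratch, where /mnt/scratch stands in for whatever disk has room for the temporary files. Keep in mind that the -u on the first sort removes duplicate lines, so the output can be shorter than the input; drop -u if duplicates need to survive.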
Re^2: Strategy for randomizing large files via sysseek
by Anonymous Monk on May 11, 2008 at 15:36 UTC