in reply to Strategy for randomizing large files via sysseek
Rewrite the files with a random number prepended or appended to each line. Run the new files through GNU sort, but be sure to use the -T flag to send its temp files somewhere you have free disk space (enough for about 1.5 - 2x the size of the largest file), preferably on a different physical disk. You could then send the output through a pipe to a process that removes the random number from each line.
NOTE: I recall being able to sort text files close to a GB with GNU sort in reasonable time, but I'm not sure how it handles files much larger than this.
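For anyone who wants to try it, here is a rough sketch of the three steps (prepend a key, sort with -T, strip the key in a pipe). It is in Python rather than Perl, and everything in it is my own assumption rather than tested advice for huge files: the function name shuffle_lines, the 64-bit fixed-width hex key, the tab separator, and that a GNU-compatible sort is on your PATH.

```python
import os
import random
import subprocess
import tempfile


def shuffle_lines(src_path, dst_path, tmp_dir=None, seed=None):
    """Shuffle the lines of src_path into dst_path using an external sort.

    shuffle_lines, tmp_dir and seed are illustrative names, not part of any library.
    """
    rng = random.Random(seed)
    tmp_dir = tmp_dir or tempfile.gettempdir()

    # Step 1: rewrite the file with a fixed-width random key prepended to each line.
    with tempfile.NamedTemporaryFile("wb", dir=tmp_dir, delete=False) as decorated:
        with open(src_path, "rb") as src:
            for line in src:
                if not line.endswith(b"\n"):
                    line += b"\n"  # make sure the last line is terminated
                key = b"%016x" % rng.getrandbits(64)
                decorated.write(key + b"\t" + line)

    try:
        # Step 2: sort on the key only. -T sends sort's temp files to tmp_dir;
        # LC_ALL=C makes the comparison byte-wise, which suits the hex key.
        env = dict(os.environ, LC_ALL="C")
        sorter = subprocess.Popen(
            ["sort", "-T", tmp_dir, "-t", "\t", "-k1,1", decorated.name],
            stdout=subprocess.PIPE,
            env=env,
        )
        # Step 3: strip the key from each line as it comes out of the pipe.
        with open(dst_path, "wb") as dst:
            for line in sorter.stdout:
                dst.write(line.split(b"\t", 1)[1])
        sorter.stdout.close()
        if sorter.wait() != 0:
            raise RuntimeError("sort exited with status %d" % sorter.returncode)
    finally:
        os.remove(decorated.name)
```

Call it as shuffle_lines("big.txt", "big.shuffled.txt", tmp_dir="/path/on/other/disk"), where the paths are placeholders. Note the decorated copy also lands in tmp_dir here, so that disk needs room for it on top of sort's own temp files.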