in reply to Randomizing Big Files
Some untested pseudocode:
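my $L = ... number of lines ....;
my $N = ... number of processes ....;

# Open a filehandle for each process:
my @fh;
for (my $i = 0; $i < $N; $i++) {
    open $fh[$i], "| $process" or die "Cannot start $process: $!";
}

# Initialize @A (number of lines already sent to each process):
my @A = (0) x $N;
my $l = 0;

# Iterate over the input.
while (<$input>) {
    # Pick a number.
    my $r = rand($L - $l);

    # Find process: walk until $r falls within a process's remaining quota.
    # The $i < $N - 1 guard keeps rounding errors from running past the end.
    my $i = 0;
    while ($i < $N - 1 && $r >= ($L / $N - $A[$i])) {
        $r -= $L / $N - $A[$i];
        $i++;
    }

    # Write line (a filehandle stored in an array element needs a block).
    print {$fh[$i]} $_;

    # Increment array.
    $A[$i]++;

    # Increment line counter.
    $l++;
}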
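For anyone who wants to try the idea without spawning processes, here is a minimal sketch of a variant, not the code above verbatim. It assumes integer quotas instead of the fractional $L / $N, and it pushes lines into in-memory arrays that stand in for the filehandles. The name distribute_lines and the bucket arrays are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): deal the lines in @$lines
# into $n buckets, each bucket standing in for one process's filehandle.
# Assumes $n >= 1.
sub distribute_lines {
    my ($lines, $n) = @_;
    my $total = @$lines;

    # Integer quotas that sum to $total: the first ($total % $n) buckets
    # take one extra line.
    my $base  = int($total / $n);
    my $extra = $total % $n;
    my @quota = map { $base + ($_ < $extra ? 1 : 0) } 0 .. $n - 1;

    my @buckets = map { [] } 1 .. $n;
    my $left    = $total;    # lines not yet assigned

    for my $line (@$lines) {
        # Pick one of the remaining slots uniformly at random.
        my $r = int(rand($left));

        # Walk the buckets until $r falls inside one's remaining quota.
        my $i = 0;
        while ($r >= $quota[$i]) {
            $r -= $quota[$i];
            $i++;
        }

        push @{ $buckets[$i] }, $line;
        $quota[$i]--;
        $left--;
    }
    return @buckets;
}

# Usage: ten lines over three buckets.
my @buckets = distribute_lines([ map { "line $_\n" } 1 .. 10 ], 3);
print scalar(@$_), " lines\n" for @buckets;
```

Because the quotas always sum to the number of unassigned lines, the inner loop cannot run off the end of the array, and every bucket ends up with exactly its quota (4, 3 and 3 lines in the usage example). Which lines land in which bucket varies from run to run.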
Re^2: Randomizing Big Files
by Anonymous Monk on Jan 26, 2005 at 15:41 UTC