in reply to Approach to efficiently choose a random line from a large file
This assumes you are selecting only one line (there is a simple modification if you want more than one, say m, lines) and that you know the number of lines N in advance. You step through the file one line at a time and decide whether to select the current line. If not, you move on to the next one, and so on. Selecting m of N lines this way reads (N+1)m/(m+1) lines on average, which is (N+1)/2 for a single line. If your files contain 10 million records, you would on average read only about 5 million lines before selecting one. The method in "How do I pick a random line from a file?" requires a complete pass through all lines each time. Depending on your setup or requirements, the savings might be worth it.

    my $t = 0;     # number of lines already considered
    my $N = 10;    # number of lines in file
    my $line;      # declared outside the loop so it is still in scope for print
    do {
        $line = <>;
    } until (($N - $t++) * rand() < 1);   # select with probability 1/(N-t)
    print $line;
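The modification for more than one line is not spelled out above. One possible form, assuming it refers to selection sampling (Knuth's Algorithm S), is sketched here: each line is kept with probability (lines still needed)/(lines still unread). The helper name `sample_lines` and its argument order are hypothetical, chosen only for illustration, and `$N` must be the true line count.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): pick $m of the $N lines
# readable from $fh so that every subset of size $m is equally likely.
# Assumes $N is the exact number of lines and $m <= $N.
sub sample_lines {
    my ($fh, $N, $m) = @_;
    my @picked;
    my $t = 0;    # number of lines already considered
    while (@picked < $m and defined(my $line = <$fh>)) {
        # Keep this line with probability (still needed) / (still unread).
        if (($N - $t) * rand() < $m - @picked) {
            push @picked, $line;
        }
        $t++;
    }
    return @picked;    # selected lines, in file order
}
```

Called as `sample_lines($fh, $N, 1)` the test becomes `($N - $t) * rand() < 1`, which is the single-line case, and reading stops as soon as the last needed line has been picked.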