in reply to Approach to efficiently choose a random line from a large file
Let's suppose you maintain a separate file (via make, maybe) that holds the line count for your data file. Call it data.count.
    use Tie::File;
    my $count = do '/path/to/data.count';
    tie my @lines, 'Tie::File', '/path/to/data.dat';
    print $lines[rand $count];

Tie::File is remarkably efficient at that sort of thing. You could avoid the auxiliary count file by calling rand @lines in the array index.
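For what it's worth, here is a minimal sketch of the rand @lines variant. The helper name random_line and the temporary sample file are my own stand-ins, not anything from the original question, and I open the file read-only since nothing is written.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use File::Temp 'tempfile';
use Tie::File;

# Hypothetical helper: return one randomly chosen line of $path.
sub random_line {
    my ($path) = @_;
    tie my @lines, 'Tie::File', $path, mode => O_RDONLY
        or die "Cannot tie $path: $!";
    # rand evaluates @lines in scalar context, which is the line count,
    # so no count file is needed. Tie::File has to scan the file once
    # to learn that count.
    my $line = $lines[rand @lines];
    untie @lines;
    return $line;    # autochomp is on by default, so no trailing newline
}

# Throwaway sample data standing in for /path/to/data.dat.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "alpha\n", "beta\n", "gamma\n";
close $fh or die "close failed: $!";

print random_line($path), "\n";
```

The trade-off is that the count file lets you skip the initial scan, while rand @lines keeps everything in one place.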
I don't see any bias for or against the fencepost lines in your proposal. There is, however, a strong bias that favors long lines and punishes short ones.
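To put rough numbers on that bias, here is a small sketch. It assumes the proposal amounts to picking a byte offset uniformly at random and returning the line that contains that byte; that is my reading, and the sample lines and the name offset_pick_probabilities are made up for illustration.

```perl
use strict;
use warnings;

# Probability that each line is chosen when a byte offset is picked
# uniformly and the line containing that byte is returned.
# Assumption: this models the seek-based proposal; each line's length
# includes its trailing newline.
sub offset_pick_probabilities {
    my @lines = @_;
    my $total = 0;
    $total += length for @lines;
    return map { length($_) / $total } @lines;
}

my @sample = ("a\n", "a much longer line of text\n", "bb\n");
my @p = offset_pick_probabilities(@sample);

for my $i (0 .. $#sample) {
    chomp(my $text = $sample[$i]);
    printf "%-28s %.3f\n", $text, $p[$i];
}
```

Under that model each line's chance is proportional to its length, whereas an unbiased pick would give every line the same 1/N chance.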
UTF-8 is the same as ASCII in the ASCII range, so this code will work fine with either. You may want to open your data file first and tie to the open handle; that will let you set up the '<:utf8' mode in PerlIO.
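A sketch of tying to an already-open handle might look like the following. The helper name random_utf8_line and the sample data are hypothetical; the points being illustrated are that Tie::File accepts a seekable filehandle in place of a filename, and that passing mode => O_RDONLY keeps it from trying to write through a read-only handle.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use File::Temp 'tempfile';
use Tie::File;

# Hypothetical helper: pick a random line from a UTF-8 file by tying
# to a handle that was opened with the :utf8 layer.
sub random_utf8_line {
    my ($path) = @_;
    open my $in, '<:utf8', $path or die "Cannot open $path: $!";
    # The handle must be seekable; its layers are left as we set them.
    tie my @lines, 'Tie::File', $in, mode => O_RDONLY
        or die "Cannot tie handle: $!";
    my $line = $lines[rand @lines];
    untie @lines;
    close $in;
    return $line;    # a decoded character string, without its newline
}

# Throwaway sample data with a few non-ASCII characters.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':utf8';
print {$out} "caf\x{e9}\n", "na\x{ef}ve\n", "plain\n";
close $out or die "close failed: $!";

binmode STDOUT, ':utf8';
print random_utf8_line($path), "\n";
```

Treat this as a read-only recipe; I have not considered what happens if you modify the array through a handle that has an encoding layer on it.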
Don't pay any attention to the advice in those links to call srand. That advice has long been superseded: since Perl 5.004, rand seeds the generator automatically on first use.
After Compline,
Zaxo