Let's suppose you maintain a separate file (via make, maybe) which holds the line count for your data file. Call it data.count.

    use Tie::File;
    # data.count holds just the number, so do() returns it
    my $count = do '/path/to/data.count';
    tie my @lines, 'Tie::File', '/path/to/data.dat';
    print $lines[rand $count];

Tie::File is remarkably efficient at that sort of thing. You could avoid the auxiliary count file by calling rand @lines in the array index.
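
Here is a minimal sketch of the rand @lines variant. The sub name random_line is made up for illustration, and opening with mode => O_RDONLY is my own choice, since we only read. Keep in mind that asking for the size of @lines makes Tie::File scan the file to count records, so the first call is not free.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Hypothetical helper: return one uniformly chosen line from $path.
sub random_line {
    my ($path) = @_;
    # O_RDONLY: we never modify the file, so don't ask for write access.
    tie my @lines, 'Tie::File', $path, mode => O_RDONLY
        or die "Cannot tie $path: $!";
    # @lines in numeric context is the record count; no count file needed.
    my $line = $lines[rand @lines];
    untie @lines;
    return $line;
}
```

Call it as print random_line('/path/to/data.dat'), "\n"; Tie::File chomps records by default, so you add the newline back yourself.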
I don't see any bias for or against the fencepost lines in your proposal. There is, however, a strong bias which favors long lines and punishes short ones.
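
To put numbers on that bias, here is a small sketch. It assumes the proposal picks a uniformly random byte offset and returns the line containing that offset; if it instead skips ahead to the next full line, the odds follow the length of the preceding line, which is just as skewed. The sub name offset_pick_odds is hypothetical, and the lines are assumed to be ASCII so that characters and bytes agree.

```perl
use strict;
use warnings;

# Hypothetical helper: for each line, the chance that a uniformly random
# byte offset falls inside it (trailing newline included).
sub offset_pick_odds {
    my @lines = @_;
    my @bytes = map { length($_) + 1 } @lines;    # +1 for "\n"
    my $total = 0;
    $total += $_ for @bytes;
    return map { $_ / $total } @bytes;
}

# A 1-character line next to a 7-character line: 2 bytes versus 8 bytes.
my @odds = offset_pick_odds('a', 'bbbbbbb');
printf "%.1f %.1f\n", @odds;    # prints "0.2 0.8"
```

A fair pick would give each of the two lines 0.5. Indexing by line number, as Tie::File does, avoids the problem.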
UTF-8 is the same as ASCII in the ASCII range, so this code will work fine with either. You may want to open your data file first and tie to the open handle. That will let you set up the '<:utf8' mode in PerlIO.
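
A sketch of tying to an already-open handle follows. Tie::File accepts a filehandle in place of a filename and leaves the layers alone, so whatever you set in open stays in effect. The sub name random_utf8_line is hypothetical, and passing mode => O_RDONLY is my assumption about the safest way to tell Tie::File the handle is read-only. If you want strict validation of the input, ':encoding(UTF-8)' is the stricter alternative to ':utf8'.

```perl
use strict;
use warnings;
use Fcntl 'O_RDONLY';
use Tie::File;

# Hypothetical helper: pick a random line, decoded as UTF-8.
sub random_utf8_line {
    my ($path) = @_;
    # Set the PerlIO layer ourselves, then hand the handle to Tie::File.
    open my $fh, '<:utf8', $path
        or die "Cannot open $path: $!";
    # mode => O_RDONLY tells Tie::File not to attempt any writes.
    tie my @lines, 'Tie::File', $fh, mode => O_RDONLY
        or die "Cannot tie handle for $path: $!";
    my $line = $lines[rand @lines];
    untie @lines;
    close $fh;
    return $line;
}
```

The returned string holds characters, not bytes, so remember to set an output layer (binmode STDOUT, ':utf8') before printing it.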
Don't pay any attention to advice to call srand in those links. That is long superseded in Perl: since 5.004, rand calls srand implicitly the first time it is used.
After Compline,
Zaxo
In reply to Re: Approach to efficiently choose a random line from a large file
by Zaxo
in thread Approach to efficiently choose a random line from a large file
by Your Mother