Here's a routine that should return a random
line from a file, without needing to read the
entire contents of the file into memory:
my $file = '/usr/dict/words';
# Select a random line from a file
my $line = do {
    open FH, '<', $file or die "Could not open $file: $!";
    # Go to a random byte offset in the file
    seek FH, int(rand((-s FH) + 1)), 0;
    # This forces us to the next line, because
    # we might be positioned in the middle of
    # a line in FH.
    <FH>;
    # If the read above consumed the last line of
    # the file, the next <FH> would return undef,
    # so wrap around to the beginning of the file.
    # We ONLY want complete lines.
    seek(FH, 0, 0) if eof(FH);
    # Return the next available, complete line
    <FH>;
};
Here's an explanation of the code:
Most similar routines will break when they
randomly select the last line of a file. This
is because they do the first <FH>, which moves
the file cursor to the start of the next line.
The problem is that if you are already
positioned somewhere inside the *last* line,
the second <FH> will return undef, which is
not usually what you want.
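For comparison, a "similar routine" of the kind described above might look like the sketch below. This is an assumed reconstruction, not code taken from any particular post, and the name naive_random_line is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the naive approach: seek to a random
# byte, discard the (possibly partial) line, return whatever comes next.
sub naive_random_line {
    my ($file) = @_;
    open my $fh, '<', $file or die "Could not open $file: $!";
    seek $fh, int(rand(-s $fh)), 0;   # jump to a random byte offset
    my $partial = <$fh>;              # throw away the rest of this line
    my $line    = <$fh>;              # undef if we landed in the last line
    close $fh;
    return $line;
}
```

Called repeatedly on a small file, this returns undef whenever the random offset falls inside the last line.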
Another flaw in the standard routine of this type
is that it always returns the line after the one
that was randomly seeked to. This means that no matter
how many times you run the routine, it will *always*
miss line 1.
The routine at the top gets around that by simply
checking, with eof, whether we are at the end of
the file. If we are, we wrap around to the
beginning. This solves both of the problems above,
while still maintaining an acceptable level of
randomness with rand.
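One way to check that claim is to wrap the routine in a sub and count what it returns over many draws. The sketch below does that with a lexical filehandle; the names random_line_wrap and tally_lines are made up for illustration.

```perl
use strict;
use warnings;

# The seek-and-wrap routine, wrapped in a sub (hypothetical name).
sub random_line_wrap {
    my ($file) = @_;
    open my $fh, '<', $file or die "Could not open $file: $!";
    seek $fh, int(rand((-s $fh) + 1)), 0;   # random byte offset, 0 .. size
    my $partial = <$fh>;                    # skip the rest of this line
    seek $fh, 0, 0 if eof($fh);             # past the last line: wrap around
    my $line = <$fh>;
    close $fh;
    return $line;
}

# Count how often each line comes back over $draws calls.
sub tally_lines {
    my ($file, $draws) = @_;
    my %count;
    $count{ random_line_wrap($file) }++ for 1 .. $draws;
    return \%count;
}
```

On a non-empty file, every line should show up as a key in the tally, including line 1, and undef should never be returned. The counts themselves show how evenly (or unevenly) the lines are picked.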
Update: This routine is biased by line length: a line's chance of being returned is proportional to the length of the line before it, so lines that follow long lines are favored. Do not use it for files whose lines vary in length. Use the methods described here: How do I pick a random line from a file?
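The linked node is not reproduced here, but one well-known unbiased technique is the one-pass selection given in perlfaq5. The sketch below is a minimal version of it; the sub name unbiased_random_line is made up for illustration. It still holds only one line in memory at a time, but it has to read every line of the file.

```perl
use strict;
use warnings;

# One-pass selection: line N replaces the current pick with probability
# 1/N, so after the last line every line is equally likely to be the pick.
sub unbiased_random_line {
    my ($file) = @_;
    open my $fh, '<', $file or die "Could not open $file: $!";
    my $pick;
    while (my $line = <$fh>) {
        $pick = $line if rand($.) < 1;   # $. is the current line number
    }
    close $fh;
    return $pick;                        # undef for an empty file
}
```

The trade-off is speed: the seek-based routine touches only a couple of lines, while this one scans the whole file on every call.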