in reply to Re: RandomFile
in thread RandomFile

I went with the first option. I was glad you gave me choices ;). The first one was exactly the behaviour I was looking for. You seem to dislike this option, but I fail to see why. Here's the code I used, based on all the help in this thread:
use strict;

my $dict = "/usr/dict/words";
my ($set, $cur, $end) = (0, 1, 2);

for my $i (1..30) {
    printf "word #%2d: %s\n", $i, &pick_word(10);
}

sub pick_word {
    my $reqsize = shift;
    my $ret = "";
    my $die = 0;

    open DICT, "$dict" or die "redmist code: $!";
    seek DICT, rand(-s $dict), $set;
    <DICT>; # this gets us to the start of the next line.

    TOP:
    while (<DICT>) {
        ($ret = $1 and last) if /^(\w{$reqsize})$/;
    }

    if (length($ret) != $reqsize) {
        # ok, go back around
        seek DICT, 0, $set;
        # but remember we were already here ...
        die "Word too big, aaaaaaaahhhhhh!" if $die;
        $die = 1;
        goto TOP;
    }

    close DICT;
    return $ret;
}

Re: Random Line from a file
by dkubb (Deacon) on Jan 17, 2001 at 12:48 UTC
    Here's a routine that should return a random line from a file, without needing to read the entire contents of the file into memory:
    my $file = '/usr/dict/words';

    #selecting a random line from a file
    my $line = do {
        open FH, "<$file" or die "Could not open $file: $!";

        #Go to a random spot in the file
        seek FH, rand((-s FH) + 1), 0;

        #This forces us to the next line, because
        #we might be positioned in the middle of
        #a line in FH.
        <FH>;

        #If the next call to <FH> is about to
        #return a part of the last line in the
        #file, then wrap around to the beginning
        #of the file. We ONLY want complete lines.
        seek(FH, 0, 0) if eof;

        #Return the next available, complete, line
        <FH>;
    };

    Here's an explanation of the code:

    Most similar routines break when the random seek lands in the last line of a file. They do a first <FH>, which moves the file cursor to the start of the next line. If you are already positioned somewhere inside the *last* line, there is no next line, so the second <FH> returns undef, which is not usually what you want.

    Another flaw in the standard routine of this type is that it always returns the line after the one that was randomly seek'd to. This means that no matter how many times you run the routine, it will *always* miss line 1.

    The above routine gets around that by simply checking whether we are at the end of the file, using eof. If we are, we wrap around to the beginning. This solves both of the problems above, while still maintaining an acceptable level of randomness with rand.

    Update: This routine does have a bias towards longer lines in the file. Do not use it for files whose lines vary in length. Use the methods described here: How do I pick a random line from a file?
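
    One method that avoids the length bias is single-pass selection, the approach given in perlfaq5: read every line and keep the current one with probability 1/N, where N is its line number. What follows is only a minimal sketch of that idea. The helper name pick_uniform_line is hypothetical, and whether the linked node describes exactly this method is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return one line of $path,
# with every line equally likely regardless of its length.
sub pick_uniform_line {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not open $path: $!";

    my $pick;
    while (my $line = <$fh>) {
        # $. is the current line number; keep this line with chance 1/$.
        $pick = $line if rand($.) < 1;
    }

    close $fh;
    return $pick;    # undef if the file was empty
}
```

    Calling print pick_uniform_line('/usr/dict/words'); would print one word. The trade-off is that the whole file is read on every call, although only one line is held in memory at a time.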

      This will work fine if all the lines in the file are the same (or at least close) in length. However, if you have a file where the length of each line varies, your system will be biased towards longer lines, since a random seek is more likely to 'hit' these than their shorter brothers.
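
      The size of the bias can be worked out from the line lengths alone. The sketch below assumes the seek-based routine shown earlier in this thread, a file that ends with a newline, and a hypothetical helper name, seek_pick_odds. Because that routine skips forward from wherever the offset lands, an offset inside line k returns line k+1, so a long line raises the odds of the line that follows it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): given the byte length of each
# line, newline included, return the chance that the seek-based routine
# returns each line.
sub seek_pick_odds {
    my @len = @_;
    my $size = 0;
    $size += $_ for @len;
    my $slots = $size + 1;    # offsets 0 .. $size are all possible

    my @odds;
    # An offset inside the last line, or at end of file, wraps to line 1.
    $odds[0] = ($len[-1] + 1) / $slots;
    # An offset inside line k is skipped forward, so line k+1 is returned.
    $odds[$_] = $len[$_ - 1] / $slots for 1 .. $#len;
    return @odds;
}

# Example lengths for the lines "a\n", "hello world\n" and "hi\n".
my @odds = seek_pick_odds(2, 12, 3);
printf "line %d: %.3f\n", $_ + 1, $odds[$_] for 0 .. $#odds;
```

      With these example lengths, most of the weight goes to the third line, the one that follows the long line.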

      Jettra