Jettra has asked for the wisdom of the Perl Monks concerning the following question:

Given a file with a set of words separated by carriage returns, what is the most efficient means of choosing any single word at random? Thanks for your help, Jettra

Re: Finding a random line
by clemburg (Curate) on Jan 17, 2001 at 14:07 UTC

    You want to read the Perl Cookbook, chapter 8 (File Contents), pages 284-285.

    For your convenience, here is the code that does what you want.

    Solution: Use rand() and $. to decide which line to keep:

    rand($.) < 1 && ($line = $_) while <>; # $line is the random line

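    This works because line N replaces the line kept so far with probability 1/N. Line N is therefore picked with probability 1/N and then survives each later line M with probability (M-1)/M; over a file of T lines that product telescopes to 1/T, so every line is equally likely to be the one left at the end. For more explanations, get the book.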
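    The one-liner above is compact but cryptic. Here is a minimal sketch of the same technique wrapped in a subroutine, with strict and warnings enabled. The subroutine name random_line and the in-memory sample data are hypothetical, chosen only for illustration; the logic is the rand($.) trick from the recipe, not the book's own code. It reads one line at a time, so memory use stays constant however large the file is.

```perl
use strict;
use warnings;

# Choose one line uniformly at random in a single pass, using the same
# rand($.) idea as the one-liner: line N replaces the current choice
# with probability 1/N. Only the chosen line is kept in memory.
sub random_line {
    my ($fh) = @_;
    my ($chosen, $count) = (undef, 0);
    while (my $line = <$fh>) {
        $count++;
        $chosen = $line if rand($count) < 1;   # true with probability 1/$count
    }
    chomp $chosen if defined $chosen;
    return $chosen;                            # undef for an empty file
}

# Usage with an in-memory "file" (hypothetical sample data).
my $data = "apple\nbanana\ncherry\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $pick = random_line($fh);
print "$pick\n";
close $fh;
```

    With a real file you would open it by name instead, e.g. open my $fh, '<', 'words.txt' (a hypothetical filename).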

    Extending this example to choosing a random word from an array of words is left as an exercise for the reader (I always wanted to say that!) ...
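    One possible answer to the exercise, offered as a sketch rather than the book's solution: if the words are already in an array, $words[rand @words] is enough. If they are still in a file, possibly several per line, the same 1/N trick can be applied word by word, so the word list never has to be held in memory. The subroutine name random_word and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Choose one word uniformly at random from a filehandle, applying the
# 1/N trick per word instead of per line. Lines may hold several words.
sub random_word {
    my ($fh) = @_;
    my ($chosen, $seen) = (undef, 0);
    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {         # split on runs of whitespace
            $seen++;
            $chosen = $word if rand($seen) < 1;   # word number $seen wins with probability 1/$seen
        }
    }
    return $chosen;                               # undef if there were no words
}

# Usage with an in-memory "file" (hypothetical sample data).
my $text = "red green\nblue\n\n  yellow  \n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $word = random_word($fh);
print "$word\n";
close $fh;
```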

    Christian Lemburg
    Brainbench MVP for Perl
    http://www.brainbench.com

Re: Finding a random line
by dkubb (Deacon) on Jan 17, 2001 at 12:56 UTC
Re: Finding a random line
by lemming (Priest) on Jan 17, 2001 at 12:58 UTC
    Before we can really answer your question:
    What kind of efficiency do you want?
    Would reading all the lines into an array and picking a random index based on its size work, or is that too memory-intensive?
    Would counting the lines with "wc -l" and then reading down a random number of lines work, or is that not portable enough?
    You could read the file twice (once to count the lines, once to fetch the chosen one) to avoid "wc -l". That is a bit intensive, but the filesystem's buffering may give you some leeway.
    You could massage the data so that the first line holds the line count to go off of. Even better, create a file with the size data in the first block you read, and then seek directly to the random spot.
    If none of this gives you a clue to go on, please give us more information.
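    As a rough sketch of the read-twice option (the subroutine name random_line_two_pass and the sample data are hypothetical): the first pass only counts lines, and the second pass rewinds with seek and stops at a randomly chosen line. Only one line is held in memory at a time.

```perl
use strict;
use warnings;

# Two-pass sketch: pass 1 counts the lines, pass 2 rewinds and reads down
# to a randomly chosen line. Only one line is in memory at any time.
sub random_line_two_pass {
    my ($fh) = @_;
    my $count = 0;
    $count++ while <$fh>;                         # pass 1: count the lines
    return unless $count;                         # empty file: return undef
    my $target = int rand $count;                 # 0-based index of the line to return
    seek($fh, 0, 0) or die "Cannot rewind: $!";   # pass 2: back to the start
    my $i = 0;
    while (my $line = <$fh>) {
        next if $i++ < $target;                   # skip lines before the target
        chomp $line;
        return $line;
    }
    return;                                       # file shrank between the passes
}

# Usage with an in-memory "file" (hypothetical sample data).
my $data = "apple\nbanana\ncherry\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $pick = random_line_two_pass($fh);
print "$pick\n";
close $fh;
```

    If the line count is stored up front, the counting pass can be skipped; seeking straight to the chosen line additionally requires knowing where it starts, for example from fixed-length records or a stored table of line offsets.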
Re: Finding a random line
by I0 (Priest) on Jan 18, 2001 at 02:02 UTC
Re: Finding a random line
by EvanK (Chaplain) on Jan 17, 2001 at 23:47 UTC
    Wait, wait: do you want a single line or a single word? If a line, then:
    srand;                                   # seed the random generator (Perl 5.004+ does this automatically)
    open(FILE, "file.txt") or die "Can't open file.txt: $!";
    @file = <FILE>;                          # put the file contents into an array
    close(FILE);
    chomp(@file);                            # remove the newlines
    $line = $file[rand(@file)];              # select a random line (rand(@file) can reach the last index)
    If a word:
    srand;                                   # seed the random generator
    open(FILE, "file.txt") or die "Can't open file.txt: $!";
    while (<FILE>) { $file .= $_; }          # put the file contents into one big scalar
    close(FILE);
    @file = split ' ', $file;                # separate the file into single words (split ' ' splits on runs of whitespace, including \r and \n)
    $word = $file[rand(@file)];              # select a random word
    The code is rather comment-laden, but the comments are just there so you know what's what.
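    A note on the index: an array subscript truncates rand(...) to an integer, so rand(@file) (the element count) yields indices 0 through $#file, whereas rand($#file) can never reach the last element. This small experiment with hypothetical sample data shows the difference; it should report 0, 1, 2 for the first expression and only 0, 1 for the second.

```perl
use strict;
use warnings;

my @file = qw(apple banana cherry);        # hypothetical sample lines
my (%by_count, %by_last_index);
for (1 .. 10_000) {
    $by_count{ int rand(@file) }++;        # rand(3): indices 0, 1, 2
    $by_last_index{ int rand($#file) }++;  # rand(2): indices 0, 1 only
}
print "rand(\@file):  ", join(', ', sort keys %by_count), "\n";
print "rand(\$#file): ", join(', ', sort keys %by_last_index), "\n";
```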
    ______________________________________________
    When I get a little money, I buy books. If I have any left over, I buy food and clothes.
    -Erasmus