de2425 has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to get a program to randomly pick 300 lines from a text file. I am getting output but I'm getting 300 lines that are exactly the same. Would anyone be able to help me and tell me what I'm doing wrong here? I would so appreciate it.

use strict;
use warnings;
open (OUT, ">c:/work/250/250_random.txt");
open (IN, "c:/work/250/Master_file.txt");
$size=300;
$count = ();
while (<IN>){
    while ($count<=$size){
        rand($.)<1 && ($line=$_);
        print OUT $line;
        $count++;
    }
}
close IN;
close OUT;

Re: Picking Random Lines from a File
by bart (Canon) on Oct 09, 2008 at 18:45 UTC
    I am getting output but I'm getting 300 lines that are exactly the same.
    while (<IN>){ while ($count<=$size){ rand($.)<1 && ($line=$_); print OUT $line; $count++; } }
    Of course you are.

    For each line in the file, $_ has a specific value. After that, for 300 times, you decide, based on a random number, to assign this one value to $line. Always the exact same string. And then you print it out.

    Assigning a value to a variable that already holds that same value doesn't change a thing.

    So, where's your thinko... It's quite obvious to me where you got the basis for the algorithm: it is (or used to be) in the official Perl FAQ. It goes something like this:

    while (<IN>){ rand($.)<1 && ($line=$_); } print OUT $line;
    So you loop through the file, assign or don't assign the current value to $line based on a random value, and in the end, you print out what you have got.
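
    For reference, the single-pass pick described above could be wrapped up roughly as follows. This is only a sketch: pick_one_line is a made-up helper name, the counter $n stands in for $., and the in-memory file replaces the original paths so the example runs on its own.

```perl
use strict;
use warnings;

# Pick one line uniformly at random from an open filehandle in a single pass.
# pick_one_line is a hypothetical helper name, not something from the FAQ.
sub pick_one_line {
    my ($fh) = @_;
    my ($line, $n) = (undef, 0);
    while (my $current = <$fh>) {
        $n++;
        # Replace the kept line with probability 1/$n.
        $line = $current if rand($n) < 1;
    }
    return $line;    # undef if the file was empty
}

# Demo on an in-memory file (assumed data, 10 lines).
my $data = join '', map { "line $_\n" } 1 .. 10;
open my $in, '<', \$data or die "Cannot open in-memory file: $!";
my $picked = pick_one_line($in);
close $in;
print $picked;
```

    Each call reads the whole file once and yields one line, which is why repeating it 300 times means 300 passes.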

    If you insist on doing this 300 times, you will have to read through the file 300 times.

    If you don't want to do that, and you've got memory to spare, you can first read the contents of the file into an array, and randomly pick a line from that array.

    Using the same algorithm (for no good reason, since it was originally chosen because it works without keeping everything in memory at once), this becomes:

    my @lines = <IN>;
    for my $c (1 .. 300) {
        my $line;
        for my $i (0 .. $#lines) {
            rand($i+1)<1 and $line = $lines[$i];
        }
        print OUT $line;
    }
    but it'll be a lot shorter to just write
    my @lines = <IN>; for my $c (1 .. 300) { print OUT $lines[int rand @lines]; }

    That leaves in duplicates. If you don't want duplicates, simply import the shuffle function from List::Util, shuffle the lines array, and print out the first 300.

    use List::Util qw(shuffle); my @lines = shuffle(<IN>); print OUT @lines[0 .. 299];
    This assumes there are at least 300 lines in the file, or you'll get a bunch of undefined values at the end.
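
    If the file might be shorter than the requested sample, one way to avoid those undefined values is to clamp the slice. A minimal sketch, where sample_lines is a hypothetical helper name and the five-line input is made-up demo data:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Return up to $n distinct lines in random order.
# sample_lines is a hypothetical helper name used for illustration.
sub sample_lines {
    my ($n, @lines) = @_;
    my @shuffled = shuffle(@lines);
    # Clamp the slice so a short input does not yield undefined values.
    my $last = $n < @shuffled ? $n - 1 : $#shuffled;
    return @shuffled[0 .. $last];
}

my @all    = map { "line $_\n" } 1 .. 5;    # assumed demo data
my @few    = sample_lines(3, @all);         # 3 of the 5 lines
my @capped = sample_lines(300, @all);       # only 5 lines are available
print @few;
```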
Re: Picking Random Lines from a File
by ikegami (Patriarch) on Oct 09, 2008 at 17:33 UTC

    When $. == 1,

    while ($count<=$size){ rand($.)<1 && ($line=$_); print OUT $line; $count++; }

    is the same as

    while ($count<=$size){ $line=$_; print OUT $line; $count++; }

    because rand(1) always returns something less than 1. Therefore, you always print the first line $size times.

    And when it comes time for $. == 2, $count is greater than $size from the first pass, so the loop is never entered.

    There's nothing you want to happen to a line more than once, so there shouldn't be any nested loops. The inner while is probably supposed to be an if.
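
    The trace above can be reproduced with a small sketch. It uses the same nested loop as the original post, but with an in-memory file and a small $size (both assumptions for the demo), and it collects the output in @printed instead of writing to OUT. Note that with `<=` and a count starting at 0, the inner loop runs $size + 1 times.

```perl
use strict;
use warnings;

# Assumed demo data: five lines in an in-memory file.
my $data = join '', map { "line $_\n" } 1 .. 5;
open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my ($size, $count, $line) = (3, 0, '');
my @printed;
while (<$in>) {
    while ($count <= $size) {
        rand($.) < 1 && ($line = $_);    # always true while $. == 1
        push @printed, $line;            # stands in for "print OUT $line"
        $count++;
    }
}
close $in;

# Every collected entry is the first line; later lines never enter the loop.
print @printed;
```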

Re: Picking Random Lines from a File
by JavaFan (Canon) on Oct 09, 2008 at 20:35 UTC
    The classical algorithm (I think it's described by Knuth) to pick N lines from a file with M lines (N <= M) goes like this:
    1. Read the first N lines in a buffer.
    2. For each next line (say, line k), decide with chance N/k, whether to accept or reject this line. If accepted, randomly replace one of the lines in the buffer.
    In Perl code, you get something like:
    my @buffer;
    push @buffer, scalar <IN> for 1 .. $N;
    while (my $line = <IN>) {
        next unless rand($.) < $N;
        $buffer[rand @buffer] = $line;
    }
    print @buffer;
    A few points:
    • It assumes you have enough memory to store N lines.
    • If you have enough memory to slurp in the entire file, it may be easier to read the file into an array, shuffle the array, and print the first $N entries.
    • The code as is doesn't preserve order - but you can always store the line number with the line itself, and sort afterwards.
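
    Putting the steps and the last point together, a sketch of the buffer approach that also restores file order might look like this. reservoir_sample is a hypothetical helper name, the counter $k stands in for $., and the 20-line in-memory file is made-up demo data.

```perl
use strict;
use warnings;

# Keep a uniform random sample of $n lines from $fh, returned in file order.
# reservoir_sample is a hypothetical helper name used for illustration.
sub reservoir_sample {
    my ($fh, $n) = @_;
    my @buffer;    # each entry is [line number, line text]
    my $k = 0;
    while (my $line = <$fh>) {
        $k++;
        if (@buffer < $n) {
            push @buffer, [$k, $line];              # step 1: fill the buffer
        }
        elsif (rand($k) < $n) {                     # step 2: accept with chance n/k
            $buffer[rand @buffer] = [$k, $line];    # replace a random slot
        }
    }
    # Sort by the stored line number to restore the original order.
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } @buffer;
}

my $data = join '', map { "line $_\n" } 1 .. 20;
open my $in, '<', \$data or die "Cannot open in-memory file: $!";
my @sample = reservoir_sample($in, 5);
close $in;
print @sample;
```

    Filling the buffer inside the loop, rather than with a separate push beforehand, also copes with files that have fewer than $n lines.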
      Hi, I am fairly new to Perl, and I want to randomly select chunks of 4 lines from a large file. I was looking at this thread, and I am able to select N lines randomly, but I want to go a little further: select a random line number (an odd number), then select the three lines following it, then select another random line (an odd number) and the three lines following it, and so on up to N. Any help would be greatly appreciated. Thanks.
Re: Picking Random Lines from a File
by toolic (Bishop) on Oct 09, 2008 at 17:45 UTC
Re: Picking Random Lines from a File
by Illuminatus (Curate) on Oct 09, 2008 at 17:51 UTC
    Take a look at File::Random. random_line() looks like what you are looking for.
Re: Picking Random Lines from a File
by Illuminatus (Curate) on Oct 09, 2008 at 17:22 UTC
    call to rand is actually fine

      If so, his perl is broken. To quote the documentation for rand:

      Automatically calls srand unless srand has already been called.

      >perl -e"print rand 1"
      0.83843994140625
      >perl -e"print rand 1"
      0.795257568359375
      >perl -e"print rand 1"
      0.84637451171875
      Is there another way I can put the rand argument in there? I have a text file with about 90000 lines and I just need to pick 300 lines randomly from it. Sorry for all the questions. I'm truly still a novice at this.