in reply to Picking Random Lines from a File

The classical algorithm (I think it's described by Knuth) to pick N lines from a file with M lines (N <= M) goes like this:
  1. Read the first N lines into a buffer.
  2. For each subsequent line (say, line k), accept it with probability N/k. If it is accepted, it replaces a randomly chosen line in the buffer.
In Perl code, you get something like:
    my @buffer;
    push @buffer, scalar <IN> for 1 .. $N;    # step 1: fill the buffer
    while (my $line = <IN>) {
        next unless rand($.) < $N;            # step 2: accept with chance N/$.
        $buffer[rand @buffer] = $line;        # overwrite a random slot
    }
    print @buffer;
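For anyone who wants to try this without setting up the IN filehandle and $N first, here is a self-contained sketch of the same two steps. The sub name sample_lines and the counter $seen are my own inventions for illustration, and the in-memory file is only there to make the example runnable. It counts lines itself instead of using $., so it should also cope with a file that has fewer than N lines.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a reference to
# an array of up to $n lines sampled from the filehandle $fh.
sub sample_lines {
    my ($fh, $n) = @_;
    my @buffer;
    my $seen = 0;                          # lines read so far (stands in for $.)
    while (defined(my $line = <$fh>)) {
        $seen++;
        if (@buffer < $n) {                # step 1: fill the buffer
            push @buffer, $line;
        }
        elsif (rand($seen) < $n) {         # step 2: accept with chance N/k
            $buffer[rand @buffer] = $line; # replace a randomly chosen slot
        }
    }
    return \@buffer;
}

# Demonstration on an in-memory "file" of ten lines.
my $text = join '', map { "line $_\n" } 1 .. 10;
open my $in, '<', \$text or die "Cannot open in-memory file: $!";
my $picked = sample_lines($in, 3);
close $in;
print @$picked;
```

The output differs from run to run, but it should always be three distinct lines out of the ten.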
A few points:

Re^2: Picking Random Lines from a File
by nathanroy (Initiate) on Apr 21, 2009 at 20:20 UTC
    Hi, I am fairly new to Perl, and I want to randomly select chunks of 4 lines from a large file. I was looking at this thread, and I am able to select N lines at random, but I want to go a little further: select a random odd line number and take the three lines following it, then select another random odd line number and take the three lines following it, and so on, up to N. Any help would be greatly appreciated. Thanks