
Re: grabbing random n rows from a file

by japhy (Canon)
on Jul 18, 2006

in reply to grabbing random n rows from a file

I think this method is accurate and fair:
open my $some_filehandle, "<", "quotefile.txt"
    or die "Cannot open quotefile.txt: $!";
my $set_size = 3;
my $set = random_set_of_n($some_filehandle, $set_size);

sub random_set_of_n {
    my ($fh, $size) = @_;
    my @set;
    local ($., $_);
    seek $fh, 0, 0;

    # Fill the set with the first $size lines.
    while (<$fh>) {
        chomp;
        push @set, $_;
        last if @set == $size;
    }

    # XXX: @set *should* be shuffled now if you care about ordering

    # Each later line (number $.) replaces a random element
    # with probability $size/$.
    while (<$fh>) {
        chomp;
        $set[rand @set] = $_ if $size/$. > rand;
    }

    return \@set;
}
I think it's a fair distribution, and my tests suggest it is.

Update: the set should be shuffled where I've indicated. That's only necessary if you want a randomly ordered list returned, not if you're going to be plucking elements from it at random later on.
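For anyone who wants to check the fairness claim themselves, here is a rough sketch of one way to test it. It is not the test I actually ran. The name sample_lines, the 10-line in-memory "file", and the trial count are all made up for illustration. The sampler restates the approach above with two assumed changes: it shuffles via List::Util at the XXX point, and it keeps its own line counter instead of $., since seek does not reset $. between repeated calls on the same handle.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Same approach as random_set_of_n, restated so this script runs on its
# own. sample_lines is a hypothetical name, not part of the original post.
sub sample_lines {
    my ($fh, $size) = @_;
    my (@set, $seen);
    seek $fh, 0, 0;
    while (my $line = <$fh>) {
        chomp $line;
        $seen++;                              # explicit counter instead of $.
        if (@set < $size) {
            push @set, $line;                 # fill the set first
            @set = shuffle @set if @set == $size;
        }
        elsif (rand() < $size / $seen) {
            $set[rand @set] = $line;          # replace a random slot
        }
    }
    return \@set;
}

# Hypothetical test data: an in-memory "file" of 10 numbered lines.
my ($lines, $size, $trials) = (10, 3, 30_000);
my $data = join '', map { "$_\n" } 1 .. $lines;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Count how often each line ends up in the returned set.
my %count;
for (1 .. $trials) {
    $count{$_}++ for @{ sample_lines($fh, $size) };
}

# Each line should be picked in roughly $size/$lines of the trials.
printf "line %2d: %.3f (expected %.3f)\n",
    $_, ($count{$_} || 0) / $trials, $size / $lines
    for 1 .. $lines;
```

If the method is fair, every line's observed frequency should land close to 3/10, with no drift toward the start or end of the file. This only checks how often each line is selected, not the ordering of the returned list.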

Jeff japhy Pinyan
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

