in reply to Re: selecting N random lines from a file in one pass
in thread selecting N random lines from a file in one pass
However, if you do have the entire data set in memory, there's a simpler approach than the math theory your module uses. Rather than iterating the array, just draw indices at random; if you draw a duplicate, redraw. The larger the set relative to the sample, the lower the odds of hitting a duplicate, so this scales well, though I admit it trips up more as the sample size approaches the set size.
It's a lot like how, in gaming, you can fake a d12 by rolling a d20 until you've got a number in range.
    use strict;
    use Data::Dumper;

    my @set  = ( 'a' .. 'z' );
    my $size = 3;

    my @sample;
    my %seen;
    while ( @sample < $size ) {
        my $elt;
        # redraw until we land on an index we haven't used yet
        do {
            $elt = int rand(@set);
        } while ( $seen{$elt} );
        push @sample, $set[$elt];
        $seen{$elt}++;
    }

    print Dumper \@sample;
This example should really also check that the sample size doesn't exceed the size of the set; otherwise the redraw loop would spin forever.
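A minimal sketch of the same rejection-sampling idea with that bounds check folded in; the sub name is mine, not anything from the module under discussion:

    use strict;
    use warnings;

    # Draw $size distinct elements from @set by rejection sampling.
    # Dies up front if the request can never be satisfied, so the
    # redraw loop cannot spin forever.
    sub sample_without_replacement {
        my ( $size, @set ) = @_;
        die "sample size $size exceeds set size " . scalar(@set) . "\n"
            if $size > @set;

        my ( @sample, %seen );
        while ( @sample < $size ) {
            my $elt;
            do { $elt = int rand @set } while $seen{$elt};
            push @sample, $set[$elt];
            $seen{$elt}++;
        }
        return @sample;
    }

    my @picks = sample_without_replacement( 3, 'a' .. 'z' );
    print "@picks\n";

The check costs nothing in the common case and turns the pathological input into an immediate, explainable failure instead of a hang.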