Re: grabbing random n rows from a file

by japhy (Canon)
on Jul 18, 2006 at 21:57 UTC (#562145)

in reply to grabbing random n rows from a file

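I think this method is accurate and fair. It reads the file once and never holds more than $size lines in memory (it's a form of reservoir sampling):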
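    use strict;
    use warnings;

    open my $some_filehandle, "<", "quotefile.txt"
        or die "Can't open quotefile.txt: $!";
    my $set_size = 3;
    my $set = random_set_of_n($some_filehandle, $set_size);

    sub random_set_of_n {
        my ($fh, $size) = @_;
        return [] if $size < 1;   # without this, a $size of 0 returns the whole file
        my @set;
        local ($., $_);
        seek $fh, 0, 0;
        $. = 0;                   # seek() does not reset the line counter

        # fill the set with the first $size lines
        while (<$fh>) {
            chomp;
            push @set, $_;
            last if @set == $size;
        }

        # XXX: @set *should* be shuffled now if you care about ordering

        # line number $. enters the set with probability $size/$.,
        # overwriting a randomly chosen member
        while (<$fh>) {
            chomp;
            $set[rand @set] = $_ if $size/$. > rand;
        }
        return \@set;
    }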
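Why the $size/$. > rand test gives a fair sample (the standard reservoir-sampling argument, added here, not part of the original post). It assumes the file has n >= k lines, where k is $size and lines are numbered 1..n the way $. counts them. Once the set is full, a line already in it is evicted when line j arrives only if line j is accepted (probability k/j) and overwrites that particular slot (probability 1/k), so with probability 1/j. The chance that line i is in the final set is then:

```latex
P(\text{line } i \text{ in final set}) =
\begin{cases}
\displaystyle \prod_{j=k+1}^{n}\Bigl(1-\frac{1}{j}\Bigr) = \frac{k}{n}, & i \le k,\\[2ex]
\displaystyle \frac{k}{i}\prod_{j=i+1}^{n}\Bigl(1-\frac{1}{j}\Bigr) = \frac{k}{i}\cdot\frac{i}{n} = \frac{k}{n}, & i > k,
\end{cases}
\qquad\text{using}\quad \prod_{j=a+1}^{n}\frac{j-1}{j} = \frac{a}{n}.
```

Every line ends up in the set with the same probability k/n.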
I think the distribution is fair, and my tests suggest it is.

Update: the set should be shuffled where I've indicated. That's only needed if you want the list returned in random order; if you're going to pick elements from it at random later anyway, you can skip it.
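A minimal sketch of doing that shuffle, plus the kind of frequency test that would back up the fairness claim. It assumes List::Util's shuffle (a core module) and a hypothetical 10-line in-memory file in place of quotefile.txt; it also counts lines in its own $n instead of using $., which is a variation on the code above rather than the original.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Same reservoir idea as above, with the shuffle done at the XXX point.
# Counts lines in $n instead of relying on $. (a variation, not the original).
sub random_set_of_n {
    my ($fh, $size) = @_;
    return [] if $size < 1;
    my @set;
    my $n = 0;
    seek $fh, 0, 0 or die "seek failed: $!";

    # fill the set with the first $size lines
    while (defined(my $line = <$fh>)) {
        chomp $line;
        $n++;
        push @set, $line;
        last if @set == $size;
    }

    @set = shuffle @set;    # randomize the order of the initial fill

    # line number $n overwrites a random member with probability $size/$n
    while (defined(my $line = <$fh>)) {
        chomp $line;
        $n++;
        $set[rand @set] = $line if rand() < $size / $n;
    }
    return \@set;
}

# Hypothetical 10-line "file" held in memory instead of quotefile.txt.
my $data = join '', map { "$_\n" } 1 .. 10;
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

# Draw many samples and count how often each line is chosen.
my ($k, $trials) = (3, 20_000);
my %count;
for (1 .. $trials) {
    $count{$_}++ for @{ random_set_of_n($fh, $k) };
}

# A fair sampler puts each line in about $trials * $k / 10 = 6000 sets.
my $expected = $trials * $k / 10;
printf "line %2d: %5d (expected about %d)\n", $_, $count{$_}, $expected
    for sort { $a <=> $b } keys %count;
```

Shuffling once after the fill is enough: each later replacement lands in a uniformly random slot, so the order stays random. Calling shuffle on the returned list would work just as well.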

Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
