in reply to Extract random records from text file

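For a solution that doesn't slurp the whole file into memory (and I don't see any of those yet), here's a variation on the perlfaq5 answer to "How do I select a random line from a file?". It never holds more than $n records: the first $n fill the array, and every later record replaces a randomly chosen slot with probability $n/$., where $. is the current record number: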
    my $q_file = "tmp.txt";
    my $n = 10;
    my @ques = get_questions($q_file, $n);
    print for @ques;

    sub get_questions {
        local $/ = "===\n";   # each record ends with "===\n"
        local $.;             # Is this necessary?
        local @ARGV = (shift);
        # If you want 'open ... or die' behavior
        local $SIG{__WARN__} = sub { die shift };
        my $num = shift;
        my @questions;
        while (<>) {
            # Fill the reservoir with the first $num records...
            push(@questions, $_), next if $. <= $num;
            # ...then keep record $. with probability $num/$.,
            # overwriting a randomly chosen slot.
            my $i = rand($.);
            $questions[$i] = $_ if $i < $num;
        }
        @questions;
    }
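If the @ARGV / $SIG{__WARN__} trick feels too clever, the same reservoir loop can be written against an explicitly opened filehandle. This is a sketch, not the code above: the sub name get_questions_fh is hypothetical, the "===\n" record separator is assumed from the original, and a plain counter stands in for $. so the question of localizing $. doesn't come up. The temp file of 50 dummy records exists only to make the example runnable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical variant: explicit open-or-die instead of @ARGV and <>.
sub get_questions_fh {
    my ($file, $num) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";
    local $/ = "===\n";              # assumed: each record ends with "===\n"
    my @questions;
    my $seen = 0;                    # records read so far (plays the role of $.)
    while (my $rec = <$fh>) {
        $seen++;
        if ($seen <= $num) {         # fill the reservoir first
            push @questions, $rec;
            next;
        }
        my $i = int rand $seen;      # uniform in 0 .. $seen-1
        $questions[$i] = $rec if $i < $num;   # kept with probability $num/$seen
    }
    close $fh;
    return @questions;
}

# Demo input: 50 dummy records in a temporary file (removed at exit).
my ($tfh, $path) = tempfile(UNLINK => 1);
print {$tfh} "Question $_\n===\n" for 1 .. 50;
close $tfh;

my @picked = get_questions_fh($path, 10);
print for @picked;
```

Keeping the counter in a lexical also means nothing depends on which filehandle $. happens to be aliased to.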

Re (tilly) 2 (unrandomness): Extract random records from text file
by tilly (Archbishop) on Oct 02, 2001 at 22:15 UTC
    I like this solution best because it does something honestly new and unusual.

    However, there is one caveat. While it does avoid reading the file into memory, and the set of questions chosen is random, the first 10 questions in the file always appear "in position" if they are chosen at all: record k (for k <= 10), if kept, is still in slot k-1, because later records only overwrite slots and never move existing ones. For a perfectly random order, therefore, it would be best to throw in a shuffle at the end.
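The "in position" effect is easy to observe. This sketch runs the same algorithm, with no final shuffle, over an in-memory list of 100 records instead of a file (purely for convenience), and tallies which slot record 1 occupies whenever it is kept. The helper name sample_no_shuffle is hypothetical.

```perl
use strict;
use warnings;

# Same reservoir loop as get_questions, but over an in-memory list
# (hypothetical helper, for demonstration only). No shuffle at the end.
sub sample_no_shuffle {
    my ($num, @records) = @_;
    my @picked;
    my $seen = 0;
    for my $rec (@records) {
        $seen++;
        if ($seen <= $num) { push @picked, $rec; next }
        my $i = int rand $seen;
        $picked[$i] = $rec if $i < $num;
    }
    return @picked;
}

# Record 1 is kept in about 10/100 of trials; when kept it is always in
# slot 0, since it was placed there and slots are only ever overwritten.
my %slot_of_first;    # slot index => how often record 1 landed there
for (1 .. 2000) {
    my @s = sample_no_shuffle(10, 1 .. 100);
    for my $slot (0 .. $#s) {
        $slot_of_first{$slot}++ if $s[$slot] == 1;
    }
}
print "slot $_: $slot_of_first{$_}\n" for sort { $a <=> $b } keys %slot_of_first;
```

Only a "slot 0" line should be printed; its count varies from run to run.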

      Or you can just shuffle them from the start:
          my $num = shift;
          my @init = (0..$num-1);   # slots not yet filled
          my @questions;
          while (<>) {
              # place each of the first $num records in a random unused slot
              $questions[splice @init, rand(@init), 1] = $_, next if $. <= $num;
              my $i = rand($.);
              $questions[$i] = $_ if $i < $num;
          }
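Running the same kind of tally on the shuffle-from-the-start variant shows a kept record 1 spread across the slots instead of pinned to slot 0. Again this is a sketch over an in-memory list rather than a file, and sample_splice_start is a hypothetical name.

```perl
use strict;
use warnings;

# Reservoir loop where the first $num records go into random unused
# slots via splice (hypothetical helper, in-memory input for the demo).
sub sample_splice_start {
    my ($num, @records) = @_;
    my @init = (0 .. $num - 1);      # slots not yet filled
    my @picked;
    my $seen = 0;
    for my $rec (@records) {
        $seen++;
        if ($seen <= $num) {
            $picked[splice @init, rand(@init), 1] = $rec;
            next;
        }
        my $i = int rand $seen;
        $picked[$i] = $rec if $i < $num;
    }
    return @picked;
}

my %slot_of_first;    # slot index => how often record 1 landed there
for (1 .. 2000) {
    my @s = sample_splice_start(10, 1 .. 100);
    for my $slot (0 .. $#s) {
        $slot_of_first{$slot}++ if $s[$slot] == 1;
    }
}
print "slot $_: $slot_of_first{$_}\n" for sort { $a <=> $b } keys %slot_of_first;
```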
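        I prefer shuffling at the end using Fisher-Yates. Shuffling at the start with splice is algorithmically very bad: each splice out of the middle of an array has to move part of the array, so using it to fill all n slots costs O(n^2) instead of the O(n) of Fisher-Yates. OK, with only 10 elements it works just fine, but I tend to avoid splice on general principle if there is a more efficient alternative (which there is in this case).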
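A minimal sketch of the shuffle-at-the-end approach: the classic in-place Fisher-Yates loop, one swap per element. The @questions contents below are made up to stand in for what get_questions returns, with the first-10 records still "in position".

```perl
use strict;
use warnings;

# In-place Fisher-Yates shuffle: walk from the last index down,
# swapping each element with a uniformly chosen one at or below it.
sub fisher_yates_shuffle {
    my $deck = shift;                # reference to the array to shuffle
    my $i = @$deck;
    return unless $i;                # nothing to do for an empty array
    while (--$i) {
        my $j = int rand($i + 1);    # uniform in 0 .. $i
        @$deck[$i, $j] = @$deck[$j, $i];
    }
}

# Made-up stand-in for get_questions() output: q1 in slot 0, q3 in slot 2, ...
my @questions = map { "q$_\n" } 1, 57, 3, 88, 5, 12, 7, 40, 9, 71;
fisher_yates_shuffle(\@questions);
print for @questions;
```

Inside get_questions, calling fisher_yates_shuffle(\@questions) just before the final @questions line would be enough. If List::Util is available, `use List::Util qw(shuffle); @questions = shuffle @questions;` does the same job.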