Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How would monks randomly retrieve a subset of files from a directory?

(jeffa) Re: random file selection
by jeffa (Bishop) on Jun 11, 2003 at 15:10 UTC
    For just one directory level and only plain files (-f), each picked with a 50-50 chance:
    use strict;
    use warnings;

    my $dir = shift || '.';
    my @rand;
    while (<$dir/*>) {
        # int rand 2 is 0 or 1, so each plain file is kept with probability 1/2
        push @rand, $_ if int rand 2 and -f $_;
    }
    print $_, $/ for @rand;
    Update: or as a *NIX one-liner:
    perl -le'$d=shift||".";print for grep{int rand 2 and -f $_}<$d/*>'
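The 50-50 chance comes from int rand 2, which is 0 or 1 with equal probability. As a minimal sketch, the same idea generalizes to any inclusion probability; the sub name bernoulli_pick and its $p parameter are hypothetical, not part of jeffa's code. Because each file is kept independently, the size of the subset varies between runs (about $p times the number of files on average).

```perl
use strict;
use warnings;

# Hypothetical helper: keep each element of @$items independently with
# probability $p. rand() returns a number in [0, 1), so rand() < $p holds
# with probability $p (never for $p = 0, always for $p = 1).
sub bernoulli_pick {
    my ( $items, $p ) = @_;
    return grep { rand() < $p } @$items;
}

# Example: keep roughly a quarter of the plain files in the current directory
my @files  = grep { -f } glob '*';
my @subset = bernoulli_pick( \@files, 0.25 );
print "$_\n" for @subset;
```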

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: random file selection
by BrowserUk (Patriarch) on Jun 11, 2003 at 15:43 UTC

    If you need to select a subset of a specified size, then you could do something like this. (Note: You probably don't need the glob if you're on *nix.)

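    #! perl -slw
    use strict;
    use vars qw[$N];    # set by the -s switch, e.g. -N=10

    $N ||= 30;
    die 'No path specified' unless @ARGV;
    my @files = glob $ARGV[0];
    $N = @files if $N > @files;    # can't pick more files than exist

    # Partial Fisher-Yates: fill each of the first $N slots with a
    # random file drawn from the slots not yet filled
    for my $i ( 0 .. $N - 1 ) {
        my $r = $i + int rand( @files - $i );
        @files[ $i, $r ] = @files[ $r, $i ];
    }
    splice @files, $N;    # keep only the first $N
    print for @files;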
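Picking N items by swapping a random not-yet-chosen element into each of the first N slots is a partial Fisher-Yates shuffle. A minimal sketch of that technique as a reusable sub; the name pick_n is hypothetical. It works on a copy, so the caller's list is left alone, and it caps N at the list size.

```perl
use strict;
use warnings;

# Hypothetical helper: return $n distinct elements of @$items, chosen
# uniformly at random, using a partial Fisher-Yates shuffle.
sub pick_n {
    my ( $items, $n ) = @_;
    my @pool = @$items;              # work on a copy
    $n = @pool if $n > @pool;        # can't pick more than we have
    for my $i ( 0 .. $n - 1 ) {
        # choose from positions $i .. $#pool, which are still unpicked
        my $r = $i + int rand( @pool - $i );
        @pool[ $i, $r ] = @pool[ $r, $i ];
    }
    return @pool[ 0 .. $n - 1 ];
}

my @files = grep { -f } glob '*';
print "$_\n" for pick_n( \@files, 3 );
```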

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


      I get the 'No path specified' error from this program. I ran it from within the directory that contains the files I want to extract at random. I believe this program is closest to what I want.

        Sorry. I guess I should have added a usage line.

        scriptname [-N=number of files] path\*.ext

        E.g. script -N=10 *.pl will print the names of 10 .pl files from the current directory.

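        script "c:\My Files\*" will print the names of 30 files of any type from the specified directory. (Perl's glob splits its pattern on whitespace, so a path containing spaces may need extra quotes inside the pattern itself.)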

        You might need to play with it a bit to make it work under *nix.




      Note: You probably don't need the glob if you're on *nix.

      Yes, this is true. Instead of my @files = glob $ARGV[0]; it would be my @files = @ARGV; (or one could simply use @ARGV directly). The reason is that most Unix shells expand globs before perl even sees the command line, so glob $ARGV[0] would only get the first of the already-expanded file names.
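A minimal sketch of one way to cover both cases, assuming that several arguments mean the shell has already expanded the pattern; the sub name files_from_args is hypothetical. A single argument goes through bsd_glob from the core File::Glob module, which, unlike the built-in glob, treats the whole string as one pattern rather than splitting it on whitespace.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper. Several arguments are assumed to be file names the
# shell already expanded, and are returned as-is. A single argument may still
# be an unexpanded pattern (quoted, or from a shell that doesn't expand globs),
# so it is passed to bsd_glob, which does not split on whitespace.
sub files_from_args {
    my @args = @_;
    return @args if @args != 1;
    return bsd_glob( $args[0] );
}

print "$_\n" for files_from_args(@ARGV);
```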

Re: random file selection
by derby (Abbot) on Jun 11, 2003 at 16:28 UTC
    You could shuffle the directory listing with List::Util::shuffle or a Fisher-Yates shuffle and then take a slice of the resulting array.

    #!/usr/bin/perl -w
    use strict;
    use IO::Dir;

    tie my %dir, 'IO::Dir', '.';
    my @files = grep { -f } keys %dir;     # plain files only
    fisher_yates_shuffle( \@files );
    my $last = $#files < 5 ? $#files : 5;  # at most 6 names
    print $files[$_], "\n" for 0 .. $last;

    # right from the FAQ (perlfaq4)
    sub fisher_yates_shuffle {
        my $deck = shift;    # $deck is a reference to an array
        my $i = @$deck;
        while ($i--) {
            my $j = int rand ($i+1);
            @$deck[$i,$j] = @$deck[$j,$i];
        }
    }

    or

    #!/usr/bin/perl -w
    use strict;
    use IO::Dir;
    use List::Util 'shuffle';

    tie my %dir, 'IO::Dir', '.';
    my @files = grep { -f } keys %dir;     # plain files only
    my @shuff = shuffle(@files);
    print $shuff[$_], "\n" for 0 .. ( $#shuff < 5 ? $#shuff : 5 );
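Taking the slice is only safe up to the number of files actually found. A minimal sketch of the shuffle-and-slice step as a sub that caps the slice at the list length; the name random_subset is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: shuffle a copy of the list and take a slice of at
# most $n elements, so a short list never yields undef entries.
sub random_subset {
    my ( $items, $n ) = @_;
    my @shuffled = shuffle(@$items);    # shuffle returns a new list
    $n = @shuffled if $n > @shuffled;
    return @shuffled[ 0 .. $n - 1 ];
}

my @files = grep { -f } glob '*';
print "$_\n" for random_subset( \@files, 6 );
```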

    -derby

      The code at the bottom of this post seems to work nicely, except that it only works with the .pl files in the directory. Why?
        Because the others are directories? The -f test keeps only plain files from the list, dropping directories and other non-plain entries (see perlfunc -X). Without a listing of the target directory, I cannot be more specific.
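A related pitfall when the directory is not the current one: readdir (like the keys of IO::Dir's tied hash) returns bare names, so -f has to be given the full path, or it tests a file of that name in the current directory instead. A minimal sketch; the sub name plain_files is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the plain files (-f) in $dir as full paths.
# readdir gives bare names, so -f must see "$dir/$name"; testing the bare
# name would look in the current directory instead.
sub plain_files {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Can't open $dir: $!";
    my @names = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return map { "$dir/$_" } sort @names;
}

print "$_\n" for plain_files('.');
```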

        -derby

Re: random file selection
by wufnik (Friar) on Jun 11, 2003 at 17:38 UTC
    This wee hack, weighing in at 5 lines, is very similar in spirit to BrowserUk's, but shuffles the whole list with Fisher-Yates (apologies to Japhy) and lets you specify the subset size as the first argument; the directory is hard-coded as ".", but that is easy to change.
    opendir( DIR, "." ) || die "Can't open DIRHANDLE, stopped";
    -f $_ and !/^\./ and push @entry, $_ while defined( $_ = readdir(DIR) );
    closedir( DIR );
    # Fisher-Yates from the end: swap slot -$i with a random slot in 0 .. @entry - $i
    @entry[-$i,$j] = @entry[$j,-$i] while $j = rand(@entry - $i), ++$i < @entry;
    print "$_\n" for @entry[0 .. ( shift || 1 ) - 1];
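wufnik's loop fills the array from the end: pass $i swaps slot -$i with a random slot among those not yet fixed, and it runs over the whole array. When only k names are wanted, the same backward Fisher-Yates can stop after k passes and take the last k slots. A minimal sketch; the sub name tail_pick is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: backward partial Fisher-Yates. Pass $i swaps slot -$i
# (index @pool - $i) with a random slot in 0 .. @pool - $i, so after $k passes
# the last $k slots hold a uniform random sample of $k distinct elements.
sub tail_pick {
    my ( $items, $k ) = @_;
    my @pool = @$items;                  # work on a copy
    $k = @pool if $k > @pool;
    for my $i ( 1 .. $k ) {
        my $j = int rand( @pool - $i + 1 );
        @pool[ -$i, $j ] = @pool[ $j, -$i ];
    }
    return @pool[ -$k .. -1 ];           # empty when $k == 0
}

my @files = grep { -f } glob '*';
print "$_\n" for tail_pick( \@files, 3 );
```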
    ...wufnik

    -- in the world of the mules there are no rules --