Angharad has asked for the wisdom of the Perl Monks concerning the following question:

OK, let me state first of all that I'm not asking for code (although all donations gratefully received), just general advice as to how to go about a particular task.
I have in my possession 36 text files that look like this (only longer):
-0.29 -0.39 -0.51 -0.65 -0.85 -1.07 -1.33 -1.66 -1.92 -2.11 -2.21
And I want to be able to write a Perl script that will randomly select 3 of these files and print the contents of the remainder (i.e. the other 33 files) into one file like this:
-0.29 -0.2 0.62 -0.82 -0.39 -0.25 0.79 -1.05 -0.51 -0.34 0.97 -1.35 -0.65 -0.46 1.15 -1.67 -0.85 -0.64 1.34 -2.07 -1.07 -0.86 1.57 -2.55 -1.33 -1.07 1.8 -3.05 -1.66 -1.33 2.05 -3.64 -1.92 -1.57 2.21 -4.28 -2.11 -1.77 2.23 -4.91 etc ... for 33 files
I want to be able to do this a number of times until all combinations have been covered ensuring that a particular 'three file combination' is only chosen once (of course any file can, on its own, be chosen a number of times, but only once in a particular combination of two other files ... make sense?).
So far, my logic on how to proceed is as follows:
Rename the text files 1 to 36 and then choose a particular three-way combo using a random number generator. A flag would then have to be created, perhaps a variable that is unique to that particular 3-file combo, which is set to 1 to ensure that this combo is not picked again.
The end result file is then created from the remaining 33 files.
Logical or not? Any advice, pointers much appreciated.
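The flag idea above can be sketched with a hash keyed on the sorted file numbers. This is only a minimal sketch: it assumes the files are numbered 1 to 36, and `pick_unused_combo` is a hypothetical helper name, not something from an existing module.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my %seen;    # one flag per combination; key is the sorted file numbers joined by '-'

# Hypothetical helper: return a random 3-file combination that has not been
# used yet, or an empty list once all $total combinations are used up.
sub pick_unused_combo {
    my ($n, $total) = @_;
    return if keys(%seen) >= $total;
    while (1) {
        my @picked = (shuffle 1 .. $n)[0 .. 2];
        my @combo  = sort { $a <=> $b } @picked;
        my $key    = join '-', @combo;
        next if $seen{$key}++;    # already flagged, draw again
        return @combo;
    }
}

my $n     = 36;
my $total = $n * ($n - 1) * ($n - 2) / 6;    # 7140 combinations of 3 out of 36

for (1 .. 5) {
    my @combo = pick_unused_combo($n, $total);
    my %skip  = map { $_ => 1 } @combo;
    my @rest  = grep { !$skip{$_} } 1 .. $n;
    print "Leave out: @combo (merging ", scalar(@rest), " files)\n";
}
```

Sorting the three numbers before building the key makes 3-17-22 and 22-3-17 count as the same combo. Note that redrawing on a clash gets slower as fewer unused combos remain, so this approach suits a modest number of draws better than exhaustive coverage.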

Replies are listed 'Best First'.
Re: randomly selecting files
by inman (Curate) on Jul 29, 2005 at 10:40 UTC
    Have a look at Math::Combinatorics which will allow you to calculate all of the combinations. You can then do what you like with the results.

    #! /usr/bin/perl
    #
    use strict;
    use warnings;
    use Math::Combinatorics;

    my @files;
    push @files, "file$_" foreach (1..36);

    my $c = Math::Combinatorics->new( count => 3, data => \@files);

    while (my @combo = $c->next_combination()) {
        my %combins;
        @combins{@combo}=();
        my @leftovers = grep {!exists($combins{$_})} @files;
        print "Choose: @combo\tLeave: @leftovers\n";
    }

    update: Obviously you would populate the @files array using a glob or similar. If your files aren't large you could pre-read them into memory.
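A rough sketch of the glob-and-pre-read idea follows. It rests on assumptions not stated in the thread: the inputs live under `data/*.txt`, each file holds whitespace-separated numbers, all files have the same number of values, and the merged file lists the first value of every kept file, then the second, and so on (which is how the sample output reads). `read_values` and `merge_columns` are hypothetical helper names.

```perl
use strict;
use warnings;

# Read whitespace-separated numbers from one file into an array reference.
sub read_values {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @values;
    while (my $line = <$fh>) {
        push @values, split ' ', $line;
    }
    close $fh;
    return \@values;
}

# Build one row per position: value i of every column, joined by spaces.
# Assumes every column has as many values as the first one.
sub merge_columns {
    my @columns = @_;
    return () unless @columns;
    my @rows;
    for my $i (0 .. $#{ $columns[0] }) {
        push @rows, join ' ', map { $_->[$i] } @columns;
    }
    return @rows;
}

# Pre-read every input once; 'data/*.txt' is an assumed location.
my @files = glob 'data/*.txt';
my %data  = map { $_ => read_values($_) } @files;

if (@files > 3) {
    my @kept = @files[3 .. $#files];    # illustration only: skip the first three
    print "$_\n" for merge_columns(@data{@kept});
}
```

Reading each file once and reusing the in-memory copy avoids re-reading the same 36 files for every combination.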

Re: randomly selecting files
by blazar (Canon) on Jul 29, 2005 at 10:41 UTC
    Well, you talk about picking three files at random, but then you also say "until all combinations have been covered". That means the only thing that is really random is the order in which you process the combinations, and that seems mostly unimportant to me. If the order really does not matter, then you may look up a combinatorics module on CPAN.

    That said, there are 7140 ways to pick 3 items out of 36, so you will end up with 7140 output files that largely duplicate one another's content. Is this what you really want?!?

      Updated: Looks like this is correct, although reading the OP I interpreted it differently.

      One world, one people

        He wrote:
        And I want to be able to write a Perl script that will randomly select 3 of these files and print the contents of the remainder (i.e. the other 33 files) into one file like this
        It seems to me that he wants an output file for each of the Binomial(36,3)=7140 choices of three files out of 36. I'm not sure where you get this 33! thing from. Maybe I overlooked something.
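For anyone who wants to check the 7140 figure, here is a small sketch that computes the binomial coefficient without large factorials. The `binomial` function is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;

# Number of ways to choose $k items out of $n.
# Multiplies and divides step by step, so each intermediate value
# is itself a binomial coefficient and stays an integer.
sub binomial {
    my ($n, $k) = @_;
    my $result = 1;
    for my $i (1 .. $k) {
        $result = $result * ($n - $k + $i) / $i;
    }
    return $result;
}

# 36 * 35 * 34 / (3 * 2 * 1)
print binomial(36, 3), "\n";
```

The order of the three omitted files does not matter here, which is why the count is a binomial coefficient and not a count of ordered arrangements.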
Re: randomly selecting files
by anonymized user 468275 (Curate) on Jul 29, 2005 at 11:29 UTC
    Updated.

    Wouldn't it be better to redefine the requirement so that the program outputs only one file, and takes an argument to determine which of the combinations it dumps out?
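One way this could look, as a sketch only: number the combinations from 0 in lexicographic order and let the first command-line argument pick one. It assumes the files are numbered 1 to 36, and `nth_combination` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return the $index-th (0-based) 3-file combination of files 1..$n,
# counting in lexicographic order; empty list if $index is out of range.
sub nth_combination {
    my ($n, $index) = @_;
    return if $index < 0;
    my $count = 0;
    for my $i (1 .. $n - 2) {
        for my $j ($i + 1 .. $n - 1) {
            for my $k ($j + 1 .. $n) {
                return ($i, $j, $k) if $count++ == $index;
            }
        }
    }
    return;
}

my $index = @ARGV ? $ARGV[0] : 0;    # which combination to leave out
my @skip  = nth_combination(36, $index);
die "Index must be between 0 and 7139\n" unless @skip;
print "Leaving out files: @skip\n";
```

With this layout nothing has to be remembered between runs: each index always maps to the same three files, so no combination can be produced twice.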

    One world, one people

Re: randomly selecting files
by Angharad (Pilgrim) on Jul 29, 2005 at 10:44 UTC
    Thanks for the help so far. Unfortunately yes, blazar, I know, but it is what I need. I do take your point though. You think it might take a while?
      How much is a while? It takes about 7140 times as long as merging one set of files. It's very likely that your program will be I/O bound, and hence the size of the files is going to be the biggest factor in determining whether or not "it takes a while".
      Well, 7140 is not that many. Though that depends on your hardware, OS and FS.

      It also depends on how big the input files are. If they are, say, 10 MB each (not that unreasonable), then you will have 10 * 33 * 7140 MB = 2,356,200 MB, or about 2.25 TB (at 1024 MB per GB), of disk usage for the output files. That is a lot of disk space IMO.
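The estimate can be redone for other file sizes with a few lines. This is just the arithmetic above wrapped in a hypothetical `output_size_mb` helper; the 10 MB figure is the same guess as in the text, not a measured size.

```perl
use strict;
use warnings;

# Total size in MB of all output files: each output holds
# $kept input files of $mb_per_file MB, and there are $outputs of them.
sub output_size_mb {
    my ($mb_per_file, $kept, $outputs) = @_;
    return $mb_per_file * $kept * $outputs;
}

my $total_mb = output_size_mb(10, 33, 7140);

# Convert to TB using 1024 MB per GB and 1024 GB per TB.
printf "%d MB, about %.2f TB\n", $total_mb, $total_mb / 1024**2;
```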

      One still wonders why you have to do this and if there could not be alternative strategies, too...

      PS: please reply directly to the node you are answering, instead of replying to your own node and addressing someone else's post from there...