$new_guy has asked for the wisdom of the Perl Monks concerning the following question:
Hello. I have written a script that randomly generates column numbers in groups (each group is separated by a space when printed on the screen). The size of the group increases until a maximum is reached (the maximum is given on the command line). For the purposes of this question the maximum is 96.
The script takes a file containing the columns (of z's) which I want to compare and count. The command for running it is:
perl script.pl <filename> <number, 96 in this case>
After randomly generating the column numbers, I would like to go to each of those columns and compare it with the other columns in the group, looking for rows where they all have a "z". If every selected column has a "z" in the same row (i.e. the same position), the count increases. If none of them has a "z" in that row, or if even one of them lacks a "z", the row is not counted.
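The comparison described above can be sketched roughly as follows. This is only a minimal illustration, not the script's actual logic: the helper name `count_common_z` and the four-column sample table are made up, and it assumes each line of the data file is a 0/1 label followed by whitespace-separated `z` or `-` cells.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the rows in which every selected column holds a "z".
# $rows is a reference to an array of rows; each row is an array reference of
# cells with the leading 0/1 label already removed.
# @cols holds 1-based column numbers, as produced by int(rand($range)) + 1.
sub count_common_z {
    my ($rows, @cols) = @_;
    my $count = 0;
    for my $row (@$rows) {
        # grep in scalar context returns how many of the selected cells are "z"
        my $hits = grep { $row->[$_ - 1] eq 'z' } @cols;
        $count++ if $hits == @cols;
    }
    return $count;
}

# Made-up sample table (4 columns instead of 96), laid out like the data file.
my @lines = (
    "0 z z z z",
    "0 z - z z",
    "1 z z - z",
    "1 - - - -",
);

my @rows;
for my $line (@lines) {
    my ($label, @cells) = split ' ', $line;    # drop the leading 0/1 label
    push @rows, \@cells;
}

print count_common_z(\@rows, 1, 4), "\n";      # group of columns 1 and 4
print count_common_z(\@rows, 1, 2, 3), "\n";   # group of columns 1, 2 and 3
```

With this sample table the first call counts 3 rows and the second counts 1. A column drawn twice in the same group does no harm here, since it is simply tested twice.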
My script is:
    #!/usr/bin/perl
    use strict;
    use warnings;

    #exit if there's more or less than two arguments
    if(scalar(@ARGV)!= 2) {
        print "\nUsage script.pl <file name> <number of columns>\n";
        exit();
    }

    ##you will print results but first remove any previous files
    my $remove_random = "random.txt";
    if (unlink($remove_random) == 1) {
        print "Existing \"random.txt\" file was removed\n";
    }

    ## proceed by opening the file
    my $ro = $ARGV[0];
    open(DATA3, $ro);
    while ($ro = <DATA3>) {

        #now make a file for the output
        my $output_r = "random.txt";
        if (! open(POS, ">>$output_r") ) {
            print "Cannot open file \"$output_r\" to write to!!\n\n";
            exit;
        }

        # now randomly generate the columns to count z's
        # but first declare variables
        my $randomize = $ARGV[1];  # the number of columns entered at command-line
        my $range = $randomize;    # the maximum number of columns
        my $minimum = 1;           # the minimum number of columns
        my $y;                     # the increasing number of columns
        my $x;                     # the random genome selected
        my $count;                 # count the number of randomisations done
        my @uniform = ();
        my @data = ();
        my $n = 0;

        #loop through the selection process
        for($y = 1; $y < $range +1; $y++){  # make selection from 2 columns to 96 columns
            print "\n";                     # separate each random selection by a space
            for($x = 1; $x < $y; $x++){     # do the random column selection

                #randomly select columns
                my $random_number = int(rand($range)) + $minimum;

                #print the columns selected at random
                print $random_number . "\n";
                $count++;

                ## random columns for selection have been created
                ## now compare the elements of each of the groups selected and count only the number of z's common to all columns for each group!
                ## i.e. count only those times that have z's in all of them (i.e. the group)

                ##this bit of the script is not working ###
                # @uniform = $random_number;
                # my @temp = map { [ $_[1], $_[0], $_ ] }  # step 1
                #            map { $_->[2] }               # step 2
                #            @uniform;

                #Count array elements that match a pattern
                #In a scalar context, grep returns a count of the selected elements.
                #foreach my $num_genes(@temp){
                #print POS "@temp\n";
                #}
            }
        }

        #evaluate the number of random columns/columns selections used for this analysis
        print POS "\n". $count*30 ." random columns selections were used!!\n";
        print "\n". $count*30 ." random columns selections were used!!\n";
    }
    # the end #

    my $count2;
    open (FILE, "random.txt") or die"can't count clusters\n";
    $count2++ while <FILE>;
    print "\n$count2 round(s) done\n";
My data file is:
    0 z z z z z z z z z z z z z z z z z z z z z z z z z - z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z
    0 z z z z z z - z - z z z z z z - z z z z z - z z - - z - z z z z - z z z z z z z z - z z z - z z z - - z - z z z z z z z z z z z z z z z z z z z z z - z z z z z - z z - z z z z z z z z z z z z
    0 z z z z z z - z - z z z z - z - - z z z z - z - - - z - - - z - - z z z z - z z z - z z z - - - z - - z - - z z z z - z - - z z z z z - z - z z z - - - z z - - - z z - z z z z z z z - z z z -
    0 z z z z z z - z - z z z z - - - - z - z z - z - - - z - - - z - - z z z z - z - z - z z - - - - - - - z - - z - - z - z - - z - z - - - - - - z z - - - z z - - - z z - z z z z - z - - - - z -
    0 z z - z - z - z - z z - - - - - - z - - - - - - - - - - - - z - - - z z - - - - - - - - - - - - - - - - - - z - - - - z - - z - - - - - - - - - - - - - z - - - - z - - z - z z - z - - - - - -
    0 - z - z - - - z - z - - - - - - - - - - - - - - - - - - - - - - - - - - z z - - - - - - - - - - - - - - - - - - - - - - - z - - - - - - - - - - - - - - - - z - - - - z - - z - - z - z - - - - - -
    0 - z - z - - - - - z - - - - - - - - - - - - - - - - - - - - - - - - - z - - - - - - - - - - - - - - - - - - - - - - - z - - - - - - - - - - - - - - - - - - - - - z - - z - - - - z - - - - - -
    0 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - z - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    0 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - z - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    1 z z z z z z z z z z z z z z z z z z z z - z z z z z z z z z z z z z z z z - z z z z z z z z z z z - z z z - z z - z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z z - z z z
    1 z z z z - z - z z z z - z z z z - z z z - - z z z z z z - - - z - - z z z - z z z z z z z - z z z - - z - - z z - z z z z z z z z z - - z - z z z z - - z - z z z z z z z z z z z z z z - - z z
    1 z z z z - z - z z z z - - z - z - z z - - - z - - - z z - - - - - - - z z - z z - - z z - - z - z - - - - - z z - - - - - - z - z z - - z - z z z - - - z - - z - z z z z z z z - z - - - - - -
    1 z z z z - z - z z z z - - z - - - z - - - - - - - - - z - - - - - - - z z - z z - - - - - - z - z - - - - - - z - - - - - - z - - z - - - - - - z - - - z - - - - z - - z z z z - z - - - - - -
    1 z z z z - z - z z z z - - - - - - - - - - - - - - - - - - - - - - - - - - - z - - - - - - - z - z - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - z - z - - z - - - - - -
    1 z z z z - z - z z z z - - - - - - - - - - - - - - - - - - - - - - - - - - - z - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    1 z z z z - - - z z z z - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Other issues with the script are:
- The number of iterations seems to change every time the file size changes! I would like to keep this constant at, say, 200, so that each result has 200 rounds/iterations done.
- I would like to have the count of z's printed out for each iteration, and an average of all the counts at the end, possibly displayed as one column per round with the last column being the average.
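One possible way to get a fixed number of rounds with a count per round and an average at the end is sketched below. In the posted script the selection loops sit inside the `while` loop that reads the input file, so they repeat once per input line, which would explain why the number of rounds follows the file size. The sketch reads the table first and then loops a fixed number of times. The helper names `common_z` and `run_rounds`, the fixed group size of 2, and the four-column sample table are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: count rows where every column in @cols (1-based) is "z".
sub common_z {
    my ($rows, @cols) = @_;
    my $count = 0;
    for my $row (@$rows) {
        my $hits = grep { $row->[$_ - 1] eq 'z' } @cols;
        $count++ if $hits == @cols;
    }
    return $count;
}

# Hypothetical helper: run exactly $rounds rounds, independent of file size.
# Each round draws $group_size random columns (with replacement, like
# int(rand($range)) + $minimum in the posted script) and records the count.
sub run_rounds {
    my ($rows, $n_cols, $group_size, $rounds) = @_;
    my @counts;
    for my $round (1 .. $rounds) {
        my @cols = map { int(rand($n_cols)) + 1 } 1 .. $group_size;
        push @counts, common_z($rows, @cols);
    }
    return @counts;
}

# Made-up sample table (4 columns, labels already removed).
my @rows = (
    [qw(z z z z)],
    [qw(z - z z)],
    [qw(z z - z)],
    [qw(- - - -)],
);

my $rounds = 200;                                  # fixed number of rounds
my @counts = run_rounds(\@rows, 4, 2, $rounds);
my $average = sum(@counts) / scalar(@counts);

# One value per round, with the average as the last entry.
print join("\t", @counts, sprintf("%.2f", $average)), "\n";
```

For the real data the table would be read from the file once, before the rounds start, and the group size could still be grown from round to round if that is what is wanted.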
Re: Selecting, matching and counting column elements, using randomly generated numbers
by moritz (Cardinal) on Sep 29, 2010 at 09:30 UTC
by $new_guy (Acolyte) on Sep 29, 2010 at 10:51 UTC
by moritz (Cardinal) on Sep 29, 2010 at 11:24 UTC
by $new_guy (Acolyte) on Sep 30, 2010 at 13:06 UTC
Re: Selecting, matching and counting column elements, using randomly generated numbers
by perlpie (Beadle) on Sep 29, 2010 at 10:37 UTC