Selecting, matching and counting column elements, using randomly generated numbers

$new_guy has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Hello. I have written a script that randomly generates numbers that are grouped together (separated by a space when printed on the screen!). The size of the group increases until a maximum is reached (the maximum is given on the command line). For the purposes of this question the number is 96.

The script takes a file containing the columns (of z's) which I want to compare and count. The command for running it is:

perl script.pl <filename> <number, 96 in this case>

After randomly generating the column numbers, what I would like to do is go to each of those columns and compare it to the other selected columns. I would like to compare where they have z's in the same row. If they all have a "z" in the same row (i.e. the same position) then the count increases; otherwise, if they don't have a "z" or if one of them lacks a "z", a count is not taken.

My script is:

#!/usr/bin/perl

  use strict;
  use warnings;

#exit if there's more or less than two arguments
  if(scalar(@ARGV)!= 2) { print "\nUsage script.pl <file name> <number of columns>\n";
                      exit();
                       }

##you will print results but first remove any previous files
      my $remove_random = "random.txt";
        if (unlink($remove_random) == 1) {
                        print "Existing \"random.txt\" file was removed\n";
                          }
 ## proceed by opening  the file
 
my $ro = $ARGV[0];
    open(DATA3, $ro);
  
        while ($ro = <DATA3>)
            {
                      
            #now make a file for the output
        my $output_r = "random.txt"; 
           if (! open(POS, ">>$output_r") ) {
            print "Cannot open file \"$output_r\" to write to!!\n\n";
        exit;
        }  
   
# now randomly generate the columns to count z's
# but first declare variables

 my $randomize = $ARGV[1];      # the number of columns entered at command-line
  
  my $range = $randomize;       # the maximum number of columns
  my $minimum = 1;            # the minimum number of columns
  my $y;                    # the increasing number of columns
  my $x;                  # the random genome selected
  my $count;             # count the number of randomisations done
  my @uniform = ();
  my @data = ();
  my $n = 0;
  
#loop through the selection process

  for($y = 1; $y < $range +1; $y++){    # make selection from 2 columns to 96 columns
           
                      print "\n";       # separate each random selection by a space
                      
           for($x = 1; $x < $y; $x++){           # do the random column selection

           #randomly select columns
           my $random_number = int(rand($range)) + $minimum;
  
           #print the columns selected at random
           print $random_number . "\n";
           $count++;
                      
                     
## random columns for selection have been created
## now compare the elements of each of the groups selected and count only the number of z's common to all columns for each group!
## i.e. count only those times that have z's in all of them (i.e. the group)
           
  ##this bit of the script is not working ###

 #        @uniform = $random_number;
                                
#                    my @temp =   map { [ $_[1], $_[0], $_ ] } # step 1
   #                          map { $_->[2] } # step 2
   #                           @uniform;

#Count array elements that match a pattern
#In a scalar context, grep returns a count of the selected elements.
#foreach my $num_genes(@temp){
#print POS "@temp\n";
#}
              }
       }
  
#evaluate the number of random columns/columns selections used for this analysis

          print POS "\n". $count*30 ." random columns selections were used!!\n";
          print  "\n". $count*30 ." random columns selections were used!!\n"; 
  }
            # the end #
            
              my $count2;
                  open (FILE, "random.txt") or die"can't count clusters\n";
                    $count2++ while <FILE>;
                  print "\n$count2 round(s) done\n";

My data file is:

0     z       z        z        z        z        z        z z        
+ z        z    z        z   z   z  z   z   z  z  z   z  z  z   z   z 
+  z          -   z   z   z  z   z   z   z  z   z   z   z   z  z   z  
+ z   z  z   z   z   z  z   z   z   z   z   z   z   z   z   z   z   z 
+  z   z   z   z  z   z   z   z  z   z   z   z   z   z  z  z   z  z   
+z  z   z  z   z   z   z   z   z  z   z  z   z  z   z  z   z   z   z  
+ z 
0     z       z        z        z        z        z          - z      
+    -        z    z        z   z   z  z          -   z  z  z   z  z  
+        -   z   z          -          -   z          -   z  z   z   z
+          -  z   z   z   z   z  z   z   z          -  z   z   z      
+    -  z   z   z          -          -   z          -   z   z   z   z
+   z   z   z   z   z  z   z   z   z  z   z   z   z   z   z  z  z     
+     -  z   z  z   z  z          -   z   z          -   z  z   z  z  
+ z  z   z  z   z   z   z   z 
0     z       z        z        z        z        z          - z      
+    -        z    z        z   z          -  z          -          - 
+ z  z   z  z          -   z          -          -          -   z     
+     -          -          -   z          -          -  z   z   z   z
+          -  z   z   z          -  z   z   z          -          -   
+       -   z          -          -   z          -          -   z   z 
+  z   z          -   z          -          -  z   z   z   z  z       
+   -   z          -   z   z  z          -          -          -   z  
+z          -          -          -   z   z          -   z  z   z  z  
+ z  z   z          -   z   z   z          - 
0     z       z        z        z        z        z          - z      
+    -        z    z        z   z          -          -          -    
+      -  z          -   z  z          -   z          -          -    
+      -   z          -          -          -   z          -          
+-  z   z   z   z          -  z          -   z          -  z   z      
+    -          -          -          -          -          -         
+ -   z          -          -   z          -          -   z          -
+   z          -          -  z          -   z          -          -   
+       -          -          -          -   z  z          -          
+-          -   z  z          -          -          -   z   z         
+ -   z  z   z  z          -  z          -          -          -      
+    -   z          - 
0     z       z          -        z          -        z          - z  
+        -        z    z          -          -          -          -  
+        -          -  z          -          -          -          -  
+        -          -          -          -          -          -     
+     -          -   z          -          -          -   z   z       
+   -          -          -          -          -          -          
+-          -          -          -          -          -          -  
+        -          -          -          -          -   z          - 
+         -          -          -   z          -          -  z        
+  -          -          -          -          -          -          -
+          -          -          -          -          -          -   
+z          -          -          -          -   z          -         
+ -   z          -   z  z          -  z          -          -         
+ -          -          -          - 
0          -       z          -        z          -          -        
+  - z          -        z          -          -          -          -
+          -          -          -          -          -          -   
+       -          -          -          -          -          -      
+    -          -          -          -          -          -         
+ -          -   z   z          -          -          -          -    
+      -          -          -          -          -          -       
+   -          -          -          -          -          -          
+-          -          -          -          -          -          -  
+ z          -          -          -          -          -          - 
+         -          -          -          -          -          -    
+      -          -          -          -   z          -          -   
+       -          -   z          -          -   z          -         
+ -  z          -  z          -          -          -          -      
+    -          - 
0          -       z          -        z          -          -        
+  -          -          -        z          -          -          -  
+        -          -          -          -          -          -     
+     -          -          -          -          -          -        
+  -          -          -          -          -          -          -
+          -          -          -   z          -          -          
+-          -          -          -          -          -          -  
+        -          -          -          -          -          -     
+     -          -          -          -          -          -        
+  -          -   z          -          -          -          -       
+   -          -          -          -          -          -          
+-          -          -          -          -          -          -  
+        -          -          -          -   z          -          - 
+  z          -          -          -          -  z          -        
+  -          -          -          -          - 
0          -          -          -          -          -          -   
+       -          -          -          -          -          -      
+    -          -          -          -          -          -         
+ -          -          -          -          -          -          - 
+         -          -          -          -          -          -    
+      -          -          -          -          -          -       
+   -          -          -          -          -          -          
+-          -          -          -          -          -          -  
+        -          -          -          -          -          -     
+     -          -          -   z          -          -          -    
+      -          -          -          -          -          -       
+   -          -          -          -          -          -          
+-          -          -          -          -          -          -  
+        -          -          -          -          -          -     
+     -          -          -          -          -          -        
+  -          - 
0          -          -          -          -          -          -   
+       -          -          -          -          -          -      
+    -          -          -          -          -          -         
+ -          -          -          -          -          -          - 
+         -          -          -          -          -          -    
+      -          -          -          -          -          -       
+   -          -          -          -          -          -          
+-          -          -          -          -          -          -  
+        -          -          -          -          -          -     
+     -          -          -   z          -          -          -    
+      -          -          -          -          -          -       
+   -          -          -          -          -          -          
+-          -          -          -          -          -          -  
+        -          -          -          -          -          -     
+     -          -          -          -          -          -        
+  -          - 
1     z       z        z        z        z        z        z z        
+ z        z    z        z   z   z  z   z   z  z  z   z          -  z 
+  z   z   z   z   z   z   z  z   z   z   z  z   z   z   z          - 
+ z   z   z   z  z   z   z   z  z   z   z          -   z   z   z      
+    -   z   z          -   z   z   z   z   z  z   z   z   z  z   z   
+z   z   z   z  z  z   z  z   z  z   z  z   z   z   z   z   z  z   z  
+z   z  z   z  z          -   z   z   z 
1     z       z        z        z          -        z          - z    
+     z        z    z          -   z   z  z   z          -  z  z   z  
+        -          -   z   z   z   z   z   z          -          -   
+       -   z          -          -   z   z   z          -  z   z   z 
+  z  z   z   z          -  z   z   z          -          -   z       
+   -          -   z   z          -   z   z   z   z   z  z   z   z   z
+          -          -   z          -   z   z  z  z          -       
+   -   z          -   z  z   z   z   z   z   z  z   z  z   z  z   z  
+z          -          -   z   z 
1     z       z        z        z          -        z          - z    
+     z        z    z          -          -   z          -   z        
+  -  z  z          -          -          -   z          -          - 
+         -   z   z          -          -          -          -       
+   -          -          -   z   z          -  z   z          -      
+    -  z   z          -          -  z          -   z          -      
+    -          -          -          -   z   z          -          - 
+         -          -          -          -  z          -   z   z    
+      -          -   z          -   z   z  z          -          -   
+       -   z          -          -  z          -   z   z   z   z  z  
+ z  z          -  z          -          -          -          -      
+    -          - 
1     z       z        z        z          -        z          - z    
+     z        z    z          -          -   z          -          - 
+         -  z          -          -          -          -          - 
+         -          -          -          -   z          -          -
+          -          -          -          -          -   z   z      
+    -  z   z          -          -          -          -          -  
+        -  z          -   z          -          -          -         
+ -          -          -   z          -          -          -        
+  -          -          -  z          -          -   z          -    
+      -          -          -          -          -  z          -    
+      -          -   z          -          -          -          -   
+z          -          -   z  z   z  z          -  z          -       
+   -          -          -          -          - 
1     z       z        z        z          -        z          - z    
+     z        z    z          -          -          -          -     
+     -          -          -          -          -          -        
+  -          -          -          -          -          -          -
+          -          -          -          -          -          -   
+       -          -          -          -  z          -          -   
+       -          -          -          -          -  z          -   
+z          -          -          -          -          -          -  
+        -          -          -          -          -          -     
+     -          -          -          -          -          -        
+  -          -          -          -          -          -          -
+          -          -          -          -          -          -   
+       -          -          -          -   z          -   z         
+ -          -  z          -          -          -          -         
+ -          - 
1     z       z        z        z          -        z          - z    
+     z        z    z          -          -          -          -     
+     -          -          -          -          -          -        
+  -          -          -          -          -          -          -
+          -          -          -          -          -          -   
+       -          -          -          -  z          -          -   
+       -          -          -          -          -          -      
+    -          -          -          -          -          -         
+ -          -          -          -          -          -          - 
+         -          -          -          -          -          -    
+      -          -          -          -          -          -       
+   -          -          -          -          -          -          
+-          -          -          -          -          -          -  
+        -          -          -          -          -          -     
+     -          -          -          -          - 
1     z       z        z        z          -          -          - z  
+       z        z    z          -          -          -          -   
+       -          -          -          -          -          -      
+    -          -          -          -          -          -         
+ -          -          -          -          -          -          - 
+         -          -          -          -          -          -    
+      -          -          -          -          -          -       
+   -          -          -          -          -          -          
+-          -          -          -          -          -          -  
+        -          -          -          -          -          -     
+     -          -          -          -          -          -        
+  -          -          -          -          -          -          -
+          -          -          -          -          -          -   
+       -          -          -          -          -          -      
+    -          -          -          -          -          -

Other queries with the script are:-

- It seems to increase the number of iterations every time the file size changes! I would like to keep this constant at say 200. So that each result has 200 rounds/iterations done

- I would like to have the counts of z's printed out for each iteration, and an average of all counts at the end, possibly displayed as columns for each round with the last being the average.
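
A minimal sketch of what those two requests could look like: the number of rounds is a fixed constant (200) rather than something tied to the number of lines in the file, each round prints its own count, and the average is printed last. The inline sample rows, the group size of 2, and the variable names are all assumptions made for illustration; field 0 is taken to be the 0/1 label, so column N is field N.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

my $rounds = 200;   # fixed number of rounds, independent of file size
my $max    = 4;     # number of z/- columns in the sample below (assumed)
my $size   = 2;     # columns per random group (assumed)

# Inline sample rows standing in for the real file.
# Field 0 is the 0/1 label, fields 1..$max hold "z" or "-".
my @rows = map { [ split ' ', $_ ] } (
    '0 z z z -',
    '0 z - z z',
    '1 z z z z',
);

my @counts;
for my $round (1 .. $rounds) {
    # Pick the random column numbers and keep them in an array.
    my @cols = map { int(rand($max)) + 1 } 1 .. $size;

    # A row is counted only if every selected column holds a "z".
    my $count = grep {
        my $row = $_;
        !grep { $row->[$_] ne 'z' } @cols;
    } @rows;

    push @counts, $count;
    print "round $round\tcolumns @cols\tcount $count\n";
}

printf "average over %d rounds: %.2f\n", scalar @counts, sum(@counts) / @counts;
```

Because `rand` can return the same column twice within a group, a group may contain duplicates, just as in the script above.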


Replies are listed 'Best First'.
Re: Selecting, matching and counting column elements, using randomly generated numbers by moritz (Cardinal) on Sep 29, 2010 at 09:30 UTC
Just because you're using random numbers doesn't mean you should use random indentation for your code; it makes reading very hard. A good way to format code is to start in the first column, and indent a fixed number of spaces (for example 4) for each level of unclosed curly brace. See also: perlstyle. This is not just a question of aesthetics, it's a necessity for any nontrivial program.

I don't quite understand your code, or what you want to achieve; one comment says random columns for selection have been created, but I don't see any created columns; you just print some numbers to standard output, but never record them in a data structure, so they are essentially lost to the program.

`##this bit of the script is not working ###
# @uniform = $random_number;
# my @temp = map { [ $_[1], $_[0], $_ ] } # step 1
# map { $_->[2] } # step 2
# @uniform;`

It's not working because after the line `@uniform = $random_number;`, the array `@uniform` contains a single number, whereas the map accesses the array elements as if there were array references stored in the array.

My general advice is not to use map until you understand what your variables contain, and the basic control flow. Data::Dumper can help you with the former.

Perl 6 - links to (nearly) everything that is Perl 6.
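
As a rough illustration of the point about recording the numbers in a data structure, the sketch below collects each random group in an array and uses Data::Dumper to show what is actually stored. `random_columns` is a made-up helper name, and the group sizes 2 to 4 are only there to keep the demo output short; the maximum of 96 comes from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: return $size random column numbers in 1..$max
# as a list, instead of only printing them.
sub random_columns {
    my ($size, $max) = @_;
    return map { int(rand($max)) + 1 } 1 .. $size;
}

my $max = 96;                  # maximum column number, as in the question
my @groups;                    # one array reference per random group

for my $size (2 .. 4) {        # small range, just for the demo
    push @groups, [ random_columns($size, $max) ];
}

# Data::Dumper prints the nested structure exactly as Perl sees it.
local $Data::Dumper::Indent = 1;
print Dumper(\@groups);
```

Once the groups live in `@groups`, later code can loop over them and look up the matching columns, which is not possible when the numbers are only printed.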
Re^2: Selecting, matching and counting column elements, using randomly generated numbers by $new_guy (Acolyte) on Sep 29, 2010 at 10:51 UTC
Dear Moritz,

The numbers are generated at random and I just print them to the screen to show what they are. But, yes you could store them in a file! What I intend to do is use the random numbers generated to do the counts. For instance, if my group has 38, 39, 40, then what I intend to do is compare the z's in columns 38, 39, 40 (that have been randomly generated) of my file. So if they are 44, 45, 99, ..., 123, I would like to count and find the average of all the z's that are common to all these columns. The bit that doesn't work is where I ran out of ideas and got stuck!
Re^3: Selecting, matching and counting column elements, using randomly generated numbers by moritz (Cardinal) on Sep 29, 2010 at 11:24 UTC
"But, yes you could store them in a file!"

But your program doesn't do that. And later on you want to compare those values to some other values, and it doesn't work... because you don't have access to them anymore.

So, let's summarize: You generate values, but you don't store them. You read data from a file, but you don't do anything with it. So there are at least two steps missing: extracting the columns from the data you read, and making the generated values available to the program itself. When you've done these two steps, maybe you'll get unstuck.

Also please notice that your description of what you want to do is incomplete: you write that you want to compare values, but you never mention what you want to do with the result of the comparison. Store it? Count it? Make funny bit masks? Destroy the world?
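
The two missing steps could be sketched roughly as follows. `read_rows` and `count_common_z` are hypothetical names, the inline sample stands in for the real data file, and the column numbering (field 0 is the leading 0/1 label, so column N is field N) is an assumption about the intended layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: read the data into an array of rows, each row an array of fields.
sub read_rows {
    my ($fh) = @_;
    my @rows;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;            # skip blank lines
        push @rows, [ split ' ', $line ];     # split on runs of whitespace
    }
    return \@rows;
}

# Step 2: given stored column numbers, count the rows in which every
# selected column holds a "z".
sub count_common_z {
    my ($rows, @cols) = @_;
    my $count = 0;
    for my $row (@$rows) {
        my $hits = grep { defined $row->[$_] && $row->[$_] eq 'z' } @cols;
        $count++ if $hits == @cols;
    }
    return $count;
}

# Tiny inline sample; the real script would open the file named in $ARGV[0].
my $sample = <<'END';
0 z z z -
0 z - z z
1 z z z z
END
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my $rows = read_rows($fh);
close $fh;

my @cols = (1, 2);    # would come from rand() in the real script
print "columns @cols: ", count_common_z($rows, @cols), " row(s) with z in all\n";
```

Keeping the rows in memory means the same data can be reused for every random group, instead of re-reading the file once per group.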
Re^4: Selecting, matching and counting column elements, using randomly generated numbers by $new_guy (Acolyte) on Sep 30, 2010 at 13:06 UTC
Re: Selecting, matching and counting column elements, using randomly generated numbers by perlpie (Beadle) on Sep 29, 2010 at 10:37 UTC
`##this bit of the script is not working ###
# @uniform = $random_number;
# my @temp = map { [ $_[1], $_[0], $_ ] } # step 1
# map { $_->[2] } # step 2
# @uniform;`

From the "step 1" and "step 2" comments, I think that you may just be able to reverse those parts.

`my @temp =                          # third, the final results are stored in @temp
    map { $_->[2] }                 # second, this applies to the results of the following:
    map { [ $_[1], $_[0], $_ ] }    # first, this applies to @uniform
    @uniform;`

In many cases where folks stack map or grep, they could combine them. The above is the same as

`my @temp = map { ($_[1], $_[0], $_ )[2] } @uniform;`

which is the same as

`my @temp = @uniform;`

Did you intend for that portion of code to do something else?
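
A small check of that equivalence, as a sketch: the values 38, 39, 40 are borrowed from the example earlier in the thread, and `@stacked` and `@copy` are names chosen for the demo.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @uniform = (38, 39, 40);   # example values only

# Stacked maps run right to left: the lower map wraps each element in an
# array reference, the upper map pulls element 2 (the original $_) back out.
# Note that $_[0] and $_[1] are elements of @_, not parts of $_.
my @stacked = map { $_->[2] }
              map { [ $_[1], $_[0], $_ ] }
              @uniform;

my @copy = @uniform;

print "stacked: @stacked\n";
print "copy:    @copy\n";
print "identical\n" if "@stacked" eq "@copy";
```

So the stacked version only round-trips the values; any real work (such as looking up columns) would have to happen inside one of the map blocks.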