sesemin has asked for the wisdom of the Perl Monks concerning the following question:

Hi Respected Monks,

PLEASE DOWNLOAD THE ENTIRE CODE FROM RESPONSE TO GRANDFATHER AND ALSO FIRST FEW LINES OF THE FIRST AND THE SECOND FILE

I have the following code, and I need to print the unique elements of an array of arrays or array of hashes, or whatever you call the structure built by "push(@{$genes_number{$k}}, $current_line[1])".

I know how to do it with a simple array. I originally wrote the subroutine to count each unique element of the @genes_number array. Then I said, OK, let's expand the code by looping through the keys of the %gc_ag hash and printing out all the counts at each key. But as the data structure got more complex, I got stuck.

    my %genes_number = ();
    while (<INPUT1>) {
        chomp;
        my @current_line = split /\t/;
        # %gc_ag hash has stored the keys and values
        # read from another file. Keys are 5 to 18
        # and values are some numbers for each
        # key explained in read more
        foreach my $k (sort {$a<=>$b} keys %gc_ag) {
            if ($current_line[3] == $k && $current_line[8] > $gc_ag{$k}) {
                # push(@genes_number, $current_line[1]);
                # this was my original array that was used
                # in the sub. Now the sub needs to be changed
                # according to the following array of hash
                push(@{$genes_number{$k}}, $current_line[1]);
            }
        }
    }
    ####################################
    ###########subroutine###############
    sub count_unique {
        @genes_number = @_;
        my %count;
        map { $count{$_}++ } @genes_number;
        #print them out:
        map {push our @arr, ${count{$_}}} sort keys(%count);
        for (my $j=1; $j<10; $j++){
            my $counter =0;
            foreach my $element(@arr) {
                if ($element >=$j) {
                    $counter++;
                }
            }
            print "$j\t$counter\n";
        }
        my $i =0;
        $i += keys %count;
        # print $i;
        return %count;
    }

The goal is to find the Xs and print something like the following, where 5 to 18 are the keys of the %gc_ag hash and 1..10 on each row are the numbers determined by $j in the subroutine, that is, the frequency of $current_line[1] when $current_line[8] is greater than the corresponding value in the %gc_ag hash. It does not matter if the header line for the columns or rows is not printed, as long as the order is correct.
    5  6  7  8  9  10  11  12 13 14 15 16 17 18 

1   X  X  X  X  X   X   X   X  X  X  X  X  X  X      
2   X  X  X  X  X   X   X   X  X  X  X  X  X  X
3   X  X  X  X  X   X   X   X  X  X  X  X  X  X
4   X  X  X  X  X   X   X   X  X  X  X  X  X  X
5   X  X  X  X  X   X   X   X  X  X  X  X  X  X
6
7
8
9
10  X  X  X  X  X   X   X   X  X  X  X  X  X  X


Here is the %gc_ag content

Key => value
5=> 3.85
6=> 4.45
7=> 7.76
8=> 10.12
9=> 11.41
. .
. .
. .
18=>118.21


Here is an example of INPUT1 - There are 9 columns from 0 to 8

4750739 A209.EF064282   53      11      0.474968        -33.2   S       4750739 165.834 44.7
3383536 A209.EF064282   55      11      0.500083        -32.4   A       3383536 323.299 49
2634649 A209.EF064282   57      10      0.394855        -32     S       2634649 335.989 70.8
2923929 A209.EF064282   59      10      0.440602        -32.6   A       2923929 182.191 56.2
2756668 A209.EF064282   61      10      0.439497        -32.2   S       2756668 195.982 51
2961071 A209.EF064282   63      11      0.446359        -32.9   A       2961071 89.8963 23.8
3553101 A209.EF064282   65      11      0.414364        -33.2   S       3553101 101.563 23.9
3837211 A209.EF064282   67      10      0.432963        -32     A       3837211 57.7682 28.5
1832375 A209.EF064282   69      10      0.395152        -32     S       1832375 190.361 33
3357379 A209.EF064282   71      9       0.361169        -30.7   A       3357379 65.9899 18.5
5143976 A209.EF064282   73      8       0.384216        -30.1   S       5143976 44.6272 18.3
5734641 A209.EF064282   75      9       0.370324        -30.7   A       5734641 43.0392 11.8
5472474 A209.EF064282   77      10      0.426362        -31.9   S       5472474 28.6393 22.5
1601877 A209.EF064282   79      11      0.426462        -31.9   A       1601877 59.4263 18.3
5994790 A209.EF064282   81      11      0.42787 -32.5   S       5994790 310.741 76.1
548705  A209.EF064282   83      12      0.495855        -33.9   A       548705  140.423 36.2
5325020 A209.EF064282   85      13      0.546329        -35.6   S       5325020 865.633 111.4
4729214 A209.EF064282   87      13      0.288539        -35.6   A       4729214 892.606 179.7
4408466 A209.EF064282   89      12      0.542443        -34.8   S       4408466 980.892 210.5
3566139 A209.EF064282   91      13      0.543212        -36     A       3566139 1194.33 242.3
3039069 A209.EF064282   93      12      0.531238        -35.1   S       3039069 1089.02 230.2
4623003 A209.EF064282   95      11      0.263011        -33.9   A       4623003 962.656 152.5
1900108 A209.EF064282   97      11      0.29847 -33.6   S       1900108 582.385 95.9
2359581 AAA5.EF064268   53      10      0.459265        -34.3   S       2359581 937.563 279
1134470 AAA5.EF064268   57      11      0.490045        -35.4   S       1134470 1020.77 221.2
2509174 AAA5.EF064268   63      10      0.453891        -34.8   A       2509174 1312.34 218.9
936416  AAA5.EF064268   65      11      0.604597        -35.6   S       936416  1739.83 446
5531538 AAA5.EF064268   67      11      0.613992        -35     A       5531538 1798.19 342.8
219161  AAA5.EF064268   69      12      0.554714        -35.3   S       219161  1244    261.1
712591  AAA5.EF064268   71      10      0.487708        -33.6   A       712591  1389.5  285.4
1680312 AAA5.EF064268   73      9       0.386673        -32     S       1680312 369.067 149.6
4841444 AAA5.EF064268   75      10      0.393024        -32.1   A       4841444 678.501 182.4
944270  AAA5.EF064268   77      10      0.378516        -32.7   S       944270  251.284 51.6
2374708 AAA5.EF064268   79      10      0.435592        -33.2   A       2374708 393.509 95.6
2586120 AAA5.EF064268   81      12      0.517054        -33.9   S       2586120 174.005 33.5
961718  AAA5.EF064268   83      12      0.521711        -34.2   A       961718  130.586 35.2
1308189 AAA5.EF064268   85      12      0.508754        -34.4   S       1308189 198.439 38.5
4235432 AAA5.EF064268   89      11      0.493991        -33.6   S       4235432 143.679 45.5
4419845 AAA5.EF064268   91      12      0.54374 -33.6   A       4419845 94.904  38.7
3574638 AAA5.EF064268   93      11      0.435695        -33.3   S       3574638 128.873 28.9
4513350 AAA5.EF064268   95      11      0.510312        -33.3   A       4513350 94.0746 27.5
1246411 AAA5.EF064268   97      11      0.510547        -33.2   S       1246411 136.292 40.8
2976971 AAA5.EF064268   99      11      0.490249        -33.1   A       2976971 409.191 84.8
1670085 AAA5.EF064268   101     9       0.384792        -32.2   S       1670085 399.819 116
4116324 AAA5.EF064268   103     10      0.430863        -32.6   A       4116324 640.385 168.2
3297170 AAA5.EF064268   105     9       0.357768        -31.7   S       3297170 504.307 103.3
1990703 AAA5.EF064268   107     10      0.414288        -32.3   A       1990703 779.137 241.4






Replies are listed 'Best First'.
Re: Print Uniqe elements of arry of hash
by GrandFather (Saint) on Oct 15, 2008 at 00:45 UTC

    Rewrite your sample code to include some sample data and generate some output that shows the issue, so that we have some chance of reproducing your results. The comment above the INPUT1 data indicates that there are 9 columns; there are in fact 10, and the last one (index 9) seems more likely to contain the data you want to test against.

    You can use a __DATA__ section with sample INPUT1 data there. Probably you only need 5 lines of data to demonstrate the issue.
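
    As a minimal sketch of that suggestion: the script below reads five rows copied from the INPUT1 sample out of its own __DATA__ section. The helper name parse_row is hypothetical, and splitting on any whitespace instead of /\t/ is an assumption made so the example still runs if tabs are turned into spaces when the post is copied.

```perl
use strict;
use warnings;

# Split one row into fields. Splitting on any whitespace (rather than /\t/)
# is an assumption, so the example survives tabs being converted to spaces.
sub parse_row {
    my ($line) = @_;
    chomp $line;
    return split ' ', $line;
}

# <DATA> reads the lines that follow the __DATA__ marker in this same file.
while (my $line = <DATA>) {
    my @current_line = parse_row($line);
    print "gene=$current_line[1] key=$current_line[3] value=$current_line[8]\n";
}

__DATA__
4750739 A209.EF064282 53 11 0.474968 -33.2 S 4750739 165.834 44.7
3383536 A209.EF064282 55 11 0.500083 -32.4 A 3383536 323.299 49
2634649 A209.EF064282 57 10 0.394855 -32 S 2634649 335.989 70.8
2923929 A209.EF064282 59 10 0.440602 -32.6 A 2923929 182.191 56.2
2756668 A209.EF064282 61 10 0.439497 -32.2 S 2756668 195.982 51
```

    Nothing outside the script is needed to run it, which is the point of the exercise.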


    Perl reduces RSI - it saves typing
      Thanks GrandFather,

      I think I confused everyone. As you indicated, there are 10 columns, and I am working with the fourth and the ninth (index 3 and 8). This code does not work unless I remove the comment from "# push(@genes_number, $current_line[1]);" and comment out "push(@{$genes_number{$k}}, $current_line[1]);"

      Here is the complete code that reads the first file and stores the data in %gc_ag. In the data section I have included the first few rows of the first file and the second file.

        This dog won't hunt.

           Global symbol "$k" requires explicit package name at sesemin.pl line 72.
           syntax error at sesemin.pl line 72, near "})"
        
        where, by the time I sorted out the files and such, line 72 is: &count_unique (@{$genes_number{$k});

        Even fixing the trivial syntax error, one's left with the undefined $k at this point. I could guess at what $k is supposed to be here... but I think it's time for you to do some work !
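
        Purely to illustrate the two problems just mentioned, here is a sketch under one guess: that $k was meant to iterate over the keys of %genes_number. The sample data and the helper at_least are hypothetical, and count_unique is rewritten here with lexical variables only, so it is not the original sub.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for %genes_number: key => list of names.
my %genes_number = (
    10 => [qw(A209 A209 AAA5)],
    11 => [qw(A209 AAA5 AAA5 AAA5)],
);

# Count how often each element occurs; returns a hash of element => count.
sub count_unique {
    my %count;
    $count{$_}++ for @_;
    return %count;
}

# Number of distinct elements that occur at least $j times.
sub at_least {
    my ($j, %count) = @_;
    return scalar grep { $_ >= $j } values %count;
}

# $k is now declared by the loop, and the dereference has its closing brace.
for my $k (sort { $a <=> $b } keys %genes_number) {
    my %count = count_unique(@{ $genes_number{$k} });
    for my $j (1 .. 3) {
        print join("\t", $k, $j, at_least($j, %count)), "\n";
    }
}
```

        Keeping the counts in a lexical hash per call avoids totals leaking from one key into the next, which a package-level array would allow.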

        Happy to try to help, but you need to put more effort in at your end, so that the code you offer:

        1. actually runs -- with strict and warnings.
        2. does not require other files to be downloaded -- let alone placed in special directories. (see below)
        3. does not require command line arguments -- whatever you need to demonstrate should be inside the code.
        4. illustrates the issue you have, with the minimum of code.
        5. supports the question you've posed... "I am trying to achieve blah to which end I have written blather which is supposed to mangle stuff so, but what I get is a crick in my neck...., as demonstrated in the very wonderful code here..."
        6. and all the other good advice in How do I post a question effectively?


        If your code only has one input file, then __DATA__ is the obvious replacement.

        If your code has several inputs, then this will do the trick. First, comment out (or remove) the original open commands and replace as illustrated:

        #open FOO, "my_favourite.yum" or die "horribly $!" ;
        open FOO, '<', &my_favourite_yum or die "horribly $!" ;

        then at the end of your example code place:

        #______________________________________
        # ... description of the data ...
        sub my_favourite_yum { \(<<'~~FILE') }
        ... contents of my_favourite.yum go here ...
        ~~FILE
        #______________________________________
        # ... description....
        sub my_other_favourite { \(<<'~~FILE') }
        ... contents of my_other_favourite go here ...
        ~~FILE
        #______________________________________

        where:
        1. obviously, the name of the sub must match the name in the relevant open, but may be anything you like that helps the reader.
        2. the end of file marker '~~FILE' can be anything you like, but obviously it must not appear in the data ! (For the avoidance of doubt, it may be different for each file, but must be the same in the two places it appears for each one.)
        3. the marker at the end of each "file" must have a newline at the end of it. (No other trailing whitespace. End-of-file will not do -- hence the suggested #____ line at the end.)
        4. NB: tabs may be translated to one or more spaces in the upload process. But ...
        5. before posting, you must ensure that your example code works (and illustrates the issue you have) with the files embedded.
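
        Put together, the technique might look like the sketch below. The sub names thresholds_file and rows_file, the variable names, and all the data values are made up for illustration; lexical filehandles are used in place of the bareword FOO, which is a stylistic choice and not part of the recipe. Note that in a real script the ~~FILE markers must start in the first column.

```perl
use strict;
use warnings;

# Each sub returns a reference to its here-doc string; open() treats a
# scalar reference as an in-memory file, so no external files are needed.
open my $thresholds, '<', thresholds_file() or die "cannot open thresholds: $!";
open my $rows,       '<', rows_file()       or die "cannot open rows: $!";

# First "file": key and threshold value on each line.
my %gc_ag;
while (my $line = <$thresholds>) {
    my ($key, $value) = split ' ', $line;
    $gc_ag{$key} = $value;
}

# Second "file": report rows whose value exceeds the threshold for their key.
while (my $line = <$rows>) {
    my ($gene, $key, $value) = split ' ', $line;
    print "$gene passes key $key\n" if $value > $gc_ag{$key};
}

#______________________________________
# hypothetical thresholds: key, value
sub thresholds_file { \(<<'~~FILE') }
10 50
11 100
~~FILE
#______________________________________
# hypothetical rows: gene, key, value
sub rows_file { \(<<'~~FILE') }
geneA 10 70.8
geneB 11 89.9
geneC 11 165.8
~~FILE
#______________________________________
```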

        Update: removed spurious and erroneous & from sub &my_favourite_yum and sub &my_other_favourite. Thanks tinita. (Blushes deep crimson and wonders whether going back to bed and starting again is an option.)