Hi Friends,

I have files with columns like this. This sample input below is partial.

Please check below for main file link. Each file will have only two rows.

Gene 0.4% 0.7% 1.1% 1.4% 1.8% 2.2% 2.5% 2.9% + 3.3% 3.6% 4.0% 4.3% 4.7% 5.1% 5.4% 5.8% 6.2 +% 6.5% 6.9% 7.2% 7.6% 8.0% 8.3% 8.7% 9.1% +9.4% 9.8% 10.1% 10.5% 10.9% 11.2% 11.6% 12.0% + 12.3% 12.7% 13.0% 13.4% 13.8% 14.1% 14.5% 14.9% + 15.2% 15.6% 15.9% 16.3% 16.7% 17.0% 17.4% 17 +.8% 18.1% 18.5% 18.8% 19.2% 19.6% 19.9% 20.3% + 20.7% 21.0% 21.4% 21.7% 22.1% 22.5% 22.8% 23.2% + 23.6% 23.9% 24.3% 24.6% 25.0% 25.4% 25.7% 26 +.1% 26.4% 26.8% 27.2% 27.5% 27.9% 28.3% 28.6% + 29.0% 29.3% 29.7% 30.1% 30.4% 30.8% 31.2% 31.5% + 31.9% 32.2% 32.6% 33.0% 33.3% 33.7% 34.1% 34 +.4% 34.8% 35.1% 35.5% 35.9% 36.2% 36.6% 37.0% + 37.3% 37.7% 38.0% 38.4% 38.8% 39.1% 39.5% 39.9% + 40.2% 40.6% 40.9% 41.3% 41.7% 42.0% 42.4% 42 +.8% 43.1% 43.5% 43.8% 44.2% 44.6% 44.9% 45.3% + 45.7% 46.0% 46.4% 46.7% 47.1% 47.5% 47.8% 48.2% + 48.6% 48.9% 49.3% 49.6% 50.0% 50.4% 50.7% 51 +.1% 51.4% 51.8% 52.2% 52.5% 52.9% 53.3% 53.6% + 54.0% 54.3% 54.7% 55.1% 55.4% 55.8% 56.2% 56.5% + 56.9% 57.2% 57.6% 58.0% 58.3% 58.7% 59.1% 59 +.4% 59.8% 60.1% 60.5% 60.9% 61.2% 61.6% 62.0% + 62.3% 62.7% 63.0% 63.4% 63.8% 64.1% 64.5% 64.9% + 65.2% 65.6% 65.9% 66.3% 66.7% 67.0% 67.4% 67 +.8% 68.1% 68.5% 68.8% 69.2% 69.6% 69.9% 70.3% + 70.7% 71.0% 71.4% 71.7% 72.1% 72.5% 72.8% 73.2% + 73.6% 73.9% 74.3% 74.6% 75.0% 75.4% 75.7% 76 +.1% 76.4% 76.8% 77.2% 77.5% 77.9% 78.3% 78.6% + 79.0% 79.3% 79.7% 80.1% 80.4% 80.8% 81.2% 81.5% + 81.9% 82.2% 82.6% 83.0% 83.3% 83.7% 84.1% 84 +.4% 84.8% 85.1% 85.5% 85.9% 86.2% 86.6% 87.0% + 87.3% 87.7% 88.0% 88.4% 88.8% 89.1% 89.5% 89.9% + 90.2% 90.6% 90.9% 91.3% 91.7% 92.0% 92.4% 92 +.8% 93.1% 93.5% 93.8% 94.2% 94.6% 94.9% 95.3% + 95.7% 96.0% 96.4% 96.7% 97.1% 97.5% 97.8% 98.2% + 98.6% 98.9% 99.3% 99.6% 100.0% 0.4% 0.7% 1.1 +% 1.4% 1.8% 2.2% 2.5% 2.9% 3.3% 3.6% 4.0% +4.3% 4.7% 5.1% 5.4% 5.8% 6.2% 6.5% 6.9% 7.2% + 7.6% 8.0% 8.3% 8.7% 9.1% 9.4% 9.8% 10.1% 1 +0.5% 10.9% 11.2% 11.6% 12.0% 12.3% 12.7% 13.0% + 13.4% 13.8% 14.1% 14.5% 14.9% 15.2% 15.6% 15.9 +% 16.3% 16.7% 17.0% 17.4% 17.8% 18.1% 18.5% 1 +8.8% 19.2% 19.6% 19.9% 20.3% 20.7% 21.0% 21.4% + 21.7% 22.1% 22.5% 22.8% 23.2% 23.6% 23.9% 24.3 +% 24.6% 25.0% 25.4% 25.7% 26.1% 26.4% 26.8% 2 +7.2% 27.5% 27.9% 28.3% 28.6% 29.0% 29.3% 29.7% + 30.1% 30.4% 30.8% 31.2% 31.5% 31.9% 32.2% 32.6 +% 33.0% 33.3% 33.7% 34.1% 34.4% 34.8% 35.1% 3 +5.5% 35.9% 36.2% 36.6% 37.0% 37.3% 37.7% 38.0% + 38.4% 38.8% 39.1% 39.5% 39.9% 40.2% 40.6% 40.9 +% 41.3% 41.7% 42.0% 42.4% 42.8% 43.1% 43.5% 4 +3.8% 44.2% 44.6% 44.9% 45.3% 45.7% 46.0% 46.4% + 46.7% 47.1% 47.5% 47.8% 48.2% 48.6% 48.9% 49.3 +% 49.6% 50.0% 50.4% 50.7% 51.1% 51.4% 51.8% 5 +2.2% 52.5% 52.9% 53.3% 53.6% 54.0% 54.3% 54.7% + 55.1% 55.4% 55.8% 56.2% 56.5% 56.9% 57.2% 57.6 +% 58.0% 58.3% 58.7% 59.1% 59.4% 59.8% 60.1% 6 +0.5% 60.9% 61.2% 61.6% 62.0% 62.3% 62.7% 63.0% + 63.4% 63.8% 64.1% 64.5% 64.9% 65.2% 65.6% 65.9 +% 66.3% 66.7% 67.0% 67.4% 67.8% 68.1% 68.5% 6 +8.8% 69.2% 69.6% 69.9% 70.3% 70.7% 71.0% 71.4% + 71.7% 72.1% 72.5% 72.8% 73.2% 73.6% 73.9% 74.3 +% 74.6% 75.0% 75.4% 75.7% 76.1% 76.4% 76.8% 7 +7.2% 77.5% 77.9% 78.3% 78.6% 79.0% 79.3% 79.7% + 80.1% 80.4% 80.8% 81.2% 81.5% 81.9% 82.2% 82.6 +% 83.0% 83.3% 83.7% 84.1% 84.4% 84.8% 85.1% 8 +5.5% 85.9% 86.2% 86.6% 87.0% 87.3% 87.7% 88.0% + 88.4% 88.8% 89.1% 89.5% 89.9% 90.2% 90.6% 90.9 +% 91.3% 91.7% 92.0% 92.4% 92.8% 93.1% 93.5% 9 +3.8% 94.2% 94.6% 94.9% 95.3% 95.7% 96.0% 96.4% + 96.7% 97.1% 97.5% 97.8% 98.2% 98.6% 98.9% 99.3 +% 99.6% 100.0%

Basically, here is what I need to be done.

a. Start from second column which is 0.4% here. b. Go until you hit "10" in the header name. If the header name is exactly 10.0%, then include that column too. If not, only include until the column before it. In this example, since we have 10.1% (29th column), we will be including columns starting from 0.4%(second) until 9.8% which is the 28th column. If the 29th column was to be 10.0%, then it would have been included too. c. Average the values for these respective columns in the second row (data is not presented here - please click this link for total dataset - https://goo.gl/W8jND7). In this example, starting from 0.4%(second column) till 9.8%(28th column). d. In the output, print first column which is "Gene", and this average value with column header being

Gene Average_10%

e. Then start from 10.1% (29th column) and check until you hit "20" in the header name. Repeat steps b through d. And print output as

Gene Average_10% Average_20%

Repeat this until you have

Gene Average_10% Average_20% Average_30% Average_40% Average_50% Average_60% Average_70% Average_80% Average_90% Average_100%

f. After you hit 100%, it means one dataset is done.

g. If you observe my column header carefully here, there is another 0.4%-100% columns after the first 100%. I will be having 13 of these 0.4%-100%s in the input file at the above link.

i. I have multiple files, the headers can be

1% 2% 3%....100% 1.5% 2.5% 3.5%....100%

It varies from file to file. But the logic of averaging(if you hit "10", "20", etc) is always the same. And the number of samples 13 is also same which means each file will have 100%s for 13 times.


In reply to Average columns based on header name by jacobs.smith

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.