Dear Monks!

I am an early Perl freshman. So far I have done some text manipulation in a line-by-line style on plain text documents. What I'd like to parse now is a table that needs to be rearranged so that info from one line ends up at a different position on another line. Working across lines like this is new to me and I can't get my head around it...

Here is an example that I wrote to illustrate:

SNP5 IND1 A5 C5 0.8
SNP2 IND1 A2 C2 0.8
SNP1 IND1 A1 C1 0.8
SNP3 IND1 A3 C3 0.8
SNP4 IND1 A4 C4 0.8
SNP5 IND2 G5 T5 0.8
SNP2 IND2 G2 T2 0.8
SNP1 IND2 G1 T1 0.8
SNP3 IND2 G3 T3 0.8
SNP4 IND2 G4 T4 0.8

The last column (all the 0.8s) is not needed. The first column (SNP1-5) is used to sort the info in columns 3 and 4 into the correct positions (columns 3 and 4 give the possible states of a SNP per IND). Each combination of IND and SNP is unique in this data. In my output I would like to get a table with a first column giving each IND, followed by the sorted info about its state for each SNP.

This table needs to be stored as:

IND1 A1 C1 A2 C2 A3 C3 A4 C4 A5 C5
IND2 G1 T1 G2 T2 G3 T3 G4 T4 G5 T5

So I need to get from a multi-line format that stores the two possible states of IND per SNP, to a one-line format for every IND. Getting the SNPs sorted properly makes it possible to omit the name of the SNPs. They don't need to appear in the output. All I need is the right order (1-5 for each state).

My first strategy was to use hash tables to store each IND-SNP combination as key and each state combination as value. This works, but now I run into problems when I want to arrange the output table.

This is also the first time I have done anything with hash tables, so I am quite happy that I already get my hash populated using this code:

while (<>) {
    @intarray = split('\t');
    $key   = $intarray[1] . "," . $intarray[0];
    $value = $intarray[2] . "," . $intarray[3];
    $datahash{$key} = $value;
}

I use commas as separators so that I can split the keys and values apart later on. The problem is that I don't know how to go on... I had to learn that I can't use regular expressions when looking up the values for a certain list of keys (like fetching the values of every key matching ^IND1* in a foreach loop)... But any other starting point for solving this reformatting challenge is also highly appreciated!
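
One possible direction, offered only as a minimal sketch and not as the definitive solution: instead of a flat hash with combined "IND,SNP" keys, a hash of hashes keyed first by IND and then by SNP makes "all entries for IND1" a plain lookup, so no regular expression on the keys is needed. The sub names (build_table, format_table, by_number, name_number) are made up for illustration. The sketch assumes that every SNP and IND name ends in a number that defines its order, and it splits on any whitespace rather than only on tabs so that the sample data from above can be embedded directly.

```perl
use strict;
use warnings;

# Numeric part of a name such as "SNP12" -> 12 (assumption: names contain a number).
sub name_number {
    my ($name) = @_;
    my ($n) = $name =~ /(\d+)/;
    return defined $n ? $n : 0;
}

# Sort helper: by embedded number first, then alphabetically as a tie-breaker.
sub by_number { name_number($a) <=> name_number($b) or $a cmp $b }

# Build a hash of hashes: $table{IND}{SNP} = [state1, state2].
# Takes a reference to a list of whitespace-separated lines.
sub build_table {
    my ($lines) = @_;
    my %table;
    for my $line (@$lines) {
        next unless $line =~ /\S/;                       # skip blank lines
        my ($snp, $ind, $s1, $s2) = split ' ', $line;    # fifth column is ignored
        $table{$ind}{$snp} = [$s1, $s2];
    }
    return \%table;
}

# Turn the nested hash into one tab-separated line per IND,
# with the SNP states in SNP order and the SNP names omitted.
sub format_table {
    my ($table) = @_;
    my @out;
    for my $ind (sort by_number keys %$table) {
        my @fields = ($ind);
        for my $snp (sort by_number keys %{ $table->{$ind} }) {
            push @fields, @{ $table->{$ind}{$snp} };
        }
        push @out, join "\t", @fields;
    }
    return @out;
}

# Sample data copied from the example above.
my @sample = split /\n/, <<'END';
SNP5 IND1 A5 C5 0.8
SNP2 IND1 A2 C2 0.8
SNP1 IND1 A1 C1 0.8
SNP3 IND1 A3 C3 0.8
SNP4 IND1 A4 C4 0.8
SNP5 IND2 G5 T5 0.8
SNP2 IND2 G2 T2 0.8
SNP1 IND2 G1 T1 0.8
SNP3 IND2 G3 T3 0.8
SNP4 IND2 G4 T4 0.8
END

print "$_\n" for format_table(build_table(\@sample));
```

With the sample data this prints one tab-separated line for IND1 (A1 C1 through A5 C5) and one for IND2 (G1 T1 through G5 T5). To process real files, the embedded sample could be replaced by reading the lines from <> into an array first. If the flat %datahash is kept instead, the keys can still be filtered with something like grep { /^IND1,/ } keys %datahash, but the nested structure avoids that step.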


In reply to Table manipulation, array or hash? by robertkraus
