robertkraus has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks!
I am an early Perl freshman. So far I have done some text manipulation in a line-by-line style on plain text documents. What I'd like to parse now is a table that needs to be rearranged so that info from a certain line goes to another place on another line. Working across lines like this is new to me and I can't get my head around solving it...
Here is an example that I wrote to illustrate:
    SNP5 IND1 A5 C5 0.8
    SNP2 IND1 A2 C2 0.8
    SNP1 IND1 A1 C1 0.8
    SNP3 IND1 A3 C3 0.8
    SNP4 IND1 A4 C4 0.8
    SNP5 IND2 G5 T5 0.8
    SNP2 IND2 G2 T2 0.8
    SNP1 IND2 G1 T1 0.8
    SNP3 IND2 G3 T3 0.8
    SNP4 IND2 G4 T4 0.8
The last column (all the 0.8s) is not needed. The first column (SNP1-5) is used to sort the info in columns 3 and 4 into the correct positions (columns 3 and 4 give the possible states of a SNP per IND). Each combination of IND and SNP is unique in this data. In my output I would like to get a table with a first column giving each IND, followed by the sorted info about its state for each SNP.
This table needs to be stored as:
    IND1 A1 C1 A2 C2 A3 C3 A4 C4 A5 C5
    IND2 G1 T1 G2 T2 G3 T3 G4 T4 G5 T5
So I need to get from a multi-line format that stores the two possible states of IND per SNP, to a one-line format for every IND. Getting the SNPs sorted properly makes it possible to omit the name of the SNPs. They don't need to appear in the output. All I need is the right order (1-5 for each state).
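The reshaping described above can be sketched with a hash of hashes, keyed first by IND and then by SNP. This is only a minimal sketch under stated assumptions: the input is tab-separated, names have the form `SNP<number>` and `IND<number>` so they can be ordered by their numeric part, and `reshape` is a hypothetical helper name that does not come from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: turn rows of "SNP IND state1 state2 score"
# into one tab-separated output line per IND.
sub reshape {
    my @rows = @_;
    my %table;    # $table{IND}{SNP} = [state1, state2]

    for my $row (@rows) {
        chomp(my $line = $row);
        # The fifth column (the 0.8 score) is simply not captured.
        my ($snp, $ind, $s1, $s2) = split /\t/, $line;
        $table{$ind}{$snp} = [$s1, $s2];
    }

    # Assumption: every name contains a number to sort by (SNP5 -> 5).
    my $by_number = sub { my ($n) = $_[0] =~ /(\d+)/; $n };

    my @out;
    for my $ind (sort { $by_number->($a) <=> $by_number->($b) } keys %table) {
        my @snps = sort { $by_number->($a) <=> $by_number->($b) }
                   keys %{ $table{$ind} };
        # Flatten the state pairs in SNP order; SNP names are dropped.
        push @out, join "\t", $ind, map { @{ $table{$ind}{$_} } } @snps;
    }
    return @out;
}

# The ten sample rows from the question, joined with tabs.
my @sample = map { join "\t", @$_ } (
    [qw(SNP5 IND1 A5 C5 0.8)], [qw(SNP2 IND1 A2 C2 0.8)],
    [qw(SNP1 IND1 A1 C1 0.8)], [qw(SNP3 IND1 A3 C3 0.8)],
    [qw(SNP4 IND1 A4 C4 0.8)], [qw(SNP5 IND2 G5 T5 0.8)],
    [qw(SNP2 IND2 G2 T2 0.8)], [qw(SNP1 IND2 G1 T1 0.8)],
    [qw(SNP3 IND2 G3 T3 0.8)], [qw(SNP4 IND2 G4 T4 0.8)],
);

print "$_\n" for reshape(@sample);
```

Because the SNP order is recomputed for each IND, the sketch does not check that every IND has the same set of SNPs. If some combinations could be missing, the columns would no longer line up and a fixed SNP list would be needed instead.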
My first strategy was to use hash tables to store each IND-SNP combination as key and each state combination as value. This works, but now I run into problems when I want to arrange the output table.
This is also the first time I have done anything with hash tables, so I am quite happy that I already get my table populated using this code:
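    use strict;
    use warnings;

    my %datahash;
    while (<>) {
        chomp;                                          # drop the trailing newline
        my @intarray = split /\t/;                      # SNP, IND, state 1, state 2, score
        my $key   = $intarray[1] . "," . $intarray[0];  # "IND,SNP"
        my $value = $intarray[2] . "," . $intarray[3];  # "state1,state2"
        $datahash{$key} = $value;
    }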
I use commas as separators so that I can split the keys and values apart later on. The problem is just that I don't know how to go on... I had to learn that I can't use regular expressions when calling the values of a certain list of keys (like calling the values from any key matching ^IND1* in a foreach loop)... But any other starting point to solve this reformatting challenge is also highly appreciated!
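A hash cannot be indexed by a pattern directly, but its key list can be filtered with `grep` and the matching keys looked up afterwards. The sketch below keeps the flat "IND,SNP" keys from the code above. The helper name `states_for` is hypothetical, and it assumes the SNP part of each key ends in a number.

```perl
use strict;
use warnings;

# Same shape as %datahash above: key "IND,SNP", value "state1,state2".
my %datahash = (
    'IND1,SNP2' => 'A2,C2',
    'IND1,SNP1' => 'A1,C1',
    'IND2,SNP1' => 'G1,T1',
);

# Hypothetical helper: states for one IND, ordered by SNP number.
sub states_for {
    my ($hash, $ind) = @_;

    # grep tests the regex against every key. \Q...\E quotes the name,
    # and the trailing comma keeps IND1 from also matching IND10.
    my @keys = grep { /^\Q$ind\E,/ } keys %$hash;

    # Assumption: each key ends in the SNP number (e.g. "IND1,SNP2").
    my @sorted = sort { ($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0] } @keys;

    # Split each "state1,state2" value back into its two states.
    return map { split /,/, $hash->{$_} } @sorted;
}

print join("\t", 'IND1', states_for(\%datahash, 'IND1')), "\n";
```

This scans all keys once per IND, which is fine for small tables. For larger data, a nested hash keyed by IND and then SNP would likely avoid the repeated scans.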
Replies are listed 'Best First'.

Re: Table manipulation, array or hash?
by GrandFather (Saint) on Mar 23, 2010 at 10:08 UTC
by robertkraus (Novice) on Mar 23, 2010 at 11:30 UTC

Re: Table manipulation, array or hash?
by biohisham (Priest) on Mar 23, 2010 at 10:09 UTC
by robertkraus (Novice) on Mar 23, 2010 at 11:27 UTC
by deMize (Monk) on Mar 23, 2010 at 13:37 UTC

Re: Table manipulation, array or hash?
by ack (Deacon) on Mar 23, 2010 at 17:36 UTC
by GrandFather (Saint) on Mar 23, 2010 at 20:14 UTC