Renyulb28 has asked for the wisdom of the Perl Monks concerning the following question:
First file:

    T G T T T G T C A A .........
    C G G C T C T G G C .........
    . . . . . . . . . .

Second file:

    T G
    C T
    T G
    T C
    G A
    G A
    G C
    G A
    A G
    G T
    A G
    T A
    G T
    T C
    A G
    T C
    G C
    . .
    . .
The first file contains all the sequence data. Each row represents one individual, and every two columns represent one SNP marker (two alleles per SNP). There are 2600 rows and 4100 columns in total (2050 SNP markers).
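A minimal Python sketch of the pairing step described above. It assumes the alleles in each row are separated by whitespace (as in the sample) and that columns 1 and 2 form marker 1, columns 3 and 4 form marker 2, and so on. The function name `split_into_pairs` is illustrative, not from any library.

```python
def split_into_pairs(line):
    """Split one row of the first file into (allele1, allele2) tuples.

    Assumes whitespace-separated alleles where every two consecutive
    columns belong to the same SNP marker.
    """
    alleles = line.split()
    if len(alleles) % 2 != 0:
        raise ValueError(f"odd number of allele columns: {len(alleles)}")
    # Pair column 0 with 1, column 2 with 3, and so on.
    return list(zip(alleles[0::2], alleles[1::2]))


if __name__ == "__main__":
    # First five markers of the first sample row.
    print(split_into_pairs("T G T T T G T C A A"))
```

For a full 4100-column row this returns 2050 tuples, one per marker.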
The second file contains the 'minor' and 'major' alleles for every marker (minor allele in column 1, major allele in column 2). The minor allele is the allele of a SNP marker with the lower frequency of occurrence of the two possible alleles, and the major allele is the one with the higher frequency. There are 2050 rows in total, matching the number of SNP markers, and row n corresponds to the nth pair of columns (columns 2n-1 and 2n) in the first file. In other words, each allele pair in the first file is some combination of the minor and major alleles in the matching row of the second file: minor/minor, minor/major, major/minor, or major/major.
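One way to hold the second file in memory is a list of `(minor, major)` tuples indexed by marker position; 2050 two-column rows is small enough to load up front. This sketch assumes whitespace-separated columns, and `load_reference` is a hypothetical helper name.

```python
def load_reference(lines):
    """Parse the second file into a list of (minor, major) tuples, one per marker.

    `lines` is any iterable of text lines, such as an open file handle.
    Blank lines are skipped; every other line must have exactly two columns.
    """
    reference = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        minor, major = fields
        reference.append((minor, major))
    return reference


if __name__ == "__main__":
    # First three rows of the second sample file.
    print(load_reference(["T G\n", "C T\n", "T G\n"]))
```

With a real file (the filename here is hypothetical), `with open("second_file.txt") as fh: reference = load_reference(fh)` should yield 2050 entries, and `reference[i]` then holds the minor/major alleles for the allele pair in columns `2*i` and `2*i + 1` (0-based) of the first file.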
An individual is homozygous at a marker if the two alleles are the same (e.g., TT or AA), heterozygous if they differ (e.g., TG or GT), and has missing data at that marker if the pair is '0 0'.
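The three cases above can be written as a small classifier. This sketch follows the definitions literally: only '0 0' counts as missing, so a half-missing pair such as '0 A' (a case the question does not cover) would be reported as heterozygous. The function name `zygosity` is hypothetical.

```python
def zygosity(allele1, allele2):
    """Classify one allele pair using the definitions in the question.

    Returns "missing" for '0 0', "homozygous" when both alleles are the
    same (e.g. T T), and "heterozygous" otherwise (e.g. T G or G T).
    A half-missing pair such as '0 A' is not covered by the question and
    falls through to "heterozygous" in this sketch.
    """
    if allele1 == "0" and allele2 == "0":
        return "missing"
    if allele1 == allele2:
        return "homozygous"
    return "heterozygous"


if __name__ == "__main__":
    for pair in [("T", "T"), ("T", "G"), ("G", "T"), ("0", "0")]:
        print(pair, zygosity(*pair))
```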
Desired operation: read the first file one row at a time and two columns (one allele pair) at a time, and compare each pair with the matching row of the second file. If the pair is homozygous for the minor allele (column 1), output '0'. If the pair is heterozygous (one minor and one major allele, in either order), output '1'. If the pair is homozygous for the major allele (column 2), output '2'. If the pair is missing ('0 0'), output '-1'. The resulting file should have 2600 rows and 2050 columns, one row per individual and one column per SNP marker.
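A minimal end-to-end sketch of the desired operation, assuming whitespace-separated input and space-separated output (the question does not specify an output delimiter). Comparing each pair as a set makes the heterozygous check independent of allele order, so 'T G' and 'G T' both encode as 1. Pairs that match neither reference allele raise an error instead of being coded silently. The function names and the command-line interface are hypothetical.

```python
import sys


def encode_pair(a1, a2, minor, major):
    """Return the code for one allele pair.

    0 = homozygous minor, 1 = heterozygous (either order),
    2 = homozygous major, -1 = missing ('0 0').
    Raises ValueError if the alleles match neither reference allele.
    """
    if a1 == "0" and a2 == "0":
        return -1
    pair = {a1, a2}  # set comparison ignores order: T G == G T
    if pair == {minor}:
        return 0
    if pair == {major}:
        return 2
    if pair == {minor, major}:
        return 1
    raise ValueError(f"alleles {a1} {a2} do not match minor/major {minor}/{major}")


def encode_row(line, reference):
    """Encode one row of the first file into a list of ints, one per marker."""
    alleles = line.split()
    if len(alleles) != 2 * len(reference):
        raise ValueError(f"expected {2 * len(reference)} allele columns, got {len(alleles)}")
    return [
        encode_pair(a1, a2, minor, major)
        for a1, a2, (minor, major) in zip(alleles[0::2], alleles[1::2], reference)
    ]


def encode_file(genotype_lines, reference, out):
    """Stream the first file row by row, writing one coded row per individual."""
    for line in genotype_lines:
        if line.strip():  # skip blank lines
            codes = encode_row(line, reference)
            out.write(" ".join(str(code) for code in codes) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation:
    #   python encode_snps.py first_file.txt second_file.txt > coded.txt
    with open(sys.argv[2]) as ref_fh:
        reference = [tuple(line.split()) for line in ref_fh if line.strip()]
    with open(sys.argv[1]) as geno_fh:
        encode_file(geno_fh, reference, sys.stdout)
```

Because each row is encoded and written before the next one is read, memory use stays bounded by one 4100-column row plus the 2050-entry reference table.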
Re: Gurus, please point me in the right direction; complicated operations desired for DNA sequence formating
by wind (Priest) on Mar 18, 2011 at 16:02 UTC

Re: Gurus, please point me in the right direction; complicated operations desired for DNA sequence formating
by educated_foo (Vicar) on Mar 18, 2011 at 16:15 UTC

Re: Gurus, please point me in the right direction; complicated operations desired for DNA sequence formating
by umasuresh (Hermit) on Mar 18, 2011 at 16:28 UTC

Re: Gurus, please point me in the right direction; complicated operations desired for DNA sequence formating
by raybies (Chaplain) on Mar 18, 2011 at 16:04 UTC

Re: Gurus, please point me in the right direction; complicated operations desired for DNA sequence formating
by choroba (Cardinal) on Mar 18, 2011 at 16:07 UTC

Re: Gurus, please point me in the right direction; complicated operations desired for DNA sequence formating
by CountZero (Bishop) on Mar 19, 2011 at 08:50 UTC