heidi has asked for the wisdom of the Perl Monks concerning the following question:
Also, is it possible to find the position where we delete the character?

File number 1 - seq.txt

    >s1
    AGCTTTTCGGGCAAT
    >s2
    GCTGCCCCCCATCTT
    >s3
    TCGTAGCTGAAAATC

File number 2 - num.txt

    >s1
    23 43 45 65 76 54 3 34 54 65 7 45 56 87 56
    >s2
    23 43 23 45 65 45 76 78 34 8 12 32 65 23 25
    >s3
    12 23 34 45 56 54 43 32 65 43 12 34 75 76 45

Things to be done:

1) The sequence file and the number file contain the same number of entries, with the same IDs.

2) The number of nucleotides in a sequence is the same as the number of scores for that ID in num.txt. For example, in seq.txt >s1 has 15 nucleotides, and in num.txt >s1 has 15 scores, one for each corresponding nucleotide.

3) The first thing to check is whether there is a repetition in the bases (nucleotides). For this we only need to look at seq.txt. For example: >s1 has 4 "T" and 3 "G", >s2 has 6 "C", and >s3 has 4 "A".

4) We consider bases as a repeat only if the same base occurs 3 or more times consecutively. For example, in >s1 there are 2 "A" near the end of the sequence, and in >s2 there are 2 "T" at the end. These should not be taken as repeated bases.

5) Once we have chosen the repeated bases according to steps 3 and 4, we have to take the corresponding ID in the num.txt file.

6) Example:

- open seq.txt and open num.txt
- process >s1 in both files
- positions 4-7 are the repeat, "TTTT" in this case
- the corresponding scores in positions 4-7 are "65 76 54 3"
- therefore it is

      T  T  T  T
      65 76 54 3

- we have to check only the SCORE of the last base of the repeat
- if it is less than 10, we have to delete that score and its nucleotide
- hence, after handling this repeat, the result should be

      >s1
      AGCTTTCGGGCAAT
      >s1
      23 43 45 65 76 54 34 54 65 7 45 56 87 56

The GGG repeat at positions 9-11 ends with a score of 7, so it is trimmed in the same way. This has to be done for all the sequences in the file. >s3 will not change, because the score of the last base of its repeat is more than 10.
So, my SAMPLE RESULT FILE for the above input should look like:

File number 1 - seq.txt

    >s1
    AGCTTTCGGCAAT
    >s2
    GCTGCCCCCATCTT
    >s3
    TCGTAGCTGAAAATC

File number 2 - num.txt

    >s1
    23 43 45 65 76 54 34 54 65 45 56 87 56
    >s2
    23 43 23 45 65 45 76 78 34 12 32 65 23 25
    >s3
    12 23 34 45 56 54 43 32 65 43 12 34 75 76 45
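The steps above could be sketched in Perl roughly as follows. This is only a minimal sketch under a few assumptions, not a definitive solution: each record is taken to be one ">id" line followed by exactly one data line, at most one base is deleted per repeat (the run is not re-checked after trimming), and the names `read_records` and `trim_repeats` are hypothetical helpers invented for this illustration. The demo reads from in-memory strings; with real files you would open seq.txt and num.txt instead. The third return value of `trim_repeats` answers the "which position was deleted" question, using 1-based positions in the original sequence.

```perl
use strict;
use warnings;

# Read ">id" / data-line pairs from a filehandle.
# Assumption: exactly one data line follows each ">id" line.
sub read_records {
    my ($fh) = @_;
    my (@ids, %data, $id);
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        if ($line =~ /^>(\S+)/) { $id = $1; push @ids, $id; }
        elsif (defined $id)     { $data{$id} = $line; }
    }
    return (\@ids, \%data);
}

# For every run of 3 or more identical bases, delete the run's last base
# (and its score) if that score is below $min.
# Returns the new sequence, the new scores (array ref) and the deleted
# 1-based positions in the original sequence (array ref).
sub trim_repeats {
    my ($seq, $scores, $min) = @_;
    my @bases  = split //, $seq;
    my @scores = @$scores;
    die "sequence and score counts differ\n" unless @bases == @scores;

    my @deleted;
    while ($seq =~ /([A-Za-z])\1{2,}/g) {
        my $last = pos($seq) - 1;            # 0-based index of the run's last base
        push @deleted, $last + 1 if $scores[$last] < $min;
    }
    # Delete from right to left so the earlier positions stay valid.
    for my $p (reverse @deleted) {
        splice @bases,  $p - 1, 1;
        splice @scores, $p - 1, 1;
    }
    return (join('', @bases), \@scores, \@deleted);
}

# Demo with the data from the question, read from in-memory "files".
my $seq_txt = ">s1\nAGCTTTTCGGGCAAT\n>s2\nGCTGCCCCCCATCTT\n>s3\nTCGTAGCTGAAAATC\n";
my $num_txt = ">s1\n23 43 45 65 76 54 3 34 54 65 7 45 56 87 56\n"
            . ">s2\n23 43 23 45 65 45 76 78 34 8 12 32 65 23 25\n"
            . ">s3\n12 23 34 45 56 54 43 32 65 43 12 34 75 76 45\n";

open my $sfh, '<', \$seq_txt or die "cannot open sequence data: $!";
open my $nfh, '<', \$num_txt or die "cannot open score data: $!";
my ($ids, $seqs) = read_records($sfh);
my (undef, $nums) = read_records($nfh);

for my $id (@$ids) {
    die "no scores for $id\n" unless exists $nums->{$id};
    my ($new_seq, $new_scores, $deleted) =
        trim_repeats($seqs->{$id}, [ split ' ', $nums->{$id} ], 10);
    print ">$id\n$new_seq\n@$new_scores\n";
    print "deleted positions: ", (@$deleted ? "@$deleted" : "none"), "\n";
}
```

With the sample input this reports positions 7 and 11 for >s1, position 10 for >s2, and no deletion for >s3, which agrees with the sample result above. Collecting the positions first and splicing from right to left avoids having to adjust indices after each deletion.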
Re: deleting a particular character and its coresponding score in 2 different files!!!
by ikegami (Patriarch) on Oct 14, 2008 at 05:10 UTC
by heidi (Sexton) on Oct 14, 2008 at 08:26 UTC
by GrandFather (Saint) on Oct 14, 2008 at 10:26 UTC
by apl (Monsignor) on Oct 14, 2008 at 10:00 UTC
Re: deleting a particular character and its coresponding score in 2 different files!!!
by gone2015 (Deacon) on Oct 14, 2008 at 14:05 UTC
by Anonymous Monk on Oct 15, 2008 at 06:21 UTC
by heidi (Sexton) on Oct 15, 2008 at 06:26 UTC