Also, is it possible to find the position where we delete the character?

File number 1 - seq.txt

>s1
AGCTTTTCGGGCAAT
>s2
GCTGCCCCCCATCTT
>s3
TCGTAGCTGAAAATC

File number 2 - num.txt

>s1
23 43 45 65 76 54 3 34 54 65 7 45 56 87 56
>s2
23 43 23 45 65 45 76 78 34 8 12 32 65 23 25
>s3
12 23 34 45 56 54 43 32 65 43 12 34 75 76 45

Things to be done:

1) The sequence file and the number file contain the same number of entries, with the same IDs.
2) The number of nucleotides in a sequence is the same as the number of scores for that ID in num.txt. For example, in seq.txt >s1 has 15 nucleotides, and in num.txt >s1 has 15 scores, one for each corresponding nucleotide.
3) The first thing to check is whether there is a repetition in the bases (nucleotides). For this we only need to check seq.txt. For example: >s1 has 4 "T" and 3 "G", >s2 has 6 "C", and >s3 has 4 "A".
4) We consider it a repeat only if the same base occurs 3 or more times in a row. For example, in >s1 there are 2 "A" near the end of the sequence, and in >s2 there are 2 "T" at the end; these should not be taken as repeated bases.
5) Once we have chosen the repeated bases according to steps 3 and 4, we have to take the corresponding ID in num.txt.
6) Example:
   - open seq.txt and open num.txt
   - process >s1 in both files
   - positions 4-7 are the repeat, "TTTT" in this case
   - the corresponding scores at positions 4-7 are "65 76 54 3"
   - therefore it is
       T  T  T  T
       65 76 54 3
   - we have to check only the SCORE of the last base
   - if it is less than 10, we have to delete the score and the nucleotide
   - hence the result sequence should be
       >s1
       AGCTTTCGGGCAAT
       >s1
       23 43 45 65 76 54 34 54 65 7 45 56 87 56
     (this shows only the TTTT repeat handled; the GGG repeat at positions 9-11 ends with a score of 7, so its last G is deleted in the same way)

This has to be done for all the sequences in the file. >s3 will not have any change, because the score of the last base of its repeat is more than 10.
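Points 1 and 2 can be checked before any trimming is attempted. The following is only a rough Python sketch, not a tested solution for real data: the helper names `parse_records` and `check_consistency` are made up for illustration, and it assumes each record is a `>id` line followed by exactly one data line.

```python
def parse_records(text):
    """Map each '>id' header to the single data line that follows it.

    Assumption: one header line, then one data line per record.
    """
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
        elif current is not None:
            records[current] = line
    return records


def check_consistency(seq_text, num_text):
    """Return (seqs, nums, problems); problems is empty when points 1 and 2 hold."""
    seqs = parse_records(seq_text)
    nums = {rid: [int(x) for x in data.split()]
            for rid, data in parse_records(num_text).items()}
    problems = []
    for rid in sorted(set(seqs) | set(nums)):
        if rid not in seqs or rid not in nums:
            problems.append(f"{rid}: present in only one file")
        elif len(seqs[rid]) != len(nums[rid]):
            problems.append(
                f"{rid}: {len(seqs[rid])} bases but {len(nums[rid])} scores")
    return seqs, nums, problems


seq_txt = """>s1
AGCTTTTCGGGCAAT
>s2
GCTGCCCCCCATCTT
>s3
TCGTAGCTGAAAATC
"""
num_txt = """>s1
23 43 45 65 76 54 3 34 54 65 7 45 56 87 56
>s2
23 43 23 45 65 45 76 78 34 8 12 32 65 23 25
>s3
12 23 34 45 56 54 43 32 65 43 12 34 75 76 45
"""

seqs, nums, problems = check_consistency(seq_txt, num_txt)
print(problems)  # [] for the sample files: same IDs, 15 bases and 15 scores each
```

In real use the two strings would come from reading seq.txt and num.txt; any message in `problems` means the files do not line up and should be fixed before deleting anything.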
-----------------------------------------------------------
So, my SAMPLE RESULT FILE for the above input should look like:

File number 1 - seq.txt

>s1
AGCTTTCGGCAAT
>s2
GCTGCCCCCATCTT
>s3
TCGTAGCTGAAAATC

File number 2 - num.txt

>s1
23 43 45 65 76 54 34 54 65 45 56 87 56
>s2
23 43 23 45 65 45 76 78 34 12 32 65 23 25
>s3
12 23 34 45 56 54 43 32 65 43 12 34 75 76 45
----------------------------------------------
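One possible way to do the trimming, and to report the position of each deletion, is sketched below in Python. It is only an illustration under some assumptions: the function name `trim_repeats` and its parameters are hypothetical, each repeat is checked once (the new last base is not re-checked after a deletion), and positions are 1-based and refer to the original, untrimmed sequence.

```python
import re


def trim_repeats(seq, scores, min_run=3, threshold=10):
    """Delete the last base (and its score) of every run of at least
    `min_run` identical bases whose last score is below `threshold`.

    Returns (new_seq, new_scores, deleted_positions), with positions
    1-based and relative to the original sequence.
    """
    if len(seq) != len(scores):
        raise ValueError("sequence and score lengths differ")
    # (.)\1{2,} matches a base followed by two or more copies of itself
    run = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    drop = set()
    for m in run.finditer(seq):
        last = m.end() - 1  # 0-based index of the last base in the run
        if scores[last] < threshold:
            drop.add(last)
    new_seq = "".join(b for i, b in enumerate(seq) if i not in drop)
    new_scores = [s for i, s in enumerate(scores) if i not in drop]
    return new_seq, new_scores, sorted(i + 1 for i in drop)


data = {
    "s1": ("AGCTTTTCGGGCAAT",
           [23, 43, 45, 65, 76, 54, 3, 34, 54, 65, 7, 45, 56, 87, 56]),
    "s2": ("GCTGCCCCCCATCTT",
           [23, 43, 23, 45, 65, 45, 76, 78, 34, 8, 12, 32, 65, 23, 25]),
    "s3": ("TCGTAGCTGAAAATC",
           [12, 23, 34, 45, 56, 54, 43, 32, 65, 43, 12, 34, 75, 76, 45]),
}

for rid, (seq, scores) in data.items():
    new_seq, new_scores, deleted = trim_repeats(seq, scores)
    print(f">{rid} deleted at {deleted}")
    print(new_seq)
    print(" ".join(map(str, new_scores)))
# >s1 deleted at [7, 11]  -> AGCTTTCGGCAAT
# >s2 deleted at [10]     -> GCTGCCCCCATCTT
# >s3 deleted at []       -> TCGTAGCTGAAAATC (unchanged)
```

On the sample input this gives the same sequences and scores as the sample result files, and the `deleted` list answers the question about where each character was removed.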
In reply to deleting a particular character and its coresponding score in 2 different files!!! by heidi