Dear all, I want to delete a particular character from a string in one file, based on the corresponding score in a different file. To explain the query in detail:
File 1, seq.txt:
>s1 AGCTTTTCGGGCAAT
>s2 GCTGCCCCCCATCTT
>s3 TCGTAGCTGAAAATC

File 2, num.txt:
>s1 23 43 45 65 76 54 3 34 54 65 7 45 56 87 56
>s2 23 43 23 45 65 45 76 78 34 8 12 32 65 23 25
>s3 12 23 34 45 56 54 43 32 65 43 12 34 75 76 45

Things to be done:
1) The sequence file and the number file contain the same number of entries, with the same IDs.
2) The number of nucleotides in each sequence is the same as the number of scores for that ID in num.txt, one score per nucleotide. For example, >s1 has 15 nucleotides in seq.txt and 15 scores in num.txt.
3) The first thing to check is whether a base (nucleotide) is repeated. Only seq.txt is needed for this. For example, >s1 has a run of 4 "T" and a run of 3 "G", >s2 has a run of 6 "C", and >s3 has a run of 4 "A".
4) A base counts as a repeat only if it occurs 3 or more times in a row. For example, >s1 has 2 consecutive "A" near the end and >s2 has 2 consecutive "T" at the end; these are not repeats.
5) For each repeat found with steps 3 and 4, look up the record with the same ID in num.txt.
6) Example:
   - Open seq.txt and num.txt.
   - Process >s1 in both files.
   - Positions 4-7 are a repeat, "TTTT" in this case.
   - The corresponding scores at positions 4-7 are "65 76 54 3", so the bases and scores pair up as T/65, T/76, T/54, T/3.
   - Only the score of the last base of the repeat is checked.
   - If it is less than 10, delete that score and its nucleotide.
   - For this repeat, the result is therefore:
     >s1 AGCTTTCGGGCAAT
     >s1 23 43 45 65 76 54 34 54 65 7 45 56 87 56

This has to be done for every repeat in every sequence in the file. The "GGG" at positions 9-11 of >s1 is handled the same way (its last score is 7, so it is also deleted). >s3 will not change, because the score of the last base of its repeat (75) is not less than 10.
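A minimal sketch of steps 3 to 6 for a single record, written in Python for brevity (the same logic ports directly to Perl). It assumes the sequence is a string and its scores a list of integers of the same length; `trim_record`, `min_run` and `threshold` are hypothetical names, with defaults taken from the rules above (3 or more identical bases in a row, delete the last one if its score is below 10). Every qualifying run in the record is handled, not only the first.

```python
from itertools import groupby


def trim_record(seq, scores, min_run=3, threshold=10):
    """Hypothetical helper applying steps 3-6 to one record.

    For every run of >= min_run identical bases, look at the score of the
    LAST base in the run; if it is below threshold, drop that base and its
    score. Shorter runs are copied unchanged.
    """
    if len(seq) != len(scores):
        raise ValueError("sequence and score list differ in length")
    out_seq, out_scores = [], []
    pos = 0  # index in the original sequence where the current run starts
    for base, group in groupby(seq):  # groupby yields runs of equal characters
        run_len = len(list(group))
        run_scores = scores[pos:pos + run_len]
        if run_len >= min_run and run_scores[-1] < threshold:
            # Keep every base of the run except the last one.
            out_seq.append(base * (run_len - 1))
            out_scores.extend(run_scores[:-1])
        else:
            out_seq.append(base * run_len)
            out_scores.extend(run_scores)
        pos += run_len
    return "".join(out_seq), out_scores


seq, scores = trim_record("AGCTTTTCGGGCAAT",
                          [23, 43, 45, 65, 76, 54, 3, 34, 54, 65, 7, 45, 56, 87, 56])
print(seq)     # AGCTTTCGGCAAT
print(scores)  # [23, 43, 45, 65, 76, 54, 34, 54, 65, 45, 56, 87, 56]
```

Applied to >s1, this removes both the last "T" (score 3) and the last "G" (score 7), which matches the expected result for >s1.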
-----------------------------------------------------------
So, my SAMPLE RESULT FILES for the above input should look like this:

File 1, seq.txt:
>s1 AGCTTTCGGCAAT
>s2 GCTGCCCCCATCTT
>s3 TCGTAGCTGAAAATC

File 2, num.txt:
>s1 23 43 45 65 76 54 34 54 65 45 56 87 56
>s2 23 43 23 45 65 45 76 78 34 12 32 65 23 25
>s3 12 23 34 45 56 54 43 32 65 43 12 34 75 76 45
----------------------------------------------
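Reading and writing the two files can be kept separate from the trimming rule. The sketch below, again in Python, parses both files, pairs records by ID and checks points 1 and 2. It assumes a token starting with ">" begins a record and everything up to the next ">" belongs to it, so it accepts the data either on the header line or on the following lines; the output layout (one `>ID data` line per record) is also an assumption. `parse_records`, `pair_records` and `format_records` are hypothetical names. Run on the sample result, it confirms each trimmed sequence still has exactly one score per base.

```python
def parse_records(text, numeric=False):
    """Parse '>ID data ...' records into an ordered {ID: data} dict.

    Assumption: a token starting with '>' begins a new record, and every
    following token up to the next '>' belongs to it.
    """
    records = {}
    current = None
    for token in text.split():
        if token.startswith(">"):
            current = token[1:]
            records[current] = []
        elif current is None:
            raise ValueError("data found before the first '>' header")
        else:
            records[current].append(token)
    if numeric:
        # num.txt: whitespace-separated integer scores
        return {rid: [int(t) for t in toks] for rid, toks in records.items()}
    # seq.txt: join the pieces into one sequence string
    return {rid: "".join(toks) for rid, toks in records.items()}


def pair_records(seq_text, num_text):
    """Check points 1 and 2, then return {ID: (sequence, scores)}."""
    seqs = parse_records(seq_text)
    nums = parse_records(num_text, numeric=True)
    if seqs.keys() != nums.keys():  # point 1: same IDs in both files
        raise ValueError("seq.txt and num.txt contain different IDs")
    for rid, seq in seqs.items():  # point 2: one score per base
        if len(seq) != len(nums[rid]):
            raise ValueError(f"{rid}: {len(seq)} bases but {len(nums[rid])} scores")
    return {rid: (seqs[rid], nums[rid]) for rid in seqs}


def format_records(records):
    """Render {ID: (sequence, scores)} as the text of seq.txt and num.txt."""
    seq_lines = [f">{rid} {seq}" for rid, (seq, _) in records.items()]
    num_lines = [f">{rid} " + " ".join(map(str, scores))
                 for rid, (_, scores) in records.items()]
    return "\n".join(seq_lines) + "\n", "\n".join(num_lines) + "\n"


seq_txt = ">s1 AGCTTTCGGCAAT\n>s2 GCTGCCCCCATCTT\n>s3 TCGTAGCTGAAAATC\n"
num_txt = (">s1 23 43 45 65 76 54 34 54 65 45 56 87 56\n"
           ">s2 23 43 23 45 65 45 76 78 34 12 32 65 23 25\n"
           ">s3 12 23 34 45 56 54 43 32 65 43 12 34 75 76 45\n")
pairs = pair_records(seq_txt, num_txt)  # raises if point 1 or 2 is violated
print({rid: len(seq) for rid, (seq, _) in pairs.items()})  # {'s1': 13, 's2': 14, 's3': 15}
```

With real files, the two strings would come from `open("seq.txt").read()` and `open("num.txt").read()`, and each (sequence, scores) pair would be trimmed before `format_records` writes the results back out.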
Also, is it possible to report the position of each character that gets deleted?
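One way to report the deleted positions is to collect them from the original sequence before anything is removed. A minimal Python sketch, assuming the same rule as above (3 or more identical bases in a row, delete the last one if its score is below 10); `deleted_positions` is a hypothetical name. It uses a regex backreference, `(.)\1{2,}`, to find runs, which works the same way in Perl. Positions are 1-based and refer to the original, untrimmed sequence, counted the same way as "positions 4-7" in the question; in the trimmed sequence, everything after a deletion shifts left by one.

```python
import re


def deleted_positions(seq, scores, min_run=3, threshold=10):
    """Hypothetical helper: 1-based positions (in the ORIGINAL sequence) of
    bases that the repeat rule would delete."""
    # One character followed by at least (min_run - 1) copies of itself.
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    positions = []
    for m in pattern.finditer(seq):  # greedy, non-overlapping runs
        last = m.end() - 1  # 0-based index of the last base of the run
        if scores[last] < threshold:
            positions.append(last + 1)  # report 1-based, as in the question
    return positions


print(deleted_positions("AGCTTTTCGGGCAAT",
                        [23, 43, 45, 65, 76, 54, 3, 34, 54, 65, 7, 45, 56, 87, 56]))
# [7, 11]
```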

In reply to deleting a particular character and its coresponding score in 2 different files!!! by heidi
