
Dear all, I want to delete a particular character from a string in one file, based on the corresponding score in a different file. Here is the problem in detail:

file number 1 - seq.txt
>s1
AGCTTTTCGGGCAAT
>s2
GCTGCCCCCCATCTT
>s3
TCGTAGCTGAAAATC

file number 2 - num.txt
>s1
23 43 45 65 76 54 3 34 54 65 7 45 56 87 56 
>s2
23 43 23 45 65 45 76 78 34 8 12 32 65 23 25
>s3
12 23 34 45 56 54 43 32 65 43 12 34 75 76 45
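Both files use the same layout: a ">ID" header line followed by its data. Below is a minimal Python sketch of reading them into per-ID records and checking points 1 and 2 (same IDs, one score per base). Python and the names `parse_blocks`, `read_sequences`, `read_scores` and `check_inputs` are illustrative choices, not something from the post. It assumes scores are whitespace-separated integers and that a sequence may span several lines.

```python
def parse_blocks(lines):
    """Map each '>ID' header to the list of data lines that follow it.

    Blank lines are skipped; IDs keep their file order.
    """
    blocks, current = {}, None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            blocks[current] = []
        elif current is None:
            raise ValueError("data before the first '>' header: %r" % line)
        else:
            blocks[current].append(line)
    return blocks


def read_sequences(lines):
    """ID -> sequence, joining the data lines of each record."""
    return {rid: "".join(parts) for rid, parts in parse_blocks(lines).items()}


def read_scores(lines):
    """ID -> list of integer scores (whitespace-separated)."""
    return {rid: [int(tok) for part in parts for tok in part.split()]
            for rid, parts in parse_blocks(lines).items()}


def check_inputs(seqs, scores):
    """Return a sorted list of problems; an empty list means the files line up."""
    problems = []
    for rid in seqs.keys() - scores.keys():      # point 1: same IDs
        problems.append("%s is missing from num.txt" % rid)
    for rid in scores.keys() - seqs.keys():
        problems.append("%s is missing from seq.txt" % rid)
    for rid in seqs.keys() & scores.keys():      # point 2: one score per base
        if len(seqs[rid]) != len(scores[rid]):
            problems.append("%s: %d bases but %d scores"
                            % (rid, len(seqs[rid]), len(scores[rid])))
    return sorted(problems)


seqs = read_sequences(""">s1
AGCTTTTCGGGCAAT
>s2
GCTGCCCCCCATCTT
>s3
TCGTAGCTGAAAATC""".splitlines())
scores = read_scores(""">s1
23 43 45 65 76 54 3 34 54 65 7 45 56 87 56
>s2
23 43 23 45 65 45 76 78 34 8 12 32 65 23 25
>s3
12 23 34 45 56 54 43 32 65 43 12 34 75 76 45""".splitlines())

print(check_inputs(seqs, scores))   # []
```

With the real files, an open file handle can be passed directly (for example inside `with open("seq.txt") as fh:`), because iterating over a file yields its lines.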


Things to be done:

1) The sequence file and the number file contain
the same number of entries, with the same IDs.

2) The number of nucleotides in each sequence is
the same as the number of scores for that ID in
num.txt: there is one score per nucleotide. For
example, >s1 has 15 nucleotides in seq.txt, and
>s1 in num.txt has 15 scores, one for each
corresponding nucleotide.

3) The first thing to check is whether there is a
repetition of bases (nucleotides). For this, only
seq.txt needs to be checked. For example, >s1 has
4 "T" and 3 "G" in a row, >s2 has 6 "C", and >s3
has 4 "A".

4) Bases count as a repeat only if the same base
occurs 3 or more times in a row. For example, near
the end of >s1 there are 2 "A"s, and at the end of
>s2 there are 2 "T"s; these should not be taken as
repeated bases.

5) Once the repeated bases have been found
according to points 3 and 4, we look up the
scores for the same ID in num.txt.

6) Example: 
-Open seq.txt and num.txt.

-Process >s1 in both files.

-Positions 4-7 are a repeat ("TTTT" in this case).

-The corresponding scores at positions 4-7 are "65 76 54 3".

-Therefore the bases and their scores are:
   T   T   T   T
  65  76  54   3

-Only the score of the last base of the repeat is checked (here, 3).

-If that score is less than 10, delete both the nucleotide and its score.

-Hence, after handling this repeat, the result is:

>s1
AGCTTTCGGGCAAT

>s1
23 43 45 65 76 54 34 54 65 7 45 56 87 56 

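This has to be done for every repeat in every sequence
in the file. For example, the "GGG" repeat in >s1
(positions 9-11) ends with a score of 7, so that G and
its score are deleted too, and the "CCCCCC" repeat in
>s2 ends with a score of 8. >s3 will not change,
because the score of the last base of its repeat (75)
is more than 10.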
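The walk-through above (find each run of 3 or more identical bases, look at the score of its last base, and delete that base and its score if the score is below 10) can be sketched in Python as follows. It assumes a score of exactly 10 is kept, since the rule says "less than 10", and that only the last base of each run is ever removed. `itertools.groupby` is one way to find the runs; the names `repeats_with_scores` and `filter_record` are hypothetical.

```python
from itertools import groupby


def repeats_with_scores(seq, scores, min_len=3):
    """Yield (base, start, end, run_scores) for each run of at least
    min_len identical bases. start and end are 1-based and inclusive,
    as in "positions 4-7" above."""
    pos = 0  # 0-based index of the first base of the current run
    for base, group in groupby(seq):
        length = sum(1 for _ in group)
        if length >= min_len:
            yield base, pos + 1, pos + length, scores[pos:pos + length]
        pos += length


def filter_record(seq, scores, threshold=10, min_len=3):
    """Delete the last base of each qualifying run, with its score, when
    that score is below threshold. Returns (new_seq, new_scores)."""
    if len(seq) != len(scores):
        raise ValueError("sequence and scores differ in length")
    # end is 1-based, so end - 1 is the 0-based index of the run's last base.
    drop = {end - 1
            for _, _, end, run_scores in repeats_with_scores(seq, scores, min_len)
            if run_scores[-1] < threshold}
    keep = [i for i in range(len(seq)) if i not in drop]
    return "".join(seq[i] for i in keep), [scores[i] for i in keep]


s1 = "AGCTTTTCGGGCAAT"
n1 = [23, 43, 45, 65, 76, 54, 3, 34, 54, 65, 7, 45, 56, 87, 56]
for r in repeats_with_scores(s1, n1):
    print(r)
# ('T', 4, 7, [65, 76, 54, 3])
# ('G', 9, 11, [54, 65, 7])
new_seq, new_scores = filter_record(s1, n1)
print(new_seq)      # AGCTTTCGGCAAT
print(new_scores)   # [23, 43, 45, 65, 76, 54, 34, 54, 65, 45, 56, 87, 56]
```

Collecting the indexes to drop first and only then rebuilding the sequence and the score list means an earlier deletion never shifts the positions of a later run.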
-----------------------------------------------------------
So, my sample result files for
the above input should look like this:

file number 1 - seq.txt
>s1
AGCTTTCGGCAAT
>s2
GCTGCCCCCATCTT
>s3
TCGTAGCTGAAAATC

file number 2 - num.txt
>s1
23 43 45 65 76 54 34 54 65 45 56 87 56 
>s2
23 43 23 45 65 45 76 78 34 12 32 65 23 25
>s3
12 23 34 45 56 54 43 32 65 43 12 34 75 76 45
----------------------------------------------
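Putting the steps together, here is a self-contained Python sketch that turns the two input files into the two result files. `process` is a hypothetical name; it takes iterables of lines (an open file works) and returns the new file contents as strings, so the caller decides where to write them. It finds runs with the regular expression `(.)\1{2,}` (a base followed by at least two more copies of itself), which is an alternative to grouping consecutive bases, and it assumes the same "score below 10" rule.

```python
import re

# A base followed by at least 2 more copies of itself: a run of 3 or more.
RUN = re.compile(r"(.)\1{2,}")


def parse_blocks(lines):
    """Map each '>ID' header to the data lines that follow it (file order).
    Blank lines, and any data before the first header, are ignored."""
    blocks, current = {}, None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            current = line[1:].strip()
            blocks[current] = []
        elif line and current is not None:
            blocks[current].append(line)
    return blocks


def process(seq_lines, num_lines, threshold=10):
    """Return (text for the new seq.txt, text for the new num.txt)."""
    seqs = {rid: "".join(p) for rid, p in parse_blocks(seq_lines).items()}
    nums = {rid: [int(t) for line in p for t in line.split()]
            for rid, p in parse_blocks(num_lines).items()}
    if seqs.keys() != nums.keys():
        raise ValueError("seq.txt and num.txt have different IDs")
    seq_out, num_out = [], []
    for rid, seq in seqs.items():            # keep seq.txt's record order
        scores = nums[rid]
        if len(seq) != len(scores):
            raise ValueError("%s: %d bases but %d scores"
                             % (rid, len(seq), len(scores)))
        # 0-based index of the last base of each run whose score is too low.
        drop = {m.end() - 1 for m in RUN.finditer(seq)
                if scores[m.end() - 1] < threshold}
        keep = [i for i in range(len(seq)) if i not in drop]
        seq_out += [">" + rid, "".join(seq[i] for i in keep)]
        num_out += [">" + rid, " ".join(str(scores[i]) for i in keep)]
    return "\n".join(seq_out) + "\n", "\n".join(num_out) + "\n"


seq_txt = ">s1\nAGCTTTTCGGGCAAT\n>s2\nGCTGCCCCCCATCTT\n>s3\nTCGTAGCTGAAAATC\n"
num_txt = (">s1\n23 43 45 65 76 54 3 34 54 65 7 45 56 87 56\n"
           ">s2\n23 43 23 45 65 45 76 78 34 8 12 32 65 23 25\n"
           ">s3\n12 23 34 45 56 54 43 32 65 43 12 34 75 76 45\n")
new_seq_txt, new_num_txt = process(seq_txt.splitlines(), num_txt.splitlines())
print(new_seq_txt, end="")
print(new_num_txt, end="")
```

For the real files, something like `with open("seq.txt") as s, open("num.txt") as n: new_seq_txt, new_num_txt = process(s, n)` should work; writing the two strings to new files (for example `seq_out.txt` and `num_out.txt`, hypothetical names) rather than overwriting the inputs keeps the originals intact.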

Also, is it possible to find the position of each character we delete?
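One way to get the positions is to record them at the moment a base is chosen for deletion. A minimal Python sketch, assuming 1-based positions in the original (unfiltered) sequence and the same rule as above (last base of a run of 3 or more, score below 10); `deletion_report` is a hypothetical name.

```python
import re


def deletion_report(seq, scores, threshold=10, min_len=3):
    """List (position, base, score) for each base the filter deletes.

    Positions are 1-based and refer to the ORIGINAL sequence.
    """
    run = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    report = []
    for m in run.finditer(seq):
        last = m.end() - 1               # 0-based index of the run's last base
        if scores[last] < threshold:
            report.append((last + 1, seq[last], scores[last]))
    return report


records = {
    "s1": ("AGCTTTTCGGGCAAT",
           [23, 43, 45, 65, 76, 54, 3, 34, 54, 65, 7, 45, 56, 87, 56]),
    "s2": ("GCTGCCCCCCATCTT",
           [23, 43, 23, 45, 65, 45, 76, 78, 34, 8, 12, 32, 65, 23, 25]),
    "s3": ("TCGTAGCTGAAAATC",
           [12, 23, 34, 45, 56, 54, 43, 32, 65, 43, 12, 34, 75, 76, 45]),
}
for rid, (seq, scores) in records.items():
    for pos, base, score in deletion_report(seq, scores):
        print("%s: deleted %s at position %d (score %d)" % (rid, base, pos, score))
# s1: deleted T at position 7 (score 3)
# s1: deleted G at position 11 (score 7)
# s2: deleted C at position 10 (score 8)
```

The report uses input coordinates because each deletion shifts everything to its right one place to the left in the output.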

In reply to deleting a particular character and its coresponding score in 2 different files!!! by heidi
