Hi, I'm a beginner at Perl programming. I need to create a script for removing nucleotides from the ends of many sequences. My data looks something like this:
@HWI-.blah blah.......................:TGACCA
GTAGGGGCTGCGCGAACGCAAACCCCCGCTGCCACAAATGATCGTCGGACTGTAGAA
CTCTGAACGTGTAGATCTCGGTGGCCGCCGTATCATTAAAAAAA
+
?1=.....blah blah......................................>(:@@CB+8(9>@:@CCBB289(259@B9B8?A:@C@>CC@B
This is one set; there are many sets like this in the file. If I want to remove the last 5 "A"s from the sequence, along with the corresponding quality characters (>CC@B), and do this for all the sequences, how do I go about it? First I thought I should split the file into an array on the '+', but then I would have to remove the last five characters of each element, join the elements, and re-split them differently so that I could then remove the last 5 quality characters from each element. I'm sure there's a less complicated procedure. Can anyone help me out here, please?
@HWI-ST1023:184:C1V8LACXX:7:1101:1142:2247 2:N:0:TGACCA
GTAGGGGCTGCGCGAACGCAAACCCCCGCTGCCACAAATGATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGCCGCCGTATCATTAAAAAAA
+
?1=DBB@DCFFFFIGIIII6DGHHIII6@=AEEDDEEC;@C>@?(;;B;@B?9BCDAA3>(:@@CB+8(9>@:@CCBB289(259@B9B8?A:@C@>CC@B
@HWI-ST1023:184:C1V8LACXX:7:1101:1450:2022 2:N:0:TGACCA
ACGTGCCCTCGGCCAGAAGGCTTGGGGCGCAACTTGCGTTCAAAGACTCGATGGTTCACGGGATTCTGCAATTCACACCAAGTATCGCATTTCGCTACGTT
+
?@@DDDFFADFFHIJIIFG>FHIJJJJJGIIBH=DHGHHDDFFF;AEAC?=>CD-:@CDBDBDBDD>CDDD:ACDCDDDDD?(4>CBBD?@DDDDDDDD8?
@HWI-ST1023:184:C1V8LACXX:7:1101:1457:2047 2:N:0:TGACCA
GCGTCGCCAGCACAGAGGCCATGCGATCCGTCGAGTTATCATGAATCATCAGAGCAACGGGCAGAGCCCGCGTCGACCTTTTATCTAATAAATGCGTCCCT
+
@CCDFFFFGHHHHJIIIJJIJJJJIIJJJJFHIBFBFHIGJJIGI@GHGGEHHHHHHFFDDABDDDDDDDDDDDBDBBBDCCCCCDDDDCDDEECB8<@DD
Sorry if I framed my question badly.
I need to remove the last 5 nucleotides from each sequence, regardless of whether they are "A"s or not. Sorry if I implied otherwise.
I also need to remove the corresponding quality values for those nucleotides, which are the symbol-like characters on the fourth line of each set. For example, in the first sequence, if I remove "AAAAA" I also need to remove ">CC@B".
Is it doable? :(
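To make the requirement concrete, here is a minimal sketch of one way this could be done, not a definitive solution. It assumes every set is exactly four lines (header, sequence, '+', quality), i.e. that the sequence and quality strings are not wrapped across lines in the real file. The names `trim_tail` and `trim_fastq` are hypothetical helpers invented for this illustration. Because the sequence and quality lines are read separately, there is no need to split on '+' and re-join: the same number of characters is simply cut from the end of each.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return $str without its last $n characters
# (an empty string if $str has $n characters or fewer).
sub trim_tail {
    my ($str, $n) = @_;
    my $keep = length($str) - $n;
    return $keep > 0 ? substr($str, 0, $keep) : '';
}

# Hypothetical helper: copy four-line records from $in to $out,
# trimming $n characters from the sequence and the quality line.
# Assumption: header, sequence, '+', quality, one line each.
sub trim_fastq {
    my ($in, $out, $n) = @_;
    while (defined(my $header = <$in>)) {
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        last unless defined $qual;    # ignore an incomplete final record
        chomp($seq, $qual);
        print {$out} $header, trim_tail($seq, $n), "\n",
                     $plus,   trim_tail($qual, $n), "\n";
    }
}

# Run only when an input file name is given on the command line.
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    trim_fastq($in, \*STDOUT, 5);
    close $in;
}
```

Saved under a name of your choice (say `trim.pl`), it would be run as `perl trim.pl input.fastq > trimmed.fastq`. If the real file does wrap long sequences over several lines, the four-line assumption breaks and the lines would need to be joined first.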
In reply to Removing nucleotide frm sequence by bingalee