Here is the head of my input file. The columns are: ID, strand, chromosome, start, sequence, quality score, and positions in the genome. The last two are unnecessary for what I need, so the script only uses strand, chromosome, start, and the length of the sequence (to find the end). I then use these to search the reference file, grab the info in its last column, and append most of the info from the original input.
I chose this excerpt because it contains some of the cases I hope to handle, such as a chromosome missing from the reference (line 1) and different sites on the same chromosome (lines 3 and 4).
3-51568 + HSV1_17 9285 TGGGCAAACACTTGGGGACTG IIIIIIIIIIIIIIIIIIIII 0
2-70337 + KI270733.1 135235 TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 4
2-70337 + 21 8446166 TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 4
2-70337 + 21 8218896 TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 4
2-70337 + GL000220.1 118372 TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 4
2-70337 + 21 8401935 TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 4
1-130983 + 2 32916254 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 5
1-130983 + 2 32916255 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 5
1-130983 + 2 32916256 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 5
1-130983 + 2 32916257 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 5
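To make the steps described above concrete, here is a rough Python sketch of how they might look. It is not my actual script, and several things in it are assumptions: each record is taken to sit on one whitespace-separated line, the end is treated as inclusive (start + length - 1), and the reference is assumed to provide (chromosome, start, end, info) rows, which is a hypothetical layout. The names `Hit`, `parse_hit`, `index_reference`, and `annotate` are made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    read_id: str
    strand: str
    chrom: str
    start: int
    end: int
    seq: str


def parse_hit(line: str) -> Hit:
    """Parse one input record; quality and the last column are ignored."""
    read_id, strand, chrom, start, seq = line.split()[:5]
    start = int(start)
    # Assumption: the end coordinate is inclusive.
    end = start + len(seq) - 1
    return Hit(read_id, strand, chrom, start, end, seq)


def index_reference(rows):
    """Group reference rows by chromosome.

    rows: iterable of (chrom, ref_start, ref_end, info) tuples.
    This layout is hypothetical; adapt it to the real reference file.
    """
    by_chrom = defaultdict(list)
    for chrom, ref_start, ref_end, info in rows:
        by_chrom[chrom].append((ref_start, ref_end, info))
    return by_chrom


def annotate(hit: Hit, by_chrom) -> list:
    """Return the info of every reference interval overlapping the hit.

    A chromosome that is absent from the reference yields an empty list
    instead of raising an error.
    """
    return [
        info
        for ref_start, ref_end, info in by_chrom.get(hit.chrom, [])
        if ref_start <= hit.end and hit.start <= ref_end
    ]


if __name__ == "__main__":
    # Made-up reference rows, only to exercise the lookup.
    reference = index_reference([
        ("21", 8446000, 8446300, "featA"),
        ("21", 8218000, 8219000, "featB"),
    ])
    line = "3-51568 + HSV1_17 9285 TGGGCAAACACTTGGGGACTG IIIIIIIIIIIIIIIIIIIII 0"
    hit = parse_hit(line)
    print(hit.chrom, hit.start, hit.end, annotate(hit, reference))
```

Grouping the reference by chromosome first means each hit is only compared against intervals on its own chromosome, and hits on the same chromosome at different sites are resolved by the overlap test rather than by the chromosome name alone.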