Here is the head of my input file. The columns are: ID, strand, chromosome, start, sequence, quality score, and positions in the genome. I don't need the last two, so the script only uses the strand, chromosome, and start, plus the length of the sequence to work out the end coordinate. I then use these to search the reference file, grab the value in the reference's last column, and append it to most of the fields from the original input line.
I chose these lines because they include some of the cases I need to account for, such as a chromosome that is missing from the reference (line 1) and different sites on the same chromosome (lines 3 and 4).
3-51568 + HSV1_17 9285 TGGGCAAACACTTGGGGACTG IIIIIIIIIIIIIIIIIIIII 0
2-70337 + KI270733.1 135235 TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 4
2-70337 + 21 8446166 TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 4
2-70337 + 21 8218896 TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 4
2-70337 + GL000220.1 118372 TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 4
2-70337 + 21 8401935 TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 4
1-130983 + 2 32916254 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 5
1-130983 + 2 32916255 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 5
1-130983 + 2 32916256 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 5
1-130983 + 2 32916257 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 5
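For illustration, here is a minimal sketch of the per-line parsing, written in Python rather than Perl. It assumes whitespace-separated columns in the order described above and inclusive coordinates (the end is the position of the read's last base); if the reference uses half-open intervals, the end would be `start + len(seq)` instead. `Hit` and `parse_hit` are hypothetical names, not part of the existing script.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """The fields kept from one input record, plus the derived end coordinate."""
    read_id: str
    strand: str
    chrom: str
    start: int
    end: int


def parse_hit(line: str) -> Hit:
    """Parse one whitespace-separated record.

    Expected column order: ID, strand, chromosome, start, sequence,
    quality, count. The last two columns are ignored.
    """
    fields = line.split()
    read_id, strand, chrom, start_str, seq = fields[:5]
    start = int(start_str)
    # Assumption: inclusive coordinates, so the end is the position of the
    # read's last base. Use start + len(seq) for half-open intervals.
    end = start + len(seq) - 1
    return Hit(read_id, strand, chrom, start, end)


if __name__ == "__main__":
    record = "3-51568 + HSV1_17 9285 TGGGCAAACACTTGGGGACTG IIIIIIIIIIIIIIIIIIIII 0"
    print(parse_hit(record))
    # prints: Hit(read_id='3-51568', strand='+', chrom='HSV1_17', start=9285, end=9305)
```

The 21-base read starting at 9285 therefore ends at 9305 under the inclusive convention.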
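One way to handle both edge cases with a hash is sketched below. It uses a Python dict, which has the same shape as a Perl hash of arrays. The reference is keyed by chromosome. Any read whose chromosome has no key is skipped (the HSV1_17 case). Every interval on a matching chromosome is tested, so separate sites on chromosome 21 are each checked on their own. The reference file isn't shown here, so its layout (chromosome, start, end, ..., annotation in the last column) is an assumption. The function names `load_reference` and `annotate` and the sample reference rows are likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical reference row: chromosome, start, end, ..., annotation (last column).
Interval = Tuple[int, int, str]


def load_reference(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Build a hash keyed by chromosome; each value lists that chromosome's intervals."""
    ref: Dict[str, List[Interval]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        ref[chrom].append((start, end, fields[-1]))
    return dict(ref)


def annotate(hits: Iterable[str], ref: Dict[str, List[Interval]]) -> Iterator[Tuple[str, ...]]:
    """Yield (ID, strand, chromosome, start, end, annotation) for each overlap."""
    for line in hits:
        fields = line.split()
        if len(fields) < 5:
            continue
        read_id, strand, chrom, start_str, seq = fields[:5]
        intervals = ref.get(chrom)
        if intervals is None:
            continue  # chromosome not in the reference, e.g. HSV1_17
        start = int(start_str)
        end = start + len(seq) - 1  # assumed inclusive coordinates
        # Check every interval on this chromosome, so separate sites on the
        # same chromosome are matched independently.
        for r_start, r_end, label in intervals:
            if start <= r_end and r_start <= end:  # closed intervals overlap
                yield (read_id, strand, chrom, str(start), str(end), label)


if __name__ == "__main__":
    # Made-up reference rows, for illustration only.
    reference = load_reference([
        "21\t8446000\t8446300\tfeature_A",
        "21\t8218800\t8219000\tfeature_B",
    ])
    hits = [
        "3-51568 + HSV1_17 9285 TGGGCAAACACTTGGGGACTG IIIIIIIIIIIIIIIIIIIII 0",
        "2-70337 + 21 8446166 TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT " + "I" * 44 + " 4",
        "2-70337 + 21 8218896 TCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGT " + "I" * 44 + " 4",
    ]
    for row in annotate(hits, reference):
        print("\t".join(row))
```

The linear scan per chromosome is simple and fine when each chromosome has few reference intervals. For a large reference, sorting each chromosome's intervals by start and narrowing the search with a binary search (or using an interval tree) would likely cut lookup time considerably.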