I'm not a biology guy, but when I see strings like TGGACGGAGAACTGATAAGGGT in a file I think DNA/RNA. Have you looked on CPAN? There are lots of modules there for working with biological data, so you may be reinventing an existing wheel.