in reply to Splitting only on internal pattern, not at start or end of string

G'day BiologySwede,

Welcome to the monastery.

There are some issues with what you've posted:

The following code eliminates the need for an interim %sequence hash, requires no regex for split and reduces your code substantially (all processing occurs in a single statement). Also note that I've added some additional test data.

#!/usr/bin/env perl -l

use strict;
use warnings;

/^[^>]/ && do { y/N/ /; print join "\n" => split } while <DATA>;

__DATA__
>fasta1
NNNAGTCTGCAAANAATTTGCGGCTCACAAT
>fasta2
CGCAGCCATTAACATCTCAACAAGCCAAAAATTCCTTCTCAGAAATTCGGNNN
>mytest1
NNNACGTNNTGCANN
>mytest2
ACGTNNCGTANNNNNGTACNTACG
>mytest3
TGCA

Output:

AGTCTGCAAA
AATTTGCGGCTCACAAT
CGCAGCCATTAACATCTCAACAAGCCAAAAATTCCTTCTCAGAAATTCGG
ACGT
TGCA
ACGT
CGTA
GTAC
TACG
TGCA
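
If the single-statement version reads as too dense, the same idea can be spelled out longhand. The following is only a sketch: the helper name non_n_runs and the @lines array are made up for illustration (they are not in the code above), and the input is held in an array rather than read from DATA. It relies on the documented behaviour of split with a single-space pattern: leading whitespace is skipped and trailing empty fields are dropped, so runs of N at the start or end of a sequence produce no empty strings once each N has been turned into a space.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post): return the runs of
# non-N characters found in one sequence line.
sub non_n_runs {
    my ($seq) = @_;            # work on a copy; the caller's string is untouched
    $seq =~ tr/N/ /;           # turn every N into a space
    return split ' ', $seq;    # ' ' pattern: skip leading whitespace, drop trailing empties
}

# Illustrative input only; a real script would read from a file or DATA.
my @lines = (
    '>mytest1',
    'NNNACGTNNTGCANN',
    '>mytest2',
    'ACGTNNCGTANNNNNGTACNTACG',
);

for my $line (@lines) {
    next if $line =~ /^>/;     # skip FASTA header lines
    print "$_\n" for non_n_runs($line);
}
```

With the two test records shown, this should print ACGT, TGCA, ACGT, CGTA, GTAC and TACG, one per line, which matches the corresponding part of the output above.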

Here are some additional tips regarding the code you posted:

-- Ken