Waiting for AnomalousMonk's expert answer...
"ex" == "formerly"
"spurt" == "a drip under pressure"
"expert" == "ex" + "spurt"
"expert" == "formerly a drip under pressure"
I was thinking of something along the lines of johngg's extractive approach:
my @ra = $string =~ m{ [^Nn]+ }xmsg;  # collect every maximal run of characters that are not 'N' or 'n'
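A minimal sketch of how that extraction behaves, run against a made-up sample string (the sequence and the variable @fields are hypothetical, for illustration only), next to a plain split on the same N runs:

```perl
use strict;
use warnings;

# Hypothetical sample sequence with leading, internal, and trailing N runs.
my $string = 'NNACGTnnnGGATNNNNTTCANN';

# Extractive approach: capture every maximal run of non-N characters.
# Leading and trailing N runs simply yield nothing, so no empty fields.
my @ra = $string =~ m{ [^Nn]+ }xmsg;

# For comparison: split produces an empty leading field when the string
# starts with a separator match (trailing empty fields are dropped).
my @fields = split m{ [Nn]+ }xms, $string;

print join('|', @ra), "\n";       # ACGT|GGAT|TTCA
print join('|', @fields), "\n";   # |ACGT|GGAT|TTCA
```

The extraction sidesteps the "not at start or end of string" problem by never producing the edge fields in the first place; whether silently dropping the leading and trailing N runs is what the OP actually wants is a separate question.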
I shied away from [ACGT]+ because the presence of 'N' suggests that the sequences may contain characters other than these four (codon sequences? protein sequences? I'm not a bio-guy). However, the problem with [^Nn]+ is that it assumes the input sequences are valid: any junk other than 'N' or 'n' that happens to be present will also be extracted. Also, I share others' confusion about what should happen to leading and trailing "NNN..." sub-sequences.
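One way to keep the simplicity of [^Nn]+ without trusting the input is to validate first and extract second. A minimal sketch, assuming the legal alphabet is just A, C, G, T plus the N separator (either case); extract_runs() is a hypothetical helper, and the character class should be widened if the data really contains other sequence characters:

```perl
use strict;
use warnings;

# Hypothetical helper. ASSUMPTION: the only legal characters are
# A, C, G, T and the separator N, in either case.
sub extract_runs {
    my ($seq) = @_;

    # Reject any character that is neither a base nor an N.
    if ($seq =~ m{ ([^ACGTNacgtn]) }xms) {
        die sprintf("unexpected character '%s' at offset %d\n", $1, $-[1]);
    }

    # Input is known to be clean, so [^Nn]+ now extracts only base runs.
    my @runs = $seq =~ m{ [^Nn]+ }xmsg;
    return @runs;
}

my @ok = extract_runs('NNACGTnnGGNN');    # ('ACGT', 'GG')
print join('|', @ok), "\n";               # ACGT|GG

my $bad = eval { extract_runs('ACGTxNNGG'); 1 } ? 'accepted' : 'rejected';
print "$bad: $@";                         # rejected: unexpected character 'x' at offset 4
```

Dying on bad input is just one choice; a warning, or skipping the offending record, may suit a real pipeline better.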