in reply to Splitting only on internal pattern, not at start or end of string

All of the split solutions using look-arounds that have been posted so far have problems coping with Ns at the beginning or end of the string. If you want to use split, I think the simplest approach is to combine it with grep and length, splitting on one or more Ns without any look-arounds.

    $ perl -E '
    > $seq = q{NNACGTNNNACGTNACGTNN};
    > say for grep length, split m{N+}, $seq;'
    ACGT
    ACGT
    ACGT
    $
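For anyone who would rather have this in a script than a one-liner, here is a minimal sketch of the same idea. The sub name split_on_runs is made up for illustration and is not from the original thread. It relies on the documented behaviour of split: trailing empty fields are dropped by default, but a positive-width match at the start of the string still produces one empty leading field, which is what grep length filters out.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (name is illustrative only): split a sequence on
# runs of one or more Ns and discard any empty fields.
sub split_on_runs {
    my ($seq) = @_;

    # split drops trailing empty fields by default, but leading Ns
    # still yield one empty leading field; grep { length } removes it.
    return grep { length } split m{N+}, $seq;
}

# Same sample sequence as in the one-liner above.
my $seq = q{NNACGTNNNACGTNACGTNN};

# Without the grep, the leading Ns produce an empty first field.
my @raw = split m{N+}, $seq;
say scalar(@raw), ' fields before filtering';

# With the grep, only the non-empty fragments remain.
say for split_on_runs($seq);
```

Run on the sample sequence, the unfiltered split should report 4 fields (one empty, three ACGT), and the filtered version prints just the three ACGT fragments.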

I hope this is of interest.

Cheers,

JohnGG


Replies are listed 'Best First'.
Re^2: Splitting only on internal pattern, not at start or end of string
by Anonymous Monk on Jan 16, 2014 at 14:27 UTC
    Thanks everyone, this is all very helpful and a great learning experience for me. I see now that the first solution will indeed remove characters I want to keep, so I will update my script as necessary.
Re^2: Splitting only on internal pattern, not at start or end of string
by Anonymous Monk on Jan 16, 2014 at 15:13 UTC

    One wonders whether BiologySwede actually intended to keep the leading/trailing Ns, or not?