You escape the closing brace but not the opening brace in your regexp. You should escape both.
Also, there is a '\n' between the coding sequence and the protein sequence that you seem to have overlooked. Since you aren't matching line by line anyway, it probably makes more sense to do a 'chomp @array' before you join the lines than to adapt the regex.
You might also use the non-greedy .*? instead of .* to make your regex a bit faster. It also stops at the first closing brace it can match rather than the last one.
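To illustrate the difference between .* and .*?, here is a minimal sketch. The one-line record in it is made up for illustration (I don't have your real data), so treat the layout as an assumption.

```perl
use strict;
use warnings;

# Hypothetical one-line record, invented for this example only.
my $record = 'coding sequence {ATGGCC}protein sequence {MAF}';

# Greedy: .* grabs as much as possible, so it runs to the LAST closing brace.
my ($greedy) = $record =~ /\{(.*)\}/;

# Non-greedy: .*? grabs as little as possible, so it stops at the FIRST closing brace.
my ($lazy) = $record =~ /\{(.*?)\}/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

With this made-up record the greedy version captures 'ATGGCC}protein sequence {MAF', while the non-greedy version captures just 'ATGGCC'.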
Note that everything inside the parentheses is captured in $1, $2, and so on. If you only want the coding and protein sequences without the surrounding text, you have to shrink the parentheses so that they enclose only the .* part.
General advice: Please add at least 'use warnings;' to your code; 'use strict;' is recommended too.
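Putting the points above together, something along these lines might work. This is only a sketch: the contents of @array and the 'coding sequence {...}' / 'protein sequence {...}' labels are my assumptions about your input, so adjust the regex to whatever your file really looks like.

```perl
use strict;
use warnings;

# Hypothetical input lines; the record layout is an assumption, not your real data.
my @array = (
    "coding sequence {ATGGCC\n",
    "TTTAAG}\n",
    "protein sequence {MAF\n",
    "K}\n",
);

chomp @array;                    # strip the trailing "\n" from every line
my $record = join '', @array;    # one long string without line breaks

# Both braces are escaped, and the parentheses sit just around the
# non-greedy .*? so that only the sequences themselves are captured.
my ($coding, $protein) =
    $record =~ /coding sequence \{(.*?)\}protein sequence \{(.*?)\}/;

if (defined $coding) {
    print "coding:  $coding\n";
    print "protein: $protein\n";
}
else {
    print "no match\n";
}
```

For the made-up input this prints ATGGCCTTTAAG as the coding sequence and MAFK as the protein sequence. Taking the captures from the match in list context avoids relying on a stale $1 or $2 when the match fails.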
In reply to Re: extraction of sequences by jethro, in thread extraction of sequences by patric