One fly in the ointment was the loop variable in 'for our $s (@stops)'. The code wouldn't work here with a 'my' declaration; 'our' was necessary.
This is fairly readable and should work for any number of groups, provided the stops don't exceed the number of FASTA characters in the string. (I didn't test that case to see how it behaves.)
The dynamic regex form was necessary because the quantifier count ($s-1) changes on each iteration of the 'for' loop.
The printout after the __END__ token shows the results of the run.
Update: Added a final substitution to remove dashes next to the underscore tag (as he desired in his post; I missed that). As written, the pattern s/__-+/__/g removes only the dashes that follow the tag.
#!/usr/bin/perl
use strict;
use warnings;
use 5.014;

my @stops = (2,6); # group by 2 then 4 (6 == 2 + 4)
my $tag = '___';

for ('ATCGGATCTGGC', 'A-C-G--CTGGC') {
    my $seq = $_;
    for our $s (@stops) { # necessary to use 'our' instead of 'my'
        $seq =~ s/
            (                             # begin capture
              (??{                        # dynamic regex
                "(?:[TAGC][^TAGC]*)" .    # group to apply quantifier to
                "{" . ($s-1) . "}" .      # quantifier
                "[TAGC]"                  # end token
              })                          # end dynamic reference
            )                             # end capture
            /$1$tag/x;                    # end of substitution
    }
    $seq =~ s/__-+/__/g;
    say $seq;
}

__END__
C:\Old_Data\perlp>perl dynamic_regex.pl
AT___CGGA___TCTGGC
A-C___G--CTG___GC

C:\Old_Data\perlp>
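For comparison, here is a minimal sketch of the same idea that interpolates the quantifier count into an ordinary pattern instead of using a (??{ }) block. Since the pattern is rebuilt on every pass of the loop anyway, plain interpolation appears to be enough, and the loop variable can then be an ordinary 'my' lexical. The helper name tag_groups and its argument order are my own invention for illustration, not something from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): insert $tag after the
# base at each cumulative stop in @$stops, counting only T, A, G and C.
sub tag_groups {
    my ($seq, $stops, $tag) = @_;
    for my $s (@$stops) {
        my $n = $s - 1;    # bases to consume before the end token
        # $n is interpolated into the pattern before it is compiled,
        # so no (??{ }) block is needed to vary the quantifier.
        $seq =~ s/((?:[TAGC][^TAGC]*){$n}[TAGC])/$1$tag/;
    }
    $seq =~ s/__-+/__/g;   # drop dashes that directly follow a tag
    return $seq;
}

print tag_groups($_, [2, 6], '___'), "\n"
    for 'ATCGGATCTGGC', 'A-C-G--CTGGC';
```

With the same two input strings and stops (2,6), this prints the same two lines as the run above. If a stop is larger than the number of bases in the string, that substitution simply fails to match and the string is left unchanged for that stop; for example, tag_groups('ATCG', [2, 10], '___') returns 'AT___CG'.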
In reply to Re: Regex to match range of characters broken by dashes
by Cristoforo
in thread Regex to match range of characters broken by dashes
by Q.and