Like choroba, I'm wondering: What's supposed to happen to the dash in the 4th position in the second string?
    A-C-G--CTGGC
       ^ dash in 4th position

Assuming it should be replaced by $tag because it falls between the quantified groups of bases, here's a multi-regex solution. (Warning: this needs Perl 5.10+ for the \K regex operator, but I can work around that fairly easily if needed.)

    c:\@Work\Perl>perl -wMstrict -le
    "use 5.010;
     ;;
     use Test::More 'no_plan';
     use Test::NoWarnings;
     ;;
     my $tag = '___';
     ;;
     VECTOR:
     for my $ar_vector (
       [ qw(ATCGGATCTGGC  AT___CGGA___TCTGGC) ],
       [ qw(A-C-G--CTGGC  A-C___G--CTG___GC)  ],
       ) {
       if (! ref $ar_vector) {
         note $ar_vector;
         next VECTOR;
         }
     ;;
       my ($seq, $expected) = @$ar_vector;
       my $got = xform($seq);
       is $got, $expected, qq{'$seq' -> '$expected'};
       }
     ;;
     done_testing;
     ;;
     sub xform {
       my ($s) = @_;
     ;;
       my $u = qr{ [ATGC] -*? }xms;
     ;;
       $s =~ s{ $u{2} \K -* }{$tag}xms;
       $s =~ s{ $u{4} \K -* }{$tag}xms;
       return $s;
       }
    "
    ok 1 - 'ATCGGATCTGGC' -> 'AT___CGGA___TCTGGC'
    ok 2 - 'A-C-G--CTGGC' -> 'A-C___G--CTG___GC'
    1..2
    ok 3 - no warnings
    1..3
Of course, more test cases are highly encouraged!
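
For anyone not working in a Windows shell, here is a minimal sketch of the same two substitutions as a script file, printing the string after each step so you can see where the 4th-position dash goes. The helper name trace_xform is made up for illustration; the pattern and the two s{}{} statements are the ones from the one-liner above.

```perl
use strict;
use warnings;
use 5.010;    # needed for \K and say

my $tag = '___';

# One base followed by as few dashes as possible (same pattern as above).
my $u = qr{ [ATGC] -*? }xms;

# Hypothetical helper: returns the string as it looks after each substitution.
sub trace_xform {
    my ($s) = @_;
    my @steps;

    # First match: two bases (with any interleaved dashes), then replace
    # whatever dashes follow (possibly none) with the tag.
    $s =~ s{ $u{2} \K -* }{$tag}xms;
    push @steps, $s;

    # Second match: the next place four bases can be matched. The tag just
    # inserted is not a base, so the match starts after it.
    $s =~ s{ $u{4} \K -* }{$tag}xms;
    push @steps, $s;

    return @steps;
}

for my $seq (qw(ATCGGATCTGGC A-C-G--CTGGC)) {
    say join ' -> ', $seq, trace_xform($seq);
}
```

This should print

    ATCGGATCTGGC -> AT___CGGATCTGGC -> AT___CGGA___TCTGGC
    A-C-G--CTGGC -> A-C___G--CTGGC -> A-C___G--CTG___GC

so in the second string the dash after the C is swallowed by the first tag, while the two dashes after the G stay put because they sit inside the group of four.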

Update: And yes, this does seem like an XY Problem.

Update 2: Here's the pre-5.10 (no \K) version of the code (tested):
    $s =~ s{ ($bu{2}) -* }{$1$tag}xms;
    $s =~ s{ ($bu{4}) -* }{$1$tag}xms;
And versions, also tested, consolidating the two substitutions in a for-loop:
    $s =~ s{  (?:$bu){$_} \K -* }   {$tag}xms for 2, 4;  # 5.10+
    $s =~ s{ ((?:$bu){$_})   -* } {$1$tag}xms for 2, 4;  # pre-5.10
In all these variations, $bu is the same base-plus-optional-dashes pattern that the original code calls $u:
    my $bu = qr{ [ATGC] -*? }xms;
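
As a rough cross-check, the four variations can be put side by side and run against the two test vectors from the original code. This is only a sketch: the labels and the @variants / @vectors arrays are my own scaffolding, and the script as a whole still needs 5.10+ because two of the variants use \K.

```perl
use strict;
use warnings;
use 5.010;

my $tag = '___';
my $bu  = qr{ [ATGC] -*? }xms;

# Each entry: [ label, code ref taking a sequence and returning the tagged one ].
# The labels are illustrative only.
my @variants = (
    [ 'two subs, \K' => sub {
        my ($s) = @_;
        $s =~ s{ $bu{2} \K -* }{$tag}xms;
        $s =~ s{ $bu{4} \K -* }{$tag}xms;
        return $s;
    } ],
    [ 'two subs, capture' => sub {
        my ($s) = @_;
        $s =~ s{ ($bu{2}) -* }{$1$tag}xms;
        $s =~ s{ ($bu{4}) -* }{$1$tag}xms;
        return $s;
    } ],
    [ 'loop, \K' => sub {
        my ($s) = @_;
        $s =~ s{ (?:$bu){$_} \K -* }{$tag}xms for 2, 4;
        return $s;
    } ],
    [ 'loop, capture' => sub {
        my ($s) = @_;
        $s =~ s{ ((?:$bu){$_}) -* }{$1$tag}xms for 2, 4;
        return $s;
    } ],
);

# Same input/expected pairs as in the original test.
my @vectors = (
    [ qw(ATCGGATCTGGC AT___CGGA___TCTGGC) ],
    [ qw(A-C-G--CTGGC A-C___G--CTG___GC)  ],
);

for my $variant (@variants) {
    my ($label, $code) = @$variant;
    for my $vector (@vectors) {
        my ($seq, $expected) = @$vector;
        my $got = $code->($seq);
        printf "%-18s %s -> %s  %s\n",
            $label, $seq, $got, $got eq $expected ? 'ok' : 'NOT ok';
    }
}
```

The capture versions do the same job as \K by putting the matched bases back via $1, which is why they work on older perls.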


Give a man a fish:  <%-{-{-{-<


In reply to Re: Regex to match range of characters broken by dashes by AnomalousMonk
in thread Regex to match range of characters broken by dashes by Q.and
