Hi,
Hopefully this does the trick for you. It uses a negative lookahead to make sure that no character repeats before the comma (or before the end of the string):
my @tst = qw( A,G AG,CT TC,CA GAT,CGA CGAT,TG ,G
ACGT X,A AA,G AC,GGC ATGA,TGG ATCXG,AAC
);
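# Compile once: a base may not reappear later on its own side of the comma.
# \g{-1} is a relative backreference to the group that was just captured.
my $side = qr/(?:([ACGT])(?![^,]*\g{-1}))+/;
for (@tst) {
    print $_ . (/^$side,$side$/ ? ' good' : ' bad') . "\n";
}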
Prints:
A,G good
AG,CT good
TC,CA good
GAT,CGA good
CGAT,TG good
,G bad
ACGT bad
X,A bad
AA,G bad
AC,GGC bad
ATGA,TGG bad
ATCXG,AAC bad
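Update: more test coverage showed an issue. The version I originally posted used "\1", which wrongly rejected "AC,AC": once $side is interpolated twice into the same pattern, "\1" in the second half still points at the first half's capture group. Switching to the relative group \g{-1} fixed it.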
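
In case it helps to see the difference between "\1" and \g{-1}, here is a small side-by-side sketch. The names $side_abs, $side_rel, ok_abs and ok_rel are made up for this illustration and are not part of the code above; the only difference between the two patterns is the backreference.

```perl
use strict;
use warnings;

# Absolute backreference: \1 always means group 1 of the final, combined pattern.
my $side_abs = qr/(?:([ACGT])(?![^,]*\1))+/;

# Relative backreference: \g{-1} means the nearest capture group opened before it,
# so it still points at the right group when the qr// is interpolated a second time.
my $side_rel = qr/(?:([ACGT])(?![^,]*\g{-1}))+/;

# Hypothetical helpers: return 1 if the string passes, 0 otherwise.
sub ok_abs { my ($s) = @_; return $s =~ /^$side_abs,$side_abs$/ ? 1 : 0 }
sub ok_rel { my ($s) = @_; return $s =~ /^$side_rel,$side_rel$/ ? 1 : 0 }

for my $s ('AC,AC', 'AG,CT', 'AA,G') {
    printf "%-6s abs=%d rel=%d\n", $s, ok_abs($s), ok_rel($s);
}
```

For "AC,AC" the "\1" version reports 0 and the \g{-1} version reports 1. In the second half, "\1" holds the last base captured by the first half ("C"), so the lookahead after the second "A" sees the following "C" and rejects a string that is actually fine.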