in reply to Issues with substr and substituting

The subroutine in the code below will handle sequence strings of varying length and will also allow for placeholder letters in the sequence. It loops over each letter in the sequence and uses a translation hash and the four-argument form of substr in a map.

use strict; use warnings; use feature qw{ say }; say q{-} x 9; say for @{ expand( q{AGCT} ) }; say q{-} x 9; say for @{ expand( q{AXCT} ) }; say q{-} x 9; say for @{ expand( q{CTACAGGAA} ) }; say q{-} x 9; sub expand { my $seq = shift; my %translate = ( A => [ qw{ G C T } ], G => [ qw{ A C T } ], C => [ qw{ A G T } ], T => [ qw{ A G C } ], ); my $raExpands = []; for my $posn ( 0 .. length( $seq ) - 1 ) { my $letter = substr $seq, $posn, 1; next unless $letter =~ m{[AGCT]}; my $copy = $seq; push @{ $raExpands }, map { substr $copy, $posn, 1, $_; $copy; } @{ $translate{ $letter } }; } return $raExpands; }

The output.

--------- GGCT CGCT TGCT AACT ACCT ATCT AGAT AGGT AGTT AGCA AGCG AGCC --------- GXCT CXCT TXCT AXAT AXGT AXTT AXCA AXCG AXCC --------- ATACAGGAA GTACAGGAA TTACAGGAA CAACAGGAA CGACAGGAA CCACAGGAA CTGCAGGAA CTCCAGGAA CTTCAGGAA CTAAAGGAA CTAGAGGAA CTATAGGAA CTACGGGAA CTACCGGAA CTACTGGAA CTACAAGAA CTACACGAA CTACATGAA CTACAGAAA CTACAGCAA CTACAGTAA CTACAGGGA CTACAGGCA CTACAGGTA CTACAGGAG CTACAGGAC CTACAGGAT ---------

I hope this is useful.

Cheers,

JohnGG