in reply to DNA Pattern Matching
I'm confused about what you want (update: please remember this is PerlMonks and not BioMonks), but part of it seems to be generating every other possible single base in the middle position, given the one that is actually present. If so, maybe something like this:
c:\@Work\Perl\monks>perl -wMstrict -le
"my $ppi_pm_seq = 'ACTGCCT';
 ;;
 my $n = 3;
 ;;
 my $extraction = my ($before, $mid, $after) =
     $ppi_pm_seq =~ m{ \A (.{$n}) (.) (.{$n}) \z }xms;
 ;;
 die qq{no extraction from '$ppi_pm_seq'} unless $extraction;
 ;;
 my ($ppi_pm_id, $ppi_mm_id, $mpi_pm_id, $mpi_mm_id) =
     map qq{$before$_$after}, $mid, grep $_ ne $mid, qw(A T C G)
     ;
 ;;
 print qq{'$_'} for $ppi_pm_id, $ppi_mm_id, $mpi_pm_id, $mpi_mm_id;
 "
'ACTGCCT'
'ACTACCT'
'ACTTCCT'
'ACTCCCT'

Note that the $ppi_pm_id permuted sequence is the same as the "input" $ppi_pm_seq sequence (if that's what you want).
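The one-liner is quoted for the Windows command line, so here is the same idea as a small script that should run anywhere. This is only a sketch: the sub name mid_base_variants and its two-argument interface are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are assumptions, not from the OP).
# Given a sequence of length 2*$n + 1, return the sequence itself followed
# by the three variants that differ only in the middle base.
sub mid_base_variants {
    my ($seq, $n) = @_;

    # A list assignment in scalar context yields the number of captures,
    # so a failed match (0 captures) triggers the die.
    my ($before, $mid, $after) = $seq =~ m{ \A (.{$n}) (.) (.{$n}) \z }xms
        or die qq{no extraction from '$seq'};

    # Original middle base first, then the remaining three bases.
    return map { "$before$_$after" } $mid, grep { $_ ne $mid } qw(A T C G);
}

print qq{'$_'\n} for mid_base_variants('ACTGCCT', 3);
```

The first element returned is always the input sequence, so it can be assigned straight to ($ppi_pm_id, $ppi_mm_id, $mpi_pm_id, $mpi_mm_id) as in the one-liner.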
Update 1: Note also that the simpler (.{$n}) (.) or (.{n}) (.) expressions suffice if you can guarantee, as an initial step, that your input strings are only A T C G characters. This can be as simple as (tested):
my $s = 'ATCGxTGAC';
my $foreign = $s =~ tr/ATCG//c;
print qq{foreign character in '$s'} if $foreign;
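If you want to run that check over many strings, it can be wrapped in a sub. Again just a sketch: count_foreign is a hypothetical name, and note that lowercase bases count as foreign here, which may or may not be what you want.

```perl
use strict;
use warnings;

# Hypothetical helper: count the characters that are not A, T, C or G.
# tr/ATCG//c with an empty replacement list (and no /d) only counts the
# complement of the search list; the string is not modified.
sub count_foreign {
    my ($seq) = @_;
    return $seq =~ tr/ATCG//c;
}

for my $s (qw(ACTGCCT ATCGxTGAC)) {
    my $foreign = count_foreign($s);
    print $foreign
        ? qq{'$s': $foreign foreign character(s)\n}
        : qq{'$s': ok\n};
}
```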
Update 2: An interesting side note: If you have Perl version 5.10+ available, you can use the (?-PARNO) relative recursive subpattern regex extension to (perhaps) simplify things a bit (tested):
m{ \A (.{$n}) (.) ((?-3)) \z }xms;
in place of
m{ \A (.{$n}) (.) (.{$n}) \z }xms;
(I rather doubt this alteration would speed things up significantly, but I've not done any testing; it's just a thought.)
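To convince yourself the two forms capture the same things, something along these lines can be used. It is a minimal sketch with made-up variable names; the one thing to keep in mind is that relative recursion counts unclosed groups too, so from inside the third group (?-3) points back at the first group.

```perl
use strict;
use warnings;
use 5.010;   # (?-PARNO) needs Perl 5.10 or later

my $n = 3;

# (?-3) is counted from inside the still-open third group, so it
# re-runs the pattern of group 1, i.e. .{$n}
my $rx_plain   = qr{ \A (.{$n}) (.) (.{$n}) \z }xms;
my $rx_recurse = qr{ \A (.{$n}) (.) ((?-3)) \z }xms;

# One string that should match and one that is too short to match.
for my $seq (qw(ACTGCCT ACTG)) {
    my @plain   = $seq =~ $rx_plain;
    my @recurse = $seq =~ $rx_recurse;
    my $same    = "@plain" eq "@recurse" ? 'same' : 'different';
    print qq{'$seq': (@plain) vs (@recurse) -> $same\n};
}
```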
Give a man a fish: <%-{-{-{-<