Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello dear PerlMonks, I'm trying to match a certain pattern in a DNA sequence and retrieve the position of each match. The matching itself isn't the problem, but I don't know how to use the pos() function on an array, and I don't know how to use @- (e.g. $-[0]) when the number of matches is unknown. This is what I have so far:

    @TATA = ($seq =~ m/TATA[A|T][A|T][A|T|G]/i);
    $number_of_TATA = @TATA;
    if ($number_of_TATA > 0) {
        print "The positions of the TATA box are:";
        foreach (@TATA) {
            $TATA_position = pos($TATA) - 7;
            print $TATA_position . "\n";
        }
    }

I'd greatly appreciate the help! Thanks a lot!

Re: positions of multiple matches
by AnomalousMonk (Archbishop) on Nov 05, 2011 at 15:31 UTC
    >perl -wMstrict -le
    "my $seq = 'xxxTATAATGyyyTatAtAtzzz';
     my $box = qr{ (?i) TATA [TA] [TA] [TAG] }xms;
     ;;
     my @TATAs;
     while ($seq =~ m{ ($box) }xmsg) {
       push @TATAs, [ $1, $-[1], ];
     }
     ;;
     print qq{matched '$_->[0]' at pos $_->[1]} for @TATAs;
    "
    matched 'TATAATG' at pos 3
    matched 'TatAtAt' at pos 13
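Since the question asked about pos() specifically, here is a minimal sketch of the same loop written with it, assuming the sample string from the one-liner above (@positions is just an illustrative name). pos() belongs to the scalar being searched ($seq), not to an array of results, and it only advances when m//g is used in scalar context, as in a while condition.

```perl
use strict;
use warnings;

my $seq = 'xxxTATAATGyyyTatAtAtzzz';
my @positions;

# Each successful m//g in scalar context leaves pos($seq) just past the match,
# so the start of the match is pos($seq) minus the length of the captured text.
while ($seq =~ m/(TATA[AT][AT][ATG])/gi) {
    push @positions, pos($seq) - length $1;
}

print "The positions of the TATA box are:\n";
print "$_\n" for @positions;
```

Because every match here is 7 characters long, the OP's "pos() - 7" would give the same numbers, but subtracting length $1 keeps working if the pattern ever matches variable-length text.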

    Updates:

    1. Note: The character class [A|T] in the OP is probably not what you want. The | (pipe) character has no special meaning inside a character class; it is just a literal | character, so [A|T] matches any single one of three characters: 'A', 'T' or '|'. Write [AT] instead.
    2. Changed the example code to use case-insensitive matching, as in the OP's example. Note that when matching long strings, case-insensitive matching may impose a significant performance penalty; it may be better to convert all strings to a common case and then match case-sensitively (without /i).
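A minimal sketch of the common-case approach from note 2, assuming plain ASCII DNA strings (for which uc changes no string lengths, so offsets found in the uppercased copy are valid in the original). It also answers the @- part of the question: inside the loop, $-[0] and $+[0] are the start and end offsets of the whole current match, however many matches there turn out to be. The names $uc_seq and @hits are hypothetical.

```perl
use strict;
use warnings;

my $seq = 'xxxTATAATGyyyTatAtAtzzz';

# Normalise case once, then match case-sensitively (no /i).
my $uc_seq = uc $seq;

my @hits;
while ($uc_seq =~ m/TATA[AT][AT][ATG]/g) {
    # $-[0] is where the whole match started, $+[0] where it ended;
    # substr on the original $seq reports the text in its original case.
    push @hits, [ substr($seq, $-[0], $+[0] - $-[0]), $-[0] ];
}

print "matched '$_->[0]' at pos $_->[1]\n" for @hits;
```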

Re: positions of multiple matches
by mrstlee (Beadle) on Nov 05, 2011 at 19:11 UTC
    Here's how to match and record each position in a single statement:
    use re 'eval';
    my $seq = 'xxxTATAATGyyyTatAtAtzzz';
    my $box = qr{ (?i) TATA [TA] [TA] [TAG] }xms;
    my @TATAs;
    my @matches = $seq =~ m{ ($box) (?{ push @TATAs, [ $+, pos() - length $+ ] }) }msxg;
    print "matched '$_->[0]' at pos $_->[1]\n" for @TATAs;
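    The 'magic' is in the (?{ ... }) construct, which lets you embed Perl code in a regex. The code runs when the matcher reaches that point in the pattern, so pos() there is the offset just past the current match, and $+ holds the text captured by ($box).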
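A variant sketch of the same idea, assuming the pattern can be written inline instead of being interpolated from $box: with nothing interpolated into the pattern, use re 'eval' is not needed. It uses $^N, the text of the most recently closed capture group, which perlvar describes as intended for use inside (?{ ... }) blocks. Assigning to the empty list only forces list context so that //g runs over the whole string.

```perl
use strict;
use warnings;

my $seq = 'xxxTATAATGyyyTatAtAtzzz';
my @TATAs;

# The code block runs right after the capture group closes:
# $^N is the text just captured, pos() is the offset just past it.
() = $seq =~ m{
    ( (?i) TATA [TA] [TA] [TAG] )
    (?{ push @TATAs, [ $^N, pos() - length $^N ] })
}xg;

print "matched '$_->[0]' at pos $_->[1]\n" for @TATAs;
```

Because the code block sits at the very end of the pattern, it only runs once a full match has been found, so backtracking cannot push spurious entries.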
Re: positions of multiple matches
by Anonymous Monk on Nov 05, 2011 at 15:42 UTC

    Thanks a lot!