I think the documentation is a little misleading here. At least, it gives me the impression that the first match (if any) is guaranteed to be valid, i.e. codon-aligned. That’s true only if, as in the example given, the $dna string happens to contain a codon-aligned TGA somewhere, in which case the match attempt starting at offset 0 finds it. If it doesn’t, the regex engine retries from offsets 1, 2, and so on, and the first match it reports is an invalid (out-of-frame) one:
#! perl
use strict;
use warnings;

while (my $dna = <DATA>)
{
    chomp $dna;
    print "\n\$dna = '$dna'\n";
    while ($dna =~ /(\w\w\w)*?TGA/g)
    {
        print 'Got a TGA stop codon at position ', pos $dna,
              ', immediately following [', $1, "]\n";
    }
}

__DATA__
ATCGTTGAA
ATCGTTGAATGCAAATGACATGAC
Output:
0:10 >perl 1476_SoPW.pl
$dna = 'ATCGTTGAA'
Got a TGA stop codon at position 8, immediately following [CGT]
$dna = 'ATCGTTGAATGCAAATGACATGAC'
Got a TGA stop codon at position 18, immediately following [AAA]
Use of uninitialized value $1 in print at 1476_SoPW.pl line 43, <DATA> line 2.
Got a TGA stop codon at position 23, immediately following []
0:10 >
Adding a \G anchor to the regex:
while ($dna =~ /\G(\w\w\w)*?TGA/g)
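fixes the results for both DNA strings, because \G means “Match only at pos() (e.g. at the end-of-match position of prior m//g)” (see “Assertions” in perlre), and while pos() is still undefined \G matches at the start of the string. The first attempt therefore has to begin at offset 0, and each later attempt has to begin exactly where the previous match ended, so the engine can never slide forward into a different reading frame.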
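To make the effect of \G concrete, here is a minimal sketch that runs the anchored regex over the same two strings. The sub name aligned_stops is my own invention for this example, and I’ve used a non-capturing group because only the positions are collected:

```perl
#! perl
use strict;
use warnings;

# Return the end position of every TGA reached by stepping through the
# string three characters at a time from offset 0.
# (aligned_stops is an illustrative name, not from the documentation.)
sub aligned_stops
{
    my ($dna) = @_;
    my @positions;

    # \G pins each attempt to pos($dna): the start of the string on the
    # first pass, then the end of the previous match.
    while ($dna =~ /\G(?:\w\w\w)*?TGA/g)
    {
        push @positions, pos $dna;
    }
    return @positions;
}

for my $dna (qw(ATCGTTGAA ATCGTTGAATGCAAATGACATGAC))
{
    my @stops = aligned_stops($dna);
    print "$dna: ",
          (@stops ? "in-frame TGA ends at @stops" : 'no in-frame TGA'),
          "\n";
}
```

The first string should report no in-frame TGA at all, and the second only the one ending at position 18; the out-of-frame hits at 8 and 23 from the unanchored version are gone.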
<Begin update> choroba is of course correct: anchoring to the start of the string finds only the first match, because on the next iteration pos() is no longer zero and ^ cannot match there.
So if the first codon-aligned match is all that’s wanted, the regex can also be fixed without recourse to \G, by simply anchoring it to the start of the string:
while ($dna =~ /^(\w\w\w)*?TGA/g)
<End update>
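For what it’s worth, the difference between the two anchors only shows up when a string holds more than one in-frame stop codon. A quick sketch, using a made-up string (TGA AAA TGA) that is not from the original question:

```perl
#! perl
use strict;
use warnings;

my $dna = 'TGAAAATGA';    # hypothetical test data: codons TGA AAA TGA

my (@with_caret, @with_G);

# ^ can match only at offset 0, so the loop body runs at most once.
push @with_caret, pos $dna while $dna =~ /^(?:\w\w\w)*?TGA/g;

# The failed match above resets pos($dna), so \G starts again at offset 0
# and then resumes from the end of each successful match.
push @with_G, pos $dna while $dna =~ /\G(?:\w\w\w)*?TGA/g;

print "With ^:  @with_caret\n";
print "With \\G: @with_G\n";
```

Here ^ should find just the stop codon ending at position 3, while \G goes on to find the one ending at position 9 as well.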
Perhaps not Perl documentation’s finest hour. :-)
Hope that helps,