I think the documentation is a little misleading here. At least, it gives me the impression that the first match (if any) is somehow guaranteed to be valid (that is, codon-aligned). But that’s true only if, as in the example given, the $dna string happens to contain a valid match somewhere, in which case it will be found first. If it doesn’t, the first match found is an invalid one:

#! perl
use strict;
use warnings;

while (my $dna = <DATA>)
{
    chomp $dna;
    print "\n\$dna = '$dna'\n";

    while ($dna =~ /(\w\w\w)*?TGA/g)
    {
        print 'Got a TGA stop codon at position ', pos $dna,
              ', immediately following [', $1, "]\n";
    }
}

__DATA__
ATCGTTGAA
ATCGTTGAATGCAAATGACATGAC

Output:

0:10 >perl 1476_SoPW.pl

$dna = 'ATCGTTGAA'
Got a TGA stop codon at position 8, immediately following [CGT]

$dna = 'ATCGTTGAATGCAAATGACATGAC'
Got a TGA stop codon at position 18, immediately following [AAA]
Use of uninitialized value $1 in print at 1476_SoPW.pl line 43, <DATA> line 2.
Got a TGA stop codon at position 23, immediately following []

0:10 >

Adding a \G anchor to the regex:

while ($dna =~ /\G(\w\w\w)*?TGA/g)

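fixes the results for both DNA strings: the first now produces no match at all, and the second produces only the codon-aligned stop at position 18. This works because \G means “Match only at pos() (e.g. at the end-of-match position of prior m//g)” (see “Assertions” in perlre), and before any match has been made \G anchors at position zero, the start of the string.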
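To make the effect of \G concrete, here is a minimal sketch that collects the end position of every codon-aligned TGA in the two sample strings. The helper name aligned_stops is hypothetical (it is not in perlretut or in the script above), and it uses a non-capturing group because only the positions are wanted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: returns the end position
# of each codon-aligned TGA found by the \G-anchored regex.
sub aligned_stops {
    my ($dna) = @_;
    my @positions;

    # \G pins every attempt to pos($dna): the start of the string at
    # first, then the end of the previous match. The engine therefore
    # cannot slide forward one or two characters out of the reading frame.
    while ($dna =~ /\G(?:\w\w\w)*?TGA/g) {
        push @positions, pos $dna;
    }
    return @positions;
}

for my $dna (qw(ATCGTTGAA ATCGTTGAATGCAAATGACATGAC)) {
    my @stops = aligned_stops($dna);
    print "$dna: ", (@stops ? "@stops" : 'no aligned TGA'), "\n";
}
```

For the first string this should report no aligned TGA, and for the second only position 18; the out-of-frame hits at 8 and 23 from the unanchored version are gone.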

<Begin update> choroba is of course correct: anchoring to the start of the string finds only the first match.

But that means that the regex could also be fixed without recourse to \G, by simply anchoring it to the start of the string:

while ($dna =~ /^(\w\w\w)*?TGA/g)

<End update>
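The difference between the two anchors only shows up when a string holds more than one in-frame stop codon, which neither sample string does. A small sketch, assuming a made-up sequence (ATG TGA CCC TGA) and a hypothetical helper stop_positions that takes the regex as a parameter:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: runs the given regex in a
# scalar-context //g loop and returns the end position of each match.
sub stop_positions {
    my ($dna, $re) = @_;
    my @positions;
    while ($dna =~ /$re/g) {
        push @positions, pos $dna;
    }
    return @positions;
}

# Made-up test sequence with two in-frame stops: ATG TGA CCC TGA
my $dna = 'ATGTGACCCTGA';

# ^ can match only at offset 0, so the second iteration (which starts
# at pos 6) fails and the loop ends after one match.
my @with_caret = stop_positions($dna, qr/^(?:\w\w\w)*?TGA/);

# \G matches wherever the previous match ended, so the scan continues
# in frame and finds the second stop as well.
my @with_g = stop_positions($dna, qr/\G(?:\w\w\w)*?TGA/);

print "with ^:  @with_caret\n";
print "with \\G: @with_g\n";
```

With ^ the loop should stop after position 6, while \G goes on to report 6 and 12. So ^ is enough if only the first aligned stop matters, and \G is the one to use for finding every aligned stop.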

Perhaps not Perl documentation’s finest hour. :-)

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,


In reply to Re: Understanding a portion of perlretut by Athanasius
in thread Understanding a portion on the Perlretut by BlueStarry
