Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks.

This is not a homework assignment. I am writing this purely for self-interest. I am attempting to find and extract every substring that matches a specific regex within a larger string and store each match in an array.

    use strict;
    use warnings;

    my $dna = 'xxxxxxpecbcbccrlxxxxxxpeeeerlxxxxxplxxxxxPeRLxxxx';
    my @matches;

    if ($dna =~ /(p.*?l)/gi) {
        push @matches, $1;
    }

    for (@matches) {
        print $_, "\n";
    }

Unfortunately, only the first match is located and stored in the array. I am also considering capturing the beginning and ending index of each match (reading left to right). Any suggestions would be appreciated.

Thanks.
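
A note on the symptom described above: `if` evaluates the match only once, so even with /g only the first match is captured. The following is a minimal sketch of two common alternatives, reusing the `$dna` string from the question; the `@all` array is a name introduced here for illustration.

```perl
use strict;
use warnings;

my $dna = 'xxxxxxpecbcbccrlxxxxxxpeeeerlxxxxxplxxxxxPeRLxxxx';

# Option 1: in scalar context, each evaluation of a /g match resumes
# where the previous one ended, so a while loop visits every match.
my @matches;
while ( $dna =~ /(p.*?l)/gi ) {
    push @matches, $1;
}

# Option 2: in list context, a /g match returns all captures at once.
# (@all is an illustrative name, not from the original post.)
my @all = $dna =~ /(p.*?l)/gi;

print "$_\n" for @matches;
```

Both forms should collect the same list; the while loop is the one to use if something else has to happen per match, such as recording offsets.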

Re: Find and extract substring(s) within larger string.
by Athanasius (Archbishop) on Oct 21, 2013 at 03:47 UTC
      Thanks for the help. I'm also trying to use the qr function to speed up the regex:
      my $regex = qr/p[^x]+l/; my @matches = $dna =~ /($regex)/;
      ...but I don't know where to place the switches. For example, $regex\migs

      Thanks.
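
On where the switches go: a minimal sketch, assuming the goal is a case-insensitive pattern applied globally. Modifiers that affect how the pattern is compiled are written after the closing delimiter of qr//, while /g is not accepted by qr// and goes on the match operator instead.

```perl
use strict;
use warnings;

my $dna = 'xxxxxxpecbcbccrlxxxxxxpeeeerlxxxxxplxxxxxPeRLxxxx';

# Modifiers that change how the pattern is compiled (i, m, s, x)
# go after the closing delimiter of qr//.
my $regex = qr/p[^x]+l/i;

# /g is not a property of the pattern; it belongs on the match
# operator that uses the compiled regex.
my @matches = $dna =~ /($regex)/g;

print "$_\n" for @matches;
```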
        ... speed up the regex:
        my $regex = qr/p[^x]+l/;

        Please be aware that use of the  /i case-insensitivity regex modifier usually imposes a speed penalty, perhaps quite significant if you're really dealing with long-ish (e.g., DNA) strings. Using a character class avoids this:
            my $regex = qr/[Pp][^x]*[Ll]/;
        with no  /i modifier needed anywhere. As always, Benchmark-ing tells the true tale with regard to performance in a real application; anything else, however well informed, is speculation.
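
As a rough sketch of how such a comparison could be set up with the core Benchmark module: the test string (the sample repeated 1000 times) and the labels `i_modifier` and `char_class` are assumptions made for illustration, and the timings will vary by Perl version and data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: the sample string repeated to make it longer.
my $dna = 'xxpecbcbccrlxxxPeeeerlxxpLxxxPeRLxx' x 1000;

# Note: under /i the class [^x] also excludes 'X'; the two patterns
# agree on this data because it contains no uppercase 'X'.
my $with_i     = qr/p[^x]*l/i;
my $with_class = qr/[Pp][^x]*[Ll]/;

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    'i_modifier' => sub { my $n = () = $dna =~ /$with_i/g },
    'char_class' => sub { my $n = () = $dna =~ /$with_class/g },
} );
```

The `my $n = () = ...` idiom forces list context and counts the matches, so both subs do the full scan of the string.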

        Also be aware that the [^x]+ term in the quoted regex requires at least one non-'x' character to be present for a match, thus excluding a match on something like 'pl'. So the final code might look like the following. (Note that () capturing parentheses are not needed in this case and may impose a speed penalty.)

        >perl -wMstrict -le
        "my $dna = 'xxpecbcbccrlxxxPeeeerlxxpLxxxPeRLxx';
         ;;
         my $perl = qr{ [Pp] [^x]* [Ll] }xms;
         ;;
         my @matches = $dna =~ m{ $perl }xmsg;
         printf qq{'$_' } for @matches;
        "
        'pecbcbccrl' 'Peeeerl' 'pL' 'PeRL'

Re: Find and extract substring(s) within larger string.
by 2teez (Vicar) on Oct 21, 2013 at 06:33 UTC

    Hi Anonymous Monk,
    I am also considering capturing the beginning and ending index of each match (reading left to right)
    You can also do like so, using while loop and index function:

    use warnings;
    use strict;

    my $dna = 'xxxxxxpecbcbccrlxxxxxxpeeeerlxxxxxplxxxxxPeRLxxxx';
    my $re  = qr/p.*?l/i;
    while ( $dna =~ m[($re)]g ) {
        my $beg = index( $dna, $1 );
        my $len = length($1);
        print join " " => $1, $beg, $beg + ($len -1), $/;    # updated
    }

    which produces..

    pecbcbccrl 6 15
    peeeerl 22 28
    pl 34 35
    PeRL 41 44

    Update: Corrected the end index of each match, which was too long by one, as rightly pointed out by hdb.
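
One caveat with the index approach: `index( $dna, $1 )` always returns the position of the first occurrence of the matched text, so it reports the wrong offset when the same substring matches more than once. A minimal sketch of an alternative based on pos(), using a made-up input string in which 'pl' occurs twice (the `@spans` array is likewise an illustrative name):

```perl
use strict;
use warnings;

# Hypothetical input in which the same substring matches twice.
my $dna = 'xxplxxxplxx';
my $re  = qr/p.*?l/i;

my @spans;
while ( $dna =~ /($re)/g ) {
    # pos() is the offset just past the current match, so the start
    # is pos() minus the match length, and the inclusive end is pos() - 1.
    my $end = pos($dna) - 1;
    my $beg = pos($dna) - length($1);
    push @spans, [ $1, $beg, $end ];
}

print join( ' ', @$_ ), "\n" for @spans;
```

Here the second 'pl' starts at offset 7, whereas index would report 2 for both matches.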

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me

      Perl provides the @- and @+ arrays for this purpose (thanks to kcott for pointing out that arrays have the @ sigil...):

      use warnings;
      use strict;

      #                    1         2         3         4
      #          0123456789012345678901234567890123456789012345678
      my $dna = 'xxxxxxpecbcbccrlxxxxxxpeeeerlxxxxxplxxxxxPeRLxxxx';
      my $re = qr/p.*?l/i;
      while ( $dna =~ m[($re)]g ) {
          print join " " => $1, $-[0], $+[0]-1, $/;
      }

      A correction of -1 is required, otherwise the matches are too long by one as in your code above.
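
The reason for the -1 is that $+[0] holds the offset just past the end of the match, while $-[0] holds the offset of its first character. A minimal sketch, assuming the goal from the original question of storing each match together with its offsets; the `@found` array and its [ text, start, end ] layout are choices made here for illustration:

```perl
use strict;
use warnings;

my $dna = 'xxxxxxpecbcbccrlxxxxxxpeeeerlxxxxxplxxxxxPeRLxxxx';
my $re  = qr/p.*?l/i;

my @found;    # each entry: [ matched text, start offset, inclusive end offset ]
while ( $dna =~ /$re/g ) {
    my ( $start, $past_end ) = ( $-[0], $+[0] );

    # $+[0] is the offset just past the match, so the length is the
    # difference and the last matched character sits at $+[0] - 1.
    my $text = substr( $dna, $start, $past_end - $start );
    push @found, [ $text, $start, $past_end - 1 ];
}

printf "%s %d %d\n", @$_ for @found;
```

Because the offsets come from the regex engine rather than from a second search, no capturing parentheses are needed and repeated substrings are reported at their own positions.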

        Hi hdb,
        A correction of -1 is required, otherwise the matches are too long by one as in your code above.
        You are right... my bad!