in thread Find and extract substring(s) within larger string.

Thanks for the help. I'm also trying to use the qr// operator to speed up the regex:
my $regex = qr/p[^x]+l/; my @matches = $dna =~ /($regex)/;
...but I don't know where to place the switches. For example, $regex\migs

Thanks.

Re^3: Find and extract substring(s) within larger string.
by Athanasius (Archbishop) on Oct 21, 2013 at 04:41 UTC
Re^3: Find and extract substring(s) within larger string.
by AnomalousMonk (Archbishop) on Oct 21, 2013 at 23:37 UTC
    ... speed up the regex:
    my $regex = qr/p[^x]+l/;

    Please be aware that the /i case-insensitivity modifier usually imposes a speed penalty, which may be quite significant if you're really dealing with long-ish (e.g., DNA) strings. Using character classes avoids this:
        my $regex = qr/[Pp][^x]*[Ll]/;
    with no /i modifier needed anywhere. As always, Benchmark-ing tells the true tale with regard to performance in a real application; anything else, however well informed, is speculation.
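
One way to check this on your own data is the core Benchmark module. The following is only a sketch: the test string is an assumption (the sample string from this thread repeated many times), and the labels modifier_i and char_class are made up for illustration. Results will vary with Perl version and input, so it makes no claim about which variant wins.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: the sample string from this thread, repeated to make it long.
my $dna = 'xxpecbcbccrlxxxPeeeerlxxpLxxxPeRLxx' x 10_000;

my $with_i     = qr/p[^x]*l/i;        # case-insensitivity via the /i modifier
my $with_class = qr/[Pp][^x]*[Ll]/;   # case-insensitivity via character classes

# Sanity check before timing. Note that under /i, [^x] also excludes 'X';
# the sample data contains no 'X', so both patterns agree here.
my @found_i     = $dna =~ /$with_i/g;
my @found_class = $dna =~ /$with_class/g;
die "patterns disagree\n" unless "@found_i" eq "@found_class";

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    modifier_i => sub { my @m = $dna =~ /$with_i/g },
    char_class => sub { my @m = $dna =~ /$with_class/g },
});
```

The sanity check matters as much as the timing: a faster pattern that matches something different is not a fair comparison.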

    Also be aware that the [^x]+ term in the quoted regex requires at least one non-'x' character to be present for a match, thus excluding a match on something like 'pl'. Using [^x]* instead allows it. So the final code might look like the following. (Note that () capturing parentheses are not needed in this case and may impose a speed penalty.)

    >perl -wMstrict -le "my $dna = 'xxpecbcbccrlxxxPeeeerlxxpLxxxPeRLxx'; ;; my $perl = qr{ [Pp] [^x]* [Ll] }xms; ;; my @matches = $dna =~ m{ $perl }xmsg; printf qq{'$_' } for @matches; "
    'pecbcbccrl' 'Peeeerl' 'pL' 'PeRL'
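
On the original question of where the switches go, here is a minimal sketch using the regex from the question, assuming the /i and /g modifiers are the ones wanted. Pattern-level modifiers such as /i, /m, /s and /x can be attached to qr// and travel with the compiled pattern. /g is not accepted by qr// and goes on the match operator instead.

```perl
use strict;
use warnings;

my $dna = 'xxpecbcbccrlxxxPeeeerlxxpLxxxPeRLxx';

# Pattern-level modifiers (here /i) are written on qr// itself
# and stay with the compiled pattern wherever it is used.
my $regex = qr/p[^x]+l/i;

# /g controls how the match is performed, so it belongs on the
# match operator. Without capture groups, a list-context //g match
# returns every matched substring.
my @matches = $dna =~ /$regex/g;

print "'$_' " for @matches;
print "\n";
# prints: 'pecbcbccrl' 'Peeeerl' 'PeRL'
```

Because this version keeps the original [^x]+, 'pL' is not matched, which is the behavior described above.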