in reply to help with regex

Hello rnaeye,

To capture 10 or more consecutive G characters, you don’t need (G)\1{9,}; just use (G{10,}), which is simpler, easier to read, and captures the whole run rather than a single G.
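
Here is a small sketch of the difference in what gets captured. The test string is hypothetical (a run of 12 Gs between other bases). Both patterns match the same run, but only the quantifier form puts that run into $1.

```perl
use strict;
use warnings;

# Hypothetical test string: a run of 12 Gs between other bases.
my $seq = 'ACT' . ('G' x 12) . 'TCA';

# Backreference form: the group captures one G, and \1{9,} demands at least
# 9 more copies of it. $1 is therefore just "G"; the full run is only
# available as the overall match.
my ($backref_group) = $seq =~ /(G)\1{9,}/;

# Quantifier form: the group itself spans the whole run.
my ($quant_group) = $seq =~ /(G{10,})/;

print "backreference capture: $backref_group\n";    # G
print "quantifier capture:    $quant_group (", length($quant_group), " chars)\n";    # 12 chars
```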

Note that when you have a regex of the form / .* (G{10,}) /x, the leading .* is greedy and will match as much of the G-sequence as it can, so the capture group will contain only the 10 Gs it needs to satisfy the match. If you want all the Gs (15 for the sample data given), you need to make the first match non-greedy: / .*? G{10,} /x.
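
A minimal sketch comparing the two forms on the sample line from the question (rejoined onto one line). It should report 10 Gs for the greedy form; the non-greedy form should capture the full run (15 Gs, per the count above).

```perl
use strict;
use warnings;

# The sample line from the question, rejoined onto one logical line.
my $line = 'GGCTTTCCGTTGTTGCTGGGTGTGGGGGGCGGGCGAGATTGGAAGAGCACACGTCTGAACTCCAGTCACG'
         . 'CCAATATCTCGTATGCCGTCTTCTGCTTGAAAAAAGGGGTGGGGGGGAGGGGGGGCGGGGGGGGGGGGG'
         . 'GGAGGGGGGGAG';

# Greedy: .* first swallows the whole line, then gives characters back one at
# a time until G{10,} can match, so the capture is the last 10 Gs of the run.
my ($greedy) = $line =~ / .* (G{10,}) /x;

# Non-greedy: .*? grows one character at a time and stops at the first
# position where G{10,} can match, i.e. the start of the run.
my ($lazy) = $line =~ / .*? (G{10,}) /x;

printf "greedy:     %d Gs\n", length $greedy;    # 10 Gs
printf "non-greedy: %d Gs\n", length $lazy;
```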

Your requirements are not clear (to me). Please provide the exact output you desire for the given input data (and additional lines of input together with the desired output for each). In the meantime, I’m guessing you want to find a 10-character ACTG sequence immediately following the specific sequence ACTCCAGTCACGCCAATATCTCGTAT and followed (but not necessarily immediately) by a run of 10 or more G characters:

use 5.18.2;
use warnings;

while (my $line = <DATA>) {
    # Anchor sequence, then the next 10 bases, then (non-greedily) skip
    # ahead to the first run of 10 or more Gs and capture the whole run.
    if ($line =~ m/ (ACTCCAGTCACGCCAATATCTCGTAT) ([ACTG]{10}) .*? (G{10,}) /x) {
        say for $1, $2, $3;    # Can use @{^CAPTURE} in Perl 5.25.7 and later
    }
}

__DATA__
GGCTTTCCGTTGTTGCTGGGTGTGGGGGGCGGGCGAGATTGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTGAAAAAAGGGGTGGGGGGGAGGGGGGGCGGGGGGGGGGGGGGGAGGGGGGGAG

Output:

13:18 >perl 1986_SoPW.pl
ACTCCAGTCACGCCAATATCTCGTAT
GCCGTCTTCT
GGGGGGGGGGGGGGG
13:18 >
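
For completeness, here is a sketch of the @{^CAPTURE} variant mentioned in the code comment, assuming Perl 5.26 or later. It uses the same pattern and the sample line; @{^CAPTURE} holds all capture groups of the last successful match, so the number of groups does not have to be spelled out.

```perl
use v5.26;      # @{^CAPTURE} is available from Perl 5.25.7 (stable in 5.26)
use warnings;

# The sample line from the question, rejoined onto one logical line.
my $line = 'GGCTTTCCGTTGTTGCTGGGTGTGGGGGGCGGGCGAGATTGGAAGAGCACACGTCTGAACTCCAGTCACG'
         . 'CCAATATCTCGTATGCCGTCTTCTGCTTGAAAAAAGGGGTGGGGGGGAGGGGGGGCGGGGGGGGGGGGG'
         . 'GGAGGGGGGGAG';

my @groups;
if ($line =~ m/ (ACTCCAGTCACGCCAATATCTCGTAT) ([ACTG]{10}) .*? (G{10,}) /x) {
    # @{^CAPTURE} lists every capture group of the last successful match,
    # i.e. ($1, $2, $3) here.
    @groups = @{^CAPTURE};
}
say for @groups;
```

On older Perls, assigning the match in list context (my @groups = $line =~ m/.../x;) returns the same list of captures.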

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: help with regex
by jwkrahn (Abbot) on Mar 20, 2019 at 04:20 UTC
    If you want all the Gs (15 for the sample data given), you need to make the first match non-greedy: / .*? G{10,} /x.

    Or just / G{10,} /x would be simpler and do the same thing. (Unless the string contains newlines!)
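
A quick sketch of that point on a hypothetical single-line string (short G runs around one run of exactly 15 Gs): because the regex engine tries start positions from left to right, the plain pattern already lands on the start of the first 10+ run, just as the .*? form does.

```perl
use strict;
use warnings;

# Hypothetical test string: short G runs around one run of exactly 15 Gs.
my $seq = 'AAGGGGTGGGGGGGA' . ('G' x 15) . 'AGGGGGGGAG';

# Leftmost-match semantics: the first position where G{10,} can match is
# the start of the 15-G run, and G{10,} then takes the whole run.
my ($plain) = $seq =~ / (G{10,}) /x;
my ($lazy)  = $seq =~ / .*? (G{10,}) /x;

print length($plain), " ", length($lazy), "\n";    # 15 15
```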