rnaeye has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to capture matches in a DNA sequence. I can capture a run of repeated Gs (10 or more) as shown below.
    use 5.18.2;
    my $line;
    while (<DATA>) {
        $line = $_;
        if ($line =~ m/(G)\1{9,}/) { say "$&" }
    }
    __DATA__
    GGCTTTCCGTTGTTGCTGGGTGTGGGGGGCGGGCGAGATTGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTGAAAAAAGGGGTGGGGGGGAGGGGGGGCGGGGGGGGGGGGGGGAGGGGGGGAG
What I additionally want to capture is shown below. In addition to the run of 10 or more Gs, I also want to capture the strings to the left of (G)\1{9,}. Note that I use " " to mark what I want to capture; the quotes are not part of the DNA string. I could not capture the other parts of the string together with (G)\1{9,}. I need to print what I capture.

"ACTCCAGTCACGCCAATATCTCGTAT" "[ACTG]{0,10}" ".+" "(G)\1{9,}" ".+"

Thanks.
Replies are listed 'Best First'.
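For readers following along, one possible reading of the five quoted pieces in the question is a single pattern with one capture group per piece. The sketch below is an assumption about the intent, not an answer taken from the replies. The helper name `capture_parts` and the labels are made up for illustration, `G{10,}` stands in for `(G)\1{9,}` (equivalent for a single literal base, and it keeps the group numbering simple), the middle `.+` is made lazy, and the trailing piece is allowed to be empty.

```perl
use v5.18;
use warnings;

# The sequence from the question, with its three wrapped pieces joined into one line.
my $seq = join '',
    'GGCTTTCCGTTGTTGCTGGGTGTGGGGGGCGGGCGAGATTGGAAGAGCACACGTCTGAACTCCAGTCACG',
    'CCAATATCTCGTATGCCGTCTTCTGCTTGAAAAAAGGGGTGGGGGGGAGGGGGGGCGGGGGGGGGGGGG',
    'GGAGGGGGGGAG';

# One capture group per quoted piece (an assumed reading of the question).
my $re = qr{
    (ACTCCAGTCACGCCAATATCTCGTAT)   # group 1: the literal anchor sequence
    ([ACGT]{0,10})                 # group 2: up to 10 bases right after it
    (.+?)                          # group 3: lazy filler before the G run
    (G{10,})                       # group 4: run of 10 or more Gs
    (.*)                           # group 5: whatever follows the run, may be empty
}x;

# Hypothetical helper: returns the five captured pieces in list context,
# or an empty list when the pattern does not match.
sub capture_parts {
    my ($dna) = @_;
    return $dna =~ $re;
}

my @labels = ('anchor', 'next bases', 'in between', 'G run', 'remainder');
if (my @parts = capture_parts($seq)) {
    printf "%-11s \"%s\"\n", $labels[$_], $parts[$_] for 0 .. $#parts;
}
else {
    say 'no match';
}
```

The lazy `.+?` matters here: a greedy `.+` in the third position would give back only as many characters as it must, so the G group would keep just the minimum ten Gs of a longer run. The lazy form stops at the first position where a full run of ten or more Gs begins.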
Re: help with regex
by Athanasius (Archbishop) on Mar 20, 2019 at 03:20 UTC
by jwkrahn (Abbot) on Mar 20, 2019 at 04:20 UTC

Re: help with regex
by hdb (Monsignor) on Mar 20, 2019 at 08:16 UTC

Re: help with regex
by Marshall (Canon) on Mar 20, 2019 at 05:26 UTC

Re: help with regex
by Marshall (Canon) on Mar 20, 2019 at 05:23 UTC

Re: help with regex
by rnaeye (Friar) on Mar 21, 2019 at 00:49 UTC