I am trying to capture matches in a DNA sequence. I can capture a run of 10 or more consecutive G's as below.
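use 5.18.2;

my $line;
while (<DATA>) {
    $line = $_;
    # Match a run of 10 or more consecutive G's:
    # (G) captures one G, \1{9,} requires at least 9 more of the same.
    if ($line =~ m/(G)\1{9,}/) {
        say "$&";    # $& holds the whole matched run
    }
}
__DATA__
GGCTTTCCGTTGTTGCTGGGTGTGGGGGGCGGGCGAGATTGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTGAAAAAAGGGGTGGGGGGGAGGGGGGGCGGGGGGGGGGGGGGGAGGGGGGGAG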
What I also want to capture is shown below. In addition to the run of 10 or more G's, I want to capture the strings to the left of (G)\1{9,}. Note that I use " " only to mark off the pieces I want to capture; the quotes are not part of the DNA string. I could not capture the other parts of the string together with (G)\1{9,}. I need to print what I capture.

"ACTCCAGTCACGCCAATATCTCGTAT" "[ACTG]{0,10}" ".+" "(G)\1{9,}" ".+"

Thanks.
In reply to help with regex by rnaeye
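One possible approach, offered as a sketch rather than a definitive answer: a likely reason the extra pieces could not be combined with (G)\1{9,} is that adding capture groups to the left renumbers the groups, so \1 no longer refers to (G). The sketch below sidesteps this by writing the G run as G{10,}, which matches the same text without a backreference. The helper name split_read is hypothetical, .+ is relaxed to a non-greedy .*? so the first qualifying G run is used, and the read is assumed to be a single line (treating the leading + signs in the posted data as line-wrap markers).

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: split a read into the five pieces named in the post.
# Returns (adapter, up to 10 bases, filler, G run, rest), or an empty list
# if the read does not match.
sub split_read {
    my ($seq) = @_;
    my $adapter = 'ACTCCAGTCACGCCAATATCTCGTAT';
    if ($seq =~ m/(\Q$adapter\E)([ACGT]{0,10})(.*?)(G{10,})(.*)/) {
        return ($1, $2, $3, $4, $5);
    }
    return;
}

# Assumption: the posted data joined into one line, with the "+" markers removed.
my $read = 'GGCTTTCCGTTGTTGCTGGGTGTGGGGGGCGGGCGAGATTGGAAGAGCACACGTCTGAACTCCAGTCACG'
         . 'CCAATATCTCGTATGCCGTCTTCTGCTTGAAAAAAGGGGTGGGGGGGAGGGGGGGCGGGGGGGGGGGGG'
         . 'GGAGGGGGGGAG';

# Print each captured piece in quotes, one per line.
say qq("$_") for split_read($read);
```

If the backreference form is preferred, a relative backreference such as (G)\g{-1}{9,} should also keep working when groups are added to its left, at the cost of one extra capture group holding a single G.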