PerlKc has asked for the wisdom of the Perl Monks concerning the following question:

Please help. My problem is that I need to find the position of each match. The expected outcome is:

Looking for pattern1 = C[AG]G
Pattern C[AG]G matched CAG at residue 2
and so on...

I have written the following code but couldn't come up with the desired output. Please help.

#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;

my $Dna1 = "AACAGCACGGCAACGCTGTGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTGAAAACAATCACGA\n";
while ($Dna1 =~ /C[AG]G/g){
    my $endposition = pos($Dna1) + 1;
    my $matchlength = length($1);
    #I need to print “Looking for pattern1 = C[AG]G Pattern C[AG]G matched CAG at residue 2”
    print “……….”

Replies are listed 'Best First'.
Re: Regex matching and position
by graff (Chancellor) on Oct 21, 2015 at 01:42 UTC
    First, don't use non-ASCII characters in your code unless you absolutely must have them there (e.g. as a value to be assigned to a variable). As pointed out in another reply, using non-ASCII "smart quotes" to delimit literal strings in your code is an error.

    Next, if you want to use $1 ($2, etc) after doing a regex match, you have to put parens into the regex to capture some part(s) of what is being matched.

    Apart from those two points, there's not much that needs to be added to your code:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use diagnostics;

    my $Dna1 = "AACAGCACGGCAACGCTGTGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTGAAAACAATCACGA\n";
    while ($Dna1 =~ /(C[AG]G)/g) {
        my $endposition = pos($Dna1) + 1;
        print "Pattern C[AG]G matched $1 ending at $endposition\n";
    }
    When I run that, I get:
    Pattern C[AG]G matched CAG ending at 6
    Pattern C[AG]G matched CGG ending at 11
    Pattern C[AG]G matched CAG ending at 38
    Pattern C[AG]G matched CGG ending at 48
    Those offset values (6, 11, 38, 48) represent the position of the next character after the 3-letter match (where the first character of the string is at position 1). That is, "6" points to the "C" that follows the first "CAG", "11" points to the "C" that follows the next "CGG", and so on.
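If you want the position where each match begins rather than where it ends, one option (a minimal sketch, not the only way to do it) is to subtract the length of the captured text from pos(). The names $dna, @report, $start0 and $start1 are made up for illustration; the sequence is the one from the original post with the line-wrap marker removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sequence from the original post (line-wrap "+" removed, no trailing newline).
my $dna = "AACAGCACGGCAACGCTGTGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTGAAAACAATCACGA";

my @report;    # hypothetical name: one line of output per match
while ($dna =~ /(C[AG]G)/g) {
    # pos() is the 0-based offset just past the match, so subtracting
    # the length of the match gives the 0-based offset where it starts.
    my $start0 = pos($dna) - length($1);
    my $start1 = $start0 + 1;    # the same position, counting from 1
    push @report, "Pattern C[AG]G matched $1 at offset $start0 (residue $start1)";
}
print "$_\n" for @report;
```

The first match is reported at offset 2 (residue 3), so the "residue 2" in the original question corresponds to counting from 0.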

    (updated to fix a typo)

Re: Regex matching and position
by AnomalousMonk (Archbishop) on Oct 21, 2015 at 02:09 UTC

    A slightly different approach:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $Dna1 = qq{AACAGCACGGCAACGCTGTGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTGAAAACAATCACGA\n};
     ;;
     while ($Dna1 =~ /(C[AG]G)/g) {
       my $sub_seq = $1;
       my ($start, $end, $len) = ($-[1], $+[1], $+[1]-$-[1]);
       print qq{matched '$sub_seq' @ offsets $start thru $end, $len long};
       }
    "
    matched 'CAG' @ offsets 2 thru 5, 3 long
    matched 'CGG' @ offsets 7 thru 10, 3 long
    matched 'CAG' @ offsets 34 thru 37, 3 long
    matched 'CGG' @ offsets 44 thru 47, 3 long
    Note that once again, the ending offset is for the first character after the matched sub-sequence. Here the offsets are counted from 0, so the first match starts at offset 2. Note also that this regex matches non-overlapping sub-sequences only. (A slightly different regex can match overlaps if you need this.) See the perlvar subsection "Variables related to regular expressions" for information on the @- and @+ special regex arrays.
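On matching overlaps: a common way to do it is to wrap the capture group in a zero-width lookahead, so the regex engine records the sub-sequence without consuming it. The sketch below is only an illustration under a few assumptions. It uses GG as a stand-in pattern, because C[AG]G can never overlap itself (each match starts with C and ends with G, and the middle character is A or G), and the names $dna, @plain and @overlap are made up here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sequence from the original post (line-wrap "+" removed, no trailing newline).
my $dna = "AACAGCACGGCAACGCTGTGCCTTGGGCACCATGCAGTACCAAACGGAACGATAGTGAAAACAATCACGA";

# Illustrative pattern only: GG can overlap itself (as in GGG).
my (@plain, @overlap);

# Ordinary capture: each match consumes its characters, so the
# search resumes after the match and overlapping hits are skipped.
push @plain, $-[1] while $dna =~ /(GG)/g;

# Capture inside a lookahead: the match itself is zero-length, so the
# search resumes one character later and overlapping hits are found.
push @overlap, $-[1] while $dna =~ /(?=(GG))/g;

print "non-overlapping starts: @plain\n";
print "overlapping starts:     @overlap\n";
```

With this sequence the plain regex reports starts 8, 24 and 45, while the lookahead version also reports 25, the second GG inside the GGG run.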

    Update: Reading the OP more closely, I see you want to capture and print the matched sub-sequence also. I have posted new code to do this using a capture group. Sorry for any confusion. (Update: Buried the evidence a bit deeper under some  <readmore> tags.)


    Give a man a fish:  <%-{-{-{-<

      You saved my day. Maybe this is not the right place, but I just wanted to thank all the Perl Monks. Every time I have had a hard problem with Perl, I have easily found a solution on perlmonks.org, so thank you all.

        You're very welcome. We exist to serve. (You might also think about stopping by the Offering Plate.)


        Give a man a fish:  <%-{-{-{-<

Re: Regex matching and position
by Anonymous Monk on Oct 20, 2015 at 23:50 UTC
    What's with the smart quotes “”?

      I need the code for the print statement. Please help! The smart quotes in the code just show the desired print statement, which I mentioned at the beginning of the code.

        :) I'm helping: smart quotes are a syntax error.