Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

In the code below, I'm trying to match and capture a lowercase letter that is flanked by exactly 4 capital letters on both its left and its right. My regex captures the letter "s" on line 1 (at position 12), but on line 2 it captures only the letter "c" (position 6) and not "f" (position 11).

    #Each line of $text is independent of the other and not joined as a continuous line.
    $text = q~
    adfRadfaUYBGsQWERaeYETEWoyMSn
    nbPOIVcRCVVfOOPQbHbnRIIqWweRT
    ~;
    $result = "";
    while( $text =~ /[a-z]+[A-Z]{4}([a-z]{1})[A-Z]{4}[a-z]+/g)
    {
        $result .= $1;
    }
    print $result; # prints sc but should print scf

How do I modify my code so that it also matches the "f" on the second line, which is likewise flanked by exactly 4 capital letters on each side?

Thanks in anticipation!

Replies are listed 'Best First'.
Re: Regex help
by Your Mother (Archbishop) on Dec 11, 2013 at 07:09 UTC

    One way:

    my $text = <<"";
    adfRadfaUYBGsQWERaeYETEWoyMSn
    nbPOIVcRCVVfOOPQbHbnRIIqWweRT

    for my $match ( $text =~ /(?<=[A-Z]{4})([a-z])(?=[A-Z]{4})/g ) {
        print $match, $/;
    }
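For what it's worth, here is a minimal sketch of why the pattern in the question skips "f" while a look-around pattern does not. It uses only the second sample line; the variable names (`$line`, `@consuming`, `@lookaround`) are mine and purely illustrative.

```perl
use strict;
use warnings;

my $line = 'nbPOIVcRCVVfOOPQbHbnRIIqWweRT';

# Consuming pattern from the question: the first match swallows "RCVV" and
# the trailing "f", so the next search starts at "OOPQ" and "f" is never
# considered as a candidate.
my @consuming = $line =~ /[a-z]+[A-Z]{4}([a-z])[A-Z]{4}[a-z]+/g;

# Look-around pattern: only the single lowercase letter is consumed, so
# "RCVV" can act as the right flank of "c" and as the left flank of "f".
my @lookaround = $line =~ /(?<=[A-Z]{4})([a-z])(?=[A-Z]{4})/g;

print "consuming:  @consuming\n";
print "lookaround: @lookaround\n";
```

Look-behind and look-ahead are zero-width assertions, so the capitals they inspect stay available for the next match attempt.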

      That will also extract single lc alphas that are preceded or followed by more than four uc alphas:

      >perl -wMstrict -le
      "my $text = 'XXXXXaXXXXX';
       ;;
       for my $match ( $text =~ /(?<=[A-Z]{4})([a-z])(?=[A-Z]{4})/g ) {
         print $match, $/;
         }
      "
      a

      If AnonyMonk wants single lc alphas that are preceded and followed by exactly four uc alphas (and also concatenated into a string), here's one way:

      >perl -wMstrict -le
      "my $text = qq{XXXXaXXXXbYYYYYcYYYYYdXXXXeXXXXfgXXXX\nXXXXhXXXXiYYYYY};
       print qq{[[$text]]};
       ;;
       my $result = join '', $text =~ m{
           (?<= (?<! [[:upper:]]) [[:upper:]]{4})
           [[:lower:]]
           (?= [[:upper:]]{4} (?! [[:upper:]]))
           }xmsg;
       print qq{'$result'};
      "
      [[XXXXaXXXXbYYYYYcYYYYYdXXXXeXXXXfgXXXX
      XXXXhXXXXiYYYYY]]
      'aeh'

      (If some look-around is good, more is better!)
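As a rough sketch, the same "exactly four" pattern can be wrapped in a small script and run against the data from the original question. The helper name `exactly_four_flanked` is hypothetical (it does not appear in the thread), and the sample text is assumed to be two newline-separated lines, as the question's comment suggests.

```perl
use strict;
use warnings;

# Hypothetical helper: returns, as one string, every lowercase letter that
# has exactly four capitals immediately before it and exactly four after it.
sub exactly_four_flanked {
    my ($text) = @_;
    return join '', $text =~ m{
        (?<= (?<! [[:upper:]]) [[:upper:]]{4})   # four capitals, no fifth before them
        [[:lower:]]                              # the letter we want
        (?= [[:upper:]]{4} (?! [[:upper:]]))     # four capitals, no fifth after them
    }xmsg;
}

my $text = "adfRadfaUYBGsQWERaeYETEWoyMSn\nnbPOIVcRCVVfOOPQbHbnRIIqWweRT\n";
print exactly_four_flanked($text), "\n";   # prints scf
```

The "o" on the first line is skipped because it follows five capitals ("YETEW"), which the inner negative look-behind rules out.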

      Update: Here are the beginnings of a test bed for playing with this and other regexen:

      >perl -wMstrict -le
      "for my $text (qw(
           XXXXaXXXX  XXXXaXXXXxyXXXXbXXXXxZZZxZZZxYYYYY
           XXXXxZZZ  ZZZxXXXX  XXXXxYYYYY  YYYYYxXXXX
           XXXXxyXXXX  XXXXxyXXXXxyXXXX  YYYYYaYYYYY  ZZZaZZZ
           ) ) {
         my $result = join '', $text =~ m{
             (?<= (?<! [[:upper:]]) [[:upper:]]{4})
             [[:lower:]]
             (?= [[:upper:]]{4} (?! [[:upper:]]))
             }xmsg;
         print qq{'$text' -> '$result'};
         }
      "
      'XXXXaXXXX' -> 'a'
      'XXXXaXXXXxyXXXXbXXXXxZZZxZZZxYYYYY' -> 'ab'
      'XXXXxZZZ' -> ''
      'ZZZxXXXX' -> ''
      'XXXXxYYYYY' -> ''
      'YYYYYxXXXX' -> ''
      'XXXXxyXXXX' -> ''
      'XXXXxyXXXXxyXXXX' -> ''
      'YYYYYaYYYYY' -> ''
      'ZZZaZZZ' -> ''

        Thank you so much! I tried yours and it works like a champ.

        Thanks everyone for helping :)

Re: Regex help
by mendeepak (Scribe) on Dec 11, 2013 at 08:56 UTC