abcdef has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have the code below:
my($abc) = "fred<hello>3hello";
$abc =~ /^[^\d]{2,4}<([^>]+)>\d?\1$/;
if (defined($1)) { print "$1\n"; } else { print "not found\n"; }
What is this code doing? In particular, what does the regular expression in this line do?
$abc =~ /^[^\d]{2,4}<([^>]+)>\d?\1$/;
Please advise.

Replies are listed 'Best First'.
Re: perl code question
by bart (Canon) on Feb 10, 2011 at 12:24 UTC
    my($abc) = "fred<hello>3hello"; $abc =~ /^[^\d]{2,4}<([^>]+)>\d?\1$/;
    That's quite a complex regex for a newbie. I'll explain by making a somewhat simpler version first:
    /^[^\d]{2,4}<([^>]+)>\d?.*$/;
    First it tries to match 2 to 4 characters that aren't digits (/[^\d]{2,4}/), attached to the front of the string (/^/).

    Then it tries to match something between angle brackets; the thing should not contain a closing angle bracket itself (<([^>]+)>). Notice that there are (unescaped) parens in this part, so the regex engine will capture what it matches, and that'll be the word "hello"; it'll be put into the capture variable $1 because this is the first (actually, the only) set of parens in this regex.

    Finally, it tries to match an optional digit (/\d?/); and then something more.

    If you try to run it now, you'll see it captures the same thing, in this case.
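
    To make that comparison concrete, here is a minimal sketch that runs both the simplified and the original pattern against the sample string. The second string, "fred<hello>3world", and all variable names are my own additions for illustration; they are not part of the original post.

```perl
use strict;
use warnings;

my $abc = "fred<hello>3hello";

# Simplified pattern: the backreference \1 is replaced by .*
my ($simple) = $abc =~ /^[^\d]{2,4}<([^>]+)>\d?.*$/;

# Original pattern: the tail must repeat the captured text
my ($orig) = $abc =~ /^[^\d]{2,4}<([^>]+)>\d?\1$/;

print "simplified: $simple\n";   # simplified: hello
print "original:   $orig\n";     # original:   hello

# Made-up string whose tail differs from the captured word
my $other = "fred<hello>3world";
my $simple_ok = $other =~ /^[^\d]{2,4}<([^>]+)>\d?.*$/ ? 1 : 0;
my $orig_ok   = $other =~ /^[^\d]{2,4}<([^>]+)>\d?\1$/ ? 1 : 0;
print "other: simplified=$simple_ok original=$orig_ok\n";   # other: simplified=1 original=0
```

    The two patterns only disagree once the text after the optional digit stops being a repeat of the captured word.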

    Your original regex is a bit more complex in that the final part must match "\1". This is something special, and it's not chr(1) (that would have been a second possible interpretation): it can only match the text that was captured into $1 earlier in this same match ("hello", remember?). Note that /$1/ would not work: the regex engine would plug the current value of the variable $1 into the regex before it starts to try to match anything; that value would not change afterwards.
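
    The difference between \1 and $1 inside the pattern can be sketched like this. The earlier match against "stale" is only there to put a known value into $1 beforehand; it and the variable names are assumptions made for the demonstration.

```perl
use strict;
use warnings;

# Put a known value into $1 with an unrelated, earlier successful match
"stale" =~ /(\w+)/ or die "setup match failed";

my $abc = "fred<hello>3hello";

# $1 is interpolated before matching starts, so this pattern is effectively
# /^[^\d]{2,4}<([^>]+)>\d?stale$/ and cannot match the sample string
my $with_dollar = $abc =~ /^[^\d]{2,4}<([^>]+)>\d?$1$/ ? "match" : "no match";

# \1 refers to whatever the parens captured during this very match
my $with_backref = $abc =~ /^[^\d]{2,4}<([^>]+)>\d?\1$/ ? "match" : "no match";

print "with \$1: $with_dollar\n";    # with $1: no match
print "with \\1: $with_backref\n";   # with \1: match
```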

    Also note that \1 will only match literal strings: the captured text is not itself treated as a regex. Using a variable in a regex would treat its contents as a regex. Using \1, it is as if quotemeta is applied to the contents of $1 before using it in the regex.
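
    A small sketch of that literal-matching behaviour, using a captured word that contains a regex metacharacter. The test strings "ab<a.c>3a.c" and "ab<a.c>3abc" are made up for this example, as are the variable names.

```perl
use strict;
use warnings;

my $re = qr/^[^\d]{2,4}<([^>]+)>\d?\1$/;

# The capture is "a.c"; \1 requires exactly that text again
my $literal_repeat = "ab<a.c>3a.c" =~ $re ? 1 : 0;   # 1
my $dot_as_any     = "ab<a.c>3abc" =~ $re ? 1 : 0;   # 0, the "." is not a wildcard here

# For contrast: the same text interpolated from a variable is treated as a regex,
# unless quotemeta (written here as \Q...\E) is applied first
my $captured  = "a.c";
my $as_regex  = "abc" =~ /^$captured$/     ? 1 : 0;   # 1
my $as_quoted = "abc" =~ /^\Q$captured\E$/ ? 1 : 0;   # 0

print "backref: $literal_repeat $dot_as_any\n";
print "variable: $as_regex $as_quoted\n";
```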

    TL;DR:

    • ^ matches the start of the string
    • [^\d]{2,4} matches "fred"
    • <([^>]+)> matches "<hello>" and puts "hello" into $1
    • \d? matches "3"
    • \1 matches the earlier capture in $1: "hello"
    • $ matches the end of the string.
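
    The breakdown above can also be written straight into the pattern with the /x modifier, which lets you add whitespace and comments. This is only a sketch: the helper name classify and the extra sample strings are hypothetical, chosen to exercise each part of the regex.

```perl
use strict;
use warnings;

my $re = qr{
    ^            # start of the string
    [^\d]{2,4}   # two to four non-digit characters
    <            # a literal opening angle bracket
    ( [^>]+ )    # group 1: one or more characters other than the closing bracket
    >            # a literal closing angle bracket
    \d?          # an optional digit
    \1           # the same text that group 1 captured
    $            # end of the string
}x;

# Hypothetical helper: returns the captured word, or undef if there is no match
sub classify {
    my ($text) = @_;
    return $text =~ $re ? $1 : undef;
}

for my $text ("fred<hello>3hello", "fred<hello>hello",
              "fred<hello>3world", "f<hello>3hello") {
    my $word = classify($text);
    printf "%-20s => %s\n", $text, defined $word ? $word : "not found";
}
```

    Only the first two strings match: the third does not repeat the captured word, and the fourth has just one character before the "<".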
Re: perl code question
by derby (Abbot) on Feb 10, 2011 at 13:13 UTC
Re: perl code question
by Ratazong (Monsignor) on Feb 10, 2011 at 11:37 UTC

    There are several ways to find out what a regular expression will do. Often the easiest way is to try it yourself (as already suggested by cjb).

    However, you could also let some software explain it to you. I prefer this online resource.

    HTH, Rata

    P.S.: Have you tried entering "regex explain" into the search field at the top of this page yet?

Re: perl code question
by Anonymous Monk on Feb 10, 2011 at 11:05 UTC
Re: perl code question
by cjb (Friar) on Feb 10, 2011 at 11:15 UTC

    Have you tried TITS (Try It To See)? What happened when you ran it?

Re: perl code question
by aantonyselvam (Beadle) on Feb 10, 2011 at 12:18 UTC

    Here is what each part of the regular expression means:
    /^ => the start of the string
    [^\d] => any character that is not a digit
    {2,4} => the previous item must appear at least 2 and at most 4 times; here the previous item is any non-digit character, not only letters
    < => the character '<'
    ( => start of a capturing group
    [^>]+ => one or more characters other than '>'
    ) => end of the group
    > => the character '>'
    \d? => an optional digit (zero or one)
    \1 => the text matched by the group, repeated
    $/ => the end of the string
    The text matched by the group is stored in $1, which is set when the match succeeds.
    So if $1 is defined, the code prints the captured word.
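
    As a rough sketch of that last step, the check can also be written against the result of the match itself, since $1 keeps its value from the last successful match in scope. The helper name captured_word and the second sample string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured word, or undef when the pattern does not match
sub captured_word {
    my ($text) = @_;
    return $text =~ /^[^\d]{2,4}<([^>]+)>\d?\1$/ ? $1 : undef;
}

for my $text ("fred<hello>3hello", "fred<hello>3world") {
    my $word = captured_word($text);
    print defined $word ? "$word\n" : "not found\n";
}
# prints "hello", then "not found"
```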