manishrathi has asked for the wisdom of the Perl Monks concerning the following question:

$v_path =~ m/((.*)EDITION\/|Edition\/)(.*)/;

1) What does this syntax mean? My understanding is that it matches EDITION or Edition, preceded or followed by anything.

$v_path = '//rsoesn/default/main/dis_sites/dis_releases/EDelivery/prod/EDITION/scc_audit01092014';

Will this regex match the entire path "rsoesn/default/main/dis_sites/dis_releases/EDelivery/prod/EDITION/scc_audit01092014", since rsoesn/default/main/dis_sites/dis_releases/EDelivery/prod precedes EDITION and scc_audit01092014 follows it?

2) What does \/| in the middle of this syntax mean? Don't we just use | to match either of two values?

3) In this syntax $1 = ((.*)EDITION\/|Edition\/) and $2 = (.*), so what will $3 be?

Thanks

Replies are listed 'Best First'.
Re: Understand Regex syntax
by toolic (Bishop) on Jan 24, 2014 at 18:49 UTC

    Tip #9 from the Basic debugging checklist: YAPE::Regex::Explain

    The regular expression:

    (?-imsx:((.*)EDITION/|Edition/)(.*))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        (                        group and capture to \2:
    ----------------------------------------------------------------------
          .*                       any character except \n (0 or more
                                   times (matching the most amount
                                   possible))
    ----------------------------------------------------------------------
        )                        end of \2
    ----------------------------------------------------------------------
        EDITION/                 'EDITION/'
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        Edition/                 'Edition/'
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      (                        group and capture to \3:
    ----------------------------------------------------------------------
        .*                       any character except \n (0 or more times
                                 (matching the most amount possible))
    ----------------------------------------------------------------------
      )                        end of \3
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
Re: Understand Regex syntax
by AnomalousMonk (Archbishop) on Jan 24, 2014 at 20:07 UTC
    3) In this syntax $1 = ((.*)EDITION\/|Edition\/) and $2 = (.*), so what will $3 be?

    A numbered capture group is numbered according to the strict order in which its opening parenthesis appears in the regex:
       $1 --+-----------------------+
            |                       |
            V                       V
           /((.*)EDITION\/|Edition\/)(.*)/
             ^  ^                    ^  ^
             |  |                    |  |
        $2 --+--+               $3 --+--+
    See perlre, and also Grouping things and hierarchical matching and Extracting matches in perlretut.
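
    The numbering rule can be checked with a short script. This is only a sketch: the path is a hypothetical, shortened version of the one in the question, and it relies on a match in list context returning ($1, $2, $3) in order.

```perl
use strict;
use warnings;

# Hypothetical sample path, shortened from the one in the question.
my $path = '//rsoesn/prod/EDITION/scc_audit01092014';

# In list context a successful match returns the capture groups in order,
# numbered by the position of each opening parenthesis.
my @captures = $path =~ m/((.*)EDITION\/|Edition\/)(.*)/;

for my $i (0 .. $#captures) {
    my $value = defined $captures[$i] ? "'$captures[$i]'" : 'undef';
    print '$', $i + 1, " = $value\n";
}
```

    The outer group opens first, so it is $1 even though the inner (.*) group, $2, closes before it.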

Re: Understand Regex syntax
by davido (Cardinal) on Jan 24, 2014 at 19:32 UTC

    Alternation is constrained only by grouping parens, or by the pattern as a whole. In your case, you've got two alternates to choose from:

    (.*)EDITION/

    ...or...

    Edition/

    Whichever of those alternates matches will be captured into $1. $2 will only be populated if the first alternate is matched, however. $3 will always be populated with something (assuming the overall pattern match is successful), even if that something is an empty string; it takes place outside of the alternation.
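
    To see the second alternate in action, here is a minimal sketch with a hypothetical path containing Edition/ instead of EDITION/ (the path is invented for illustration). Note that the text before Edition/ is not captured at all, because the second alternate has no (.*) in front of it.

```perl
use strict;
use warnings;

# Hypothetical path using the mixed-case spelling.
my $path = '//rsoesn/prod/Edition/scc_audit01092014';

if ($path =~ m/((.*)EDITION\/|Edition\/)(.*)/) {
    # $2 belongs to the first alternate, which did not take part in the match.
    my $two = defined $2 ? "'$2'" : 'undef';
    print "\$1 = '$1'\n";
    print "\$2 = $two\n";
    print "\$3 = '$3'\n";
}
```

    With this input $1 is just 'Edition/', $2 is undefined, and $3 is 'scc_audit01092014'.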

    The \/ construct equates to the literal forward slash. It has to be escaped with a backslash so that it doesn't get confused with the m// operator's terminating /.
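
    If the backslashes are distracting, the match operator accepts other delimiters, in which case / needs no escaping. A small sketch, assuming a hypothetical shortened path, comparing the two spellings of the same pattern:

```perl
use strict;
use warnings;

# Hypothetical sample path, shortened from the one in the question.
my $path = '//rsoesn/prod/EDITION/scc_audit01092014';

# Same pattern twice: once with / delimiters and escaped slashes,
# once with {} delimiters and plain slashes.
my @escaped = $path =~ m/((.*)EDITION\/|Edition\/)(.*)/;
my @braces  = $path =~ m{((.*)EDITION/|Edition/)(.*)};

my $same = "@escaped" eq "@braces" ? 'yes' : 'no';
print "same captures: $same\n";
```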

    As for your last question: $1 will get everything matched by (.*)EDITION\/|Edition\/. $2 will get everything matched by the first (.*), but only if the first alternate in your alternation is matched. $3 will get everything matched by the final (.*).

    What are you actually wanting to match? Is it true that you only want $2 to be populated with a useful value if the text contains EDITION/, but not if it contains Edition/?


    Dave

Re: Understand Regex syntax
by AnomalousMonk (Archbishop) on Jan 24, 2014 at 19:46 UTC
    Will this regex mean entire path "..." ... ?

    Why not just do a few experiments and find out?

    >perl -wMstrict -le
    "my $v_path = '//rsoesn/main/dis_sites/EDelivery/EDITION/scc_audit01092014';
     print qq{ '$v_path'};
     ;;
     $v_path =~ m/((.*)EDITION\/|Edition\/)(.*)/;
     print qq{\$1 '$1'};
     print qq{\$2 '$2'};
     print qq{\$3 '$3'};
    "
     '//rsoesn/main/dis_sites/EDelivery/EDITION/scc_audit01092014'
    $1 '//rsoesn/main/dis_sites/EDelivery/EDITION/'
    $2 '//rsoesn/main/dis_sites/EDelivery/'
    $3 'scc_audit01092014'