luked has asked for the wisdom of the Perl Monks concerning the following question:

I have a standardized file path from which I need to retrieve the second occurrence of a string and everything after it.
For example:
my $regex = "\\default\\main\\TSDEMO\\WORKAREA\\tsdemo_intranet\\TSDEMO\\images\\corner";
my $website = "TSDEMO";

my ($parent) = $regex =~ m/$website/gi;
$parent = $';

In this case:
$parent = "\images\corner"

I would like for my regex to yield:
$parent = "\TSDEMO\images\corner"

I know I could concatenate the $website variable with the result, but I am looking for a smarter way to handle this. What am I missing in my existing regex in order to capture this? Are there any other gotchas I should be aware of?
Thanks in advance!

Re: Retrieve second occurrence and everything afterwards using regex
by TedPride (Priest) on Jan 04, 2005 at 19:55 UTC
    Forget complicated regex. For this you can use rindex / substr:
    use strict;
    use warnings;

    my $regex   = '\default\main\TSDEMO\WORKAREA\tsdemo_intranet\TSDEMO\images\corner';
    my $website = 'TSDEMO';

    # rindex gives the offset of the last occurrence; substr takes everything from there
    my $parent = substr($regex, rindex($regex, $website));
    print $parent;
    (Note that this gives TSDEMO\images\corner with no leading \, since your $website is TSDEMO rather than \TSDEMO.)
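    rindex returns the last occurrence, which only happens to be the second one in this path. If you need the genuine second occurrence no matter how many follow, one option is to call index twice, starting the second search just past the first hit. A minimal sketch, assuming case-sensitive matching; $path and $needle are illustrative names, and the needle includes the leading backslash so the result starts with \TSDEMO:

```perl
use strict;
use warnings;

my $path    = '\default\main\TSDEMO\WORKAREA\tsdemo_intranet\TSDEMO\images\corner';
my $website = 'TSDEMO';
my $needle  = "\\$website";    # "\TSDEMO", so the result keeps its leading backslash

# Find the first hit, then search again starting one character after it.
my $first  = index($path, $needle);
my $second = $first >= 0 ? index($path, $needle, $first + 1) : -1;

# Take everything from the second hit to the end of the string.
my $parent = $second >= 0 ? substr($path, $second) : undef;
print defined $parent ? "$parent\n" : "fewer than two occurrences\n";
```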

    Or, if you really MUST have case-insensitive matching (which doesn't make much sense if your file paths are case-sensitive), you can make uppercase copies of the original string and use those:

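    use strict;
    use warnings;

    my $regex   = '\default\main\TSDEMO\WORKAREA\tsdemo_intranet\TSDEMO\images\corner';
    my $website = 'TSDEMO';

    # Search an uppercased copy; uc does not change the length of this ASCII
    # string, so the offset found in $tregex is also valid in $regex.
    my $tregex = uc($regex);
    my $parent = substr($regex, rindex($tregex, uc($website)));
    print $parent;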
    BTW, what you're asking for is the third match, not the second: with case-insensitive matching, the tsdemo in tsdemo_intranet also matches.
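    The difference is easy to see by counting matches with and without /i. A short sketch using the common "count of" idiom (assigning a list-context match to an empty list, then to a scalar, yields the number of matches); $path is just an illustrative name for the question's string:

```perl
use strict;
use warnings;

my $path    = '\default\main\TSDEMO\WORKAREA\tsdemo_intranet\TSDEMO\images\corner';
my $website = 'TSDEMO';

# () = ... forces list context; the outer scalar assignment then counts the matches.
my $sensitive   = () = $path =~ /\Q$website\E/g;
my $insensitive = () = $path =~ /\Q$website\E/gi;

print "case-sensitive: $sensitive, case-insensitive: $insensitive\n";
```

    With /i, the lowercase tsdemo in tsdemo_intranet is counted too, which is why the OP's /gi match ended one occurrence later than expected.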
Re: Retrieve second occurrence and everything afterwards using regex
by si_lence (Deacon) on Jan 04, 2005 at 18:13 UTC
    Hi,
    You could use a positive lookahead like this:
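    use strict;
    use warnings;

    my $regex   = "\\default\\main\\TSDEMO\\WORKAREA\\tsdemo_intranet\\TSDEMO\\images\\corner";
    my $website = "TSDEMO";

    # The greedy .* backtracks to the last spot followed by "\TSDEMO" (case-insensitively,
    # because of /i); the lookahead does not consume it, so $' starts with "\TSDEMO".
    $regex =~ m/($website.*)(?=\\$website)/gi;
    my $parent = $';
    print "$parent\n";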
    But what is wrong with concatenating the $website variable to the result?
    BTW, your (and my) code does not find the second but the last occurrence of $website.
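    To get the second occurrence rather than the last, one option is to make the quantifiers lazy instead of greedy. A minimal sketch, assuming case-sensitive matching (with /i the tsdemo in tsdemo_intranet would count as the second hit); $path and $marker are illustrative names:

```perl
use strict;
use warnings;

my $path    = '\default\main\TSDEMO\WORKAREA\tsdemo_intranet\TSDEMO\images\corner';
my $website = 'TSDEMO';
my $marker  = quotemeta "\\$website";    # regex-safe version of "\TSDEMO"

# ^.*? stops at the first marker, the next .*? stops at the second one,
# and the capture takes that marker plus everything after it.
my ($parent) = $path =~ /^.*?$marker.*?($marker.*)\z/;

print defined $parent ? "$parent\n" : "fewer than two occurrences\n";
```

    Because the result is taken from a capture group, this also avoids $'.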
    si_lence
Re: Retrieve second occurrence and everything afterwards using regex
by duff (Parson) on Jan 04, 2005 at 18:29 UTC

    To retrieve the Nth occurrence and everything after it, you could do something like this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $regex   = "\\default\\main\\TSDEMO\\WORKAREA\\tsdemo_intranet\\TSDEMO\\images\\corner";
    my $n       = 2;
    my $website = "TSDEMO";

    # Each zero-width match stops just before an occurrence; loop until the Nth one.
    1 while $regex =~ m/(?=$website)/g && --$n;
    my $parent = $';
    print "$parent\n";
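    The same counting idea can be wrapped in a reusable sub. A minimal sketch (the name from_nth is hypothetical, and matching is case-sensitive) that reads the start offset of the current match from @- instead of relying on $':

```perl
use strict;
use warnings;

# Hypothetical helper: return the substring starting at the $n-th (1-based,
# case-sensitive) occurrence of $word, or undef if there are fewer than $n.
sub from_nth {
    my ($string, $word, $n) = @_;
    my $count = 0;
    while ($string =~ /\Q$word\E/g) {
        # $-[0] is the offset where the current match began.
        return substr($string, $-[0]) if ++$count == $n;
    }
    return;
}

my $path = '\default\main\TSDEMO\WORKAREA\tsdemo_intranet\TSDEMO\images\corner';
for my $n (1 .. 3) {
    my $tail = from_nth($path, 'TSDEMO', $n);
    printf "%d: %s\n", $n, defined $tail ? $tail : '(none)';
}
```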
Re: Retrieve second occurrence and everything afterwards using regex
by perlsen (Chaplain) on Jan 05, 2005 at 05:09 UTC

    If you wish, you can change your regex as follows:

    use strict;
    use warnings;

    my $regex   = "\\default\\main\\TSDEMO\\WORKAREA\\tsdemo_intranet\\TSDEMO\\images\\corner";
    my $website = "TSDEMO";

    # \2 refers back to the text captured by ($website), so $4 starts at a later occurrence
    $regex =~ /(.*)($website)(.*?)(\2.*)$/g;
    my $parent = $4;
    print "$parent\n";
    output:

    TSDEMO\images\corner

    regards,

    Senthil Kumar.k
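    The greedy leading (.*) makes this pattern pick the last pair: the engine backtracks from the end, so $2 becomes the last occurrence that still has another one after it, and $4 starts at the final occurrence. With only two case-sensitive hits that is also the second one. A sketch on a hypothetical path with three occurrences (made up for illustration) shows how making the first group lazy changes which tail $4 holds:

```perl
use strict;
use warnings;

my $website = 'TSDEMO';
# Hypothetical path with three occurrences, for illustration only.
my $path = '\a\TSDEMO\b\TSDEMO\c\TSDEMO\d';

# Greedy (.*): $4 starts at the last occurrence.
my $greedy = $path =~ /(.*)(\Q$website\E)(.*?)(\2.*)$/  ? $4 : undef;
# Lazy (.*?): $2 is the first occurrence, so $4 starts at the second.
my $lazy   = $path =~ /(.*?)(\Q$website\E)(.*?)(\2.*)$/ ? $4 : undef;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```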

Re: Retrieve second occurrence and everything afterwards using regex
by johnnywang (Priest) on Jan 04, 2005 at 18:43 UTC
    According to the Camel book, $' and $` have performance penalties. Why not simply (not tested):
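    # The path uses backslashes, so match \\ as the separator; $1 gets "TSDEMO\images\corner"
    my ($parent) = $regex =~ m|$website.*\\($website.*)$|;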
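    If you still want $'-style results, Perl 5.10 added the /p modifier, which makes ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} available for that match without the program-wide cost of $&, $` and $'. (In Perl 5.20 and later the penalty for those variables is largely gone and /p is accepted but no longer required.) A minimal sketch that returns the last \TSDEMO and everything after it, as the OP asked; $path and $marker are illustrative names:

```perl
use strict;
use warnings;
use 5.010;    # /p and ${^MATCH} / ${^POSTMATCH} need Perl 5.10 or later

my $path    = '\default\main\TSDEMO\WORKAREA\tsdemo_intranet\TSDEMO\images\corner';
my $website = 'TSDEMO';
my $marker  = quotemeta "\\$website";    # regex-safe version of "\TSDEMO"

my $parent;
# Match a "\TSDEMO" that is not followed by another one, i.e. the last one.
if ($path =~ /$marker(?!.*$marker)/p) {
    $parent = ${^MATCH} . ${^POSTMATCH};
}
print defined $parent ? "$parent\n" : "no match\n";
```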
Re: Retrieve second occurrence and everything afterwards using regex
by EdwardG (Vicar) on Jan 05, 2005 at 14:47 UTC
    my $parent = reverse((reverse $regex) =~ /^.+?\\OMEDST\\/g);
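    The trick here is that reversing the string turns the last occurrence into the first, so a lazy ^.+? stops at it, and reversing the match restores the original order (including the leading backslash the OP wanted). A sketch that builds the reversed marker from $website instead of hard-coding OMEDST; like the one-liner, it is case-sensitive and finds the last occurrence rather than the second, and $path, $rev_marker and $reversed are illustrative names:

```perl
use strict;
use warnings;

my $path    = '\default\main\TSDEMO\WORKAREA\tsdemo_intranet\TSDEMO\images\corner';
my $website = 'TSDEMO';

# reverse in scalar context reverses a string: "\TSDEMO\" becomes "\OMEDST\".
my $rev_marker = quotemeta(scalar reverse "\\$website\\");
my $reversed   = reverse $path;

my $parent;
if ($reversed =~ /^(.+?$rev_marker)/) {
    # Reverse the matched head of the reversed string back into normal order.
    $parent = reverse $1;
}
print defined $parent ? "$parent\n" : "no match\n";
```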