anandvn has asked for the wisdom of the Perl Monks concerning the following question:

Hi, here is my input string:
"TestVar(Test1->(xy))->Var2(Test2(10)(12))->Finalvar"

Expected output: "TestVar->Var2->Finalvar"

Could anyone please help me find a regex that produces the expected output?

I tried s/\(.+\)//g; but it's not working. Please correct me where I went wrong.
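To see why that attempt fails: `.+` is greedy, so a single match runs from the first `(` to the last `)` in the string and removes everything in between. Making it non-greedy (`.+?`) does not help either, because each match stops at the first `)` it meets and leaves unbalanced closers behind. A small demonstration; the helper `try_subst` is hypothetical, for illustration only:

```perl
use strict;
use warnings;

# Hypothetical helper: apply one global substitution to a copy of the input.
sub try_subst {
    my ( $string, $re ) = @_;
    ( my $copy = $string ) =~ s/$re//g;
    return $copy;
}

my $input = 'TestVar(Test1->(xy))->Var2(Test2(10)(12))->Finalvar';

# Greedy: one match from the first '(' to the last ')'.
print try_subst( $input, qr/\(.+\)/ ), "\n";     # TestVar->Finalvar

# Non-greedy: each match stops at the first ')', leaving stray ')' behind.
print try_subst( $input, qr/\(.+?\)/ ), "\n";    # TestVar)->Var2)->Finalvar
```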

Thanks for your help in advance,

Anand V

Re: Regex required to Replace the array specifiers in a string
by johngg (Canon) on Jul 02, 2010 at 13:32 UTC

    I wouldn't try to do this using regular expressions but would use a state engine instead.

    use strict;
    use warnings;
    use 5.010;

    my $string = q{TestVar(Test1->(xy))->Var2(Test2(10)(12))->Finalvar};
    say $string;

    # Track parenthesis nesting depth; keep a character only at depth 0.
    my $parenDepth = 0;
    my $newString  = q{};

    # split with an empty pattern yields one character at a time.
    foreach my $char ( split m{}, $string )
    {
        if ( $char eq q{(} )
        {
            $parenDepth ++;
        }
        elsif ( $char eq q{)} )
        {
            $parenDepth --;
        }
        else
        {
            $newString .= $char unless $parenDepth;
        }
    }
    say $newString;

    The output:

    TestVar(Test1->(xy))->Var2(Test2(10)(12))->Finalvar
    TestVar->Var2->Finalvar

    I hope this is helpful.

    Cheers,

    JohnGG
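A variation on the state machine above, wrapped in a sub so it can be applied to many strings. Instead of one character per loop iteration, it tokenizes into either a single parenthesis or a whole run of other characters, which should mean fewer iterations per string (an untested expectation; measure before relying on it). The name `strip_parens_sm` and the choice to ignore a stray `)` are assumptions of this sketch, not part of JohnGG's code:

```perl
use strict;
use warnings;

# Sketch: same depth-tracking idea, but each token is either a single
# parenthesis or a run of non-parenthesis characters.
sub strip_parens_sm {
    my ($string) = @_;
    my $depth = 0;
    my $out   = q{};
    foreach my $token ( $string =~ m{ ( [()] | [^()]+ ) }xg ) {
        if ( $token eq q{(} ) {
            $depth++;
        }
        elsif ( $token eq q{)} ) {
            # Assumption: a stray ')' with no matching '(' is ignored
            # instead of driving the depth negative.
            $depth-- if $depth > 0;
        }
        else {
            $out .= $token unless $depth;
        }
    }
    return $out;
}

print strip_parens_sm('TestVar(Test1->(xy))->Var2(Test2(10)(12))->Finalvar'), "\n";
```

In the original version an unmatched `)` leaves the depth negative, which is truthy, so every later character would be dropped; the guard above is one way to avoid that, but whether it is the right behaviour depends on the real data.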

      Hi John,

      Thanks a lot for the prompt response. This logic works fine, but I am facing a performance issue since my input is huge (approx. 1 lakh, i.e. 100,000, strings). I would be grateful if this could be solved using a regex or some other more efficient way.

      Cheers,
      Anand

        If you think you are having performance issues, profile your code to check where your bottlenecks are. Devel::NYTProf is an example of a good utility in this area. We cannot help you improve performance unless you provide specific pieces of code that you have experimentally demonstrated are your bottlenecks.
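To act on the profiling advice: Devel::NYTProf reports where a script spends its time (run it as perl -d:NYTProf yourscript.pl, then nytprofhtml for an HTML report). For a head-to-head comparison of candidate approaches, the core Benchmark module is enough. A minimal sketch comparing the character-by-character state machine with the iterated regex from the thread; the 1,000-line data set is hypothetical and should be replaced with a sample of the real input, and no timing results are claimed here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# JohnGG's character-by-character state machine, wrapped in a sub.
sub by_state_machine {
    my ($string) = @_;
    my ( $depth, $out ) = ( 0, q{} );
    foreach my $char ( split m{}, $string ) {
        if    ( $char eq q{(} ) { $depth++ }
        elsif ( $char eq q{)} ) { $depth-- }
        else                    { $out .= $char unless $depth }
    }
    return $out;
}

# kennethk's repeated innermost-pair substitution, wrapped in a sub.
sub by_iterated_regex {
    my ($string) = @_;
    1 while $string =~ s/\([^()]*\)//g;
    return $string;
}

# Hypothetical test data: 1_000 copies of the sample line.
my @lines = ('TestVar(Test1->(xy))->Var2(Test2(10)(12))->Finalvar') x 1_000;

# Sanity check before timing: both approaches must agree on every line.
for my $line (@lines) {
    die "mismatch on $line\n"
        unless by_state_machine($line) eq by_iterated_regex($line);
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    state_machine  => sub { by_state_machine($_)  for @lines },
    iterated_regex => sub { by_iterated_regex($_) for @lines },
} );
```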
Re: Regex required to Replace the array specifiers in a string
by kennethk (Abbot) on Jul 02, 2010 at 13:34 UTC
    Matching nested parentheses (or any paired delimiters) with regular expressions is not trivial. In your case it is a little simpler, since you are trying to strip out all parentheticals. You can accomplish your task by repeatedly running a substitution that eliminates the innermost pairs, like:

    $_ = "TestVar(Test1->(xy))->Var2(Test2(10)(12))->Finalvar";
    1 while (s/\([^()]*\)//g);
    print;

    I repeatedly replace matched pairs of parentheses with empty strings, using a negated character class ([^()]) to ensure the matched parentheses contain no other parentheses. The loop stops once a pass makes no substitutions. See perlretut for more info.
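To make the innermost-first order visible, here is the same substitution with the string recorded after each pass. The sub name `trace_passes` is hypothetical; the regex is kennethk's unchanged. On the sample, the first pass removes (xy), (10) and (12), the second removes the now-empty outer groups, and the third finds nothing and ends the loop:

```perl
use strict;
use warnings;

# Same substitution as above, but collect the string after each pass.
sub trace_passes {
    my ($string) = @_;
    my @passes;
    while ( $string =~ s/\([^()]*\)//g ) {
        push @passes, $string;
    }
    return @passes;
}

my @passes = trace_passes('TestVar(Test1->(xy))->Var2(Test2(10)(12))->Finalvar');
print "pass $_: $passes[$_ - 1]\n" for 1 .. @passes;
# pass 1: TestVar(Test1->)->Var2(Test2)->Finalvar
# pass 2: TestVar->Var2->Finalvar
```

The number of passes equals the deepest nesting level, so shallow inputs like the sample need only a couple of scans per string.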

      Hi Utilitarian & Ken,

      Thanks a lot for your replies. My issue got resolved with your logic.

      Cheers
      Anand

Re: Regex required to Replace the array specifiers in a string
by Utilitarian (Vicar) on Jul 02, 2010 at 13:34 UTC
    You need to be more selective about what you are matching, and you will have to repeat the substitution until all nested matches are removed.

    use strict;
    use warnings;

    my $string = "TestVar(Test1->(xy))->Var2(Test2(10)(12))->Finalvar";
    while ( $string =~ s/   # repeat while we make matches
                \(          # beginning of a parenthesised group
                [^)(]+      # non-parenthesis characters
                \)          # end of group
            //gx ) {}
    print "$string\n";

    This will remove parenthesised groups, starting from the innermost.

    Edit: tidied up comments in code block
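The two iterative answers differ in one character: `+` here versus `*` in kennethk's character class. With `+`, an empty pair `()` is never removed, and neither is any group that contains one. The sample in the question has no empty pairs, so both work there. A small comparison with hypothetical inputs; `strip_plus` and `strip_star` are illustrative names, not code from the thread:

```perl
use strict;
use warnings;

sub strip_plus {    # [^)(]+ needs at least one character between the parens
    my ($s) = @_;
    1 while $s =~ s/\([^)(]+\)//g;
    return $s;
}

sub strip_star {    # [^()]* also matches an empty pair
    my ($s) = @_;
    1 while $s =~ s/\([^()]*\)//g;
    return $s;
}

print strip_plus('Obj()->Method(1)'), "\n";    # Obj()->Method
print strip_star('Obj()->Method(1)'), "\n";    # Obj->Method
```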

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
Re: Regex required to Replace the array specifiers in a string
by AnomalousMonk (Archbishop) on Jul 02, 2010 at 16:26 UTC

    The state machine, recursive processing of parenthetic expressions, or similar suggestions of others are more general and, IMO, likely to be more robust solutions. One cannot speak to the issue of speed without more info.
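For the recursive option: since Perl 5.10 a regex can recurse into one of its own capture groups with (?1), so each top-level balanced group can be matched, however deeply nested, and removed in a single pass. A minimal sketch, assuming Perl 5.10 or later; it is an illustration rather than code posted in the thread, and its speed relative to the other approaches is untested:

```perl
use strict;
use warnings;
use 5.010;    # (?1) recursion and possessive quantifiers need Perl 5.10+

# Group 1 matches one balanced group; (?1) recurses into it for nesting.
my $balanced = qr{
    (                 # group 1: a balanced parenthesised group
        \(
        (?: [^()]++   #   a run of non-parenthesis characters, possessive
          | (?1)      #   or a nested balanced group
        )*
        \)
    )
}x;

sub strip_parens_recursive {
    my ($string) = @_;
    $string =~ s/$balanced//g;
    return $string;
}

say strip_parens_recursive('TestVar(Test1->(xy))->Var2(Test2(10)(12))->Finalvar');
```

An unbalanced `(` has no closing partner, so it simply fails to match and is left in place.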

    However, I notice in the example you give a characteristic that might be exploited to advantage: each highest-level parenthetic expression, whatever the nesting within it, is preceded by an alphanumeric (\w) character and followed by a '->' sequence. This allows the following, perhaps rather fragile, approach:

    >perl -wMstrict -le "my $s = 'TestVar(Test1->(xy))->Var2(Test2(10)(12))->Finalvar'; $s =~ s{ (?<= \w) \( .+? \) (?= ->) }{}xmsg; print qq{'$s'}; "
    'TestVar->Var2->Finalvar'
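To make the fragility concrete: a top-level group that is the last thing on the line (no '->' after it), or an empty group (`.+?` needs at least one character), is left untouched. The inputs below are hypothetical; the question's sample contains neither case. The sub name `strip_lookaround` is illustrative; the substitution is AnomalousMonk's unchanged:

```perl
use strict;
use warnings;

# AnomalousMonk's lookaround substitution, wrapped in a sub for testing.
sub strip_lookaround {
    my ($s) = @_;
    $s =~ s{ (?<= \w) \( .+? \) (?= ->) }{}xmsg;
    return $s;
}

print strip_lookaround('Obj->Last(1)'), "\n";    # Obj->Last(1)  (no '->' after the group)
print strip_lookaround('Obj()->Next'), "\n";     # Obj()->Next   (empty group never matches)
```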