embirath has asked for the wisdom of the Perl Monks concerning the following question:

I don't know whether the reply I posted in my thread "search for matching parentheses" went through. Since I now have a new question, I'm starting a new thread instead.

I decided to check out the Text::Balanced module, since it looks useful not just for this problem but for future ones too.

But I'm having some problems with it, and I can't see why the code below doesn't work. I expected it to extract "(x,y,z)", put ", Param2(1,2,3), Param3(a,b,c)" in the remainder, and put "Param1" in the prefix. Instead, everything ends up in the remainder.

    use Text::Balanced qw(extract_bracketed);

    $string = "Param1(x,y,z), Param2(1,2,3), Param3(a,b,c)";
    print "Original string: ", $string, "\n";

    ($ext, $rem, $pre) = extract_bracketed($string, '()', '.*?');

    print "extracted: ", $ext, "\n";
    print "remainder: ", $rem, "\n";
    print "skip pref: ", $pre, "\n";

Any help would be greatly appreciated.

Thanks! Emma

Re: Text::Balanced question
by shmem (Chancellor) on Oct 20, 2006 at 23:57 UTC
    try
    ($ext,$rem,$pre) =extract_bracketed($string,'()','[^()]+');

    Seems that /.*?/ also matches parens, now doesn't it?

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
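Putting shmem's suggestion into Emma's script gives roughly the following sketch. It adds `use strict`, `use warnings` and `my` declarations, which the original did not have; otherwise it is the same call with `'[^()]+'` as the prefix pattern, so the prefix stops just before the first `(`.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $string = "Param1(x,y,z), Param2(1,2,3), Param3(a,b,c)";
print "Original string: $string\n";

# '[^()]+' consumes "Param1" and stops right before the opening '('.
my ($ext, $rem, $pre) = extract_bracketed($string, '()', '[^()]+');

print "extracted: $ext\n";   # (x,y,z)
print "remainder: $rem\n";   # , Param2(1,2,3), Param3(a,b,c)
print "skip pref: $pre\n";   # Param1
```

In list context the extracted string includes the delimiters, and `$string` itself is left unchanged.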
      Now I just ran into another problem. Some of my strings contain backslashes. This seems to break everything...

      If I use:

      $string = 'Param1(TYPE,\abc\),Param2(TYPE,\abc\)';
      things break :-(

      I tried using double quotation marks instead, but of course then Perl interprets the backslashes as escape sequences.

      Any ideas?

      Thanks again.

      Emma
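Why double quotes make things worse: inside a Perl double-quoted string, `\a` is the alarm (BEL) escape and `\)` simply becomes `)`, so the backslashes are gone before Text::Balanced ever sees the text. A small comparison sketch (the variable names are only illustrative):

```perl
use strict;
use warnings;

my $single = 'TYPE,\abc\)';   # backslashes kept literally
my $double = "TYPE,\abc\)";   # \a becomes BEL (chr 7), \) becomes )

for my $s ($single, $double) {
    # Show non-printable characters as \xNN so the difference is visible.
    (my $shown = $s) =~ s/([^[:print:]])/sprintf('\\x%02X', ord $1)/ge;
    printf "%-14s length %d\n", $shown, length $s;
}
```

The single-quoted version is 11 characters long with both backslashes intact; the double-quoted one is 9 characters, with a BEL character where `\a` was and no backslashes at all.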
        You are (semi)hosed, as far as extract_bracketed() is concerned. The problem with backslashes is that they're used to escape things. That lets you represent non-printable characters, such as \n or \t, in a printable manner, as well as allowing one to say things like:

            my $s = "This string \" has an escaped quote";

        If you then print $s, you get:

            This string " has an escaped quote

        If you are trying to match balanced quotes on that string, you need to skip over the escaped quote inside. This is one of the reasons responders to your original post suggested that parsing balanced thingies is difficult to do with regular expressions.

        Looking at the source code in Text::Balanced, there is, indeed, a line that always eats the next character following a backslash:

            next if $$textref =~ m/\G\\./gcs;
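To see this in action, the sketch below runs the failing input through the same call and prints `$@`, which Text::Balanced sets when an extraction fails. The `\)` sequences are consumed as escaped characters rather than as closing brackets, so the opening brackets never balance. The exact wording of the error message may differ between module versions.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $string = 'Param1(TYPE,\abc\),Param2(TYPE,\abc\)';
my ($ext, $rem, $pre) = extract_bracketed($string, '()', '[^()]+');

if (defined $ext) {
    print "extracted: $ext\n";
} else {
    # Each "\)" was skipped as an escaped ")", so no bracket ever closed.
    print "extraction failed: $@\n";
}
```

On failure the extracted string comes back undefined and the whole input is returned as the remainder.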

        Your sample suggests that the input uses backslashes as some form of quoting operator rather than as an escape character. If that assumption holds, you might try normalizing your input, changing the backslashes into something else (and back again once you're done parsing).

        For example:

            $string = 'Param1(TYPE,\abc\),Param2(TYPE,\abc\)';
            $string =~ tr{\\}{:};
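Carried through to the end, the normalize-then-restore idea might look like the sketch below. It assumes that `:` never appears in the real input (the script dies if it does) and that parameters are separated by commas; the backslashes are restored in each piece after extraction. The names `@params`, `$work` and `$rest` are illustrative only.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $input = 'Param1(TYPE,\abc\),Param2(TYPE,\abc\)';

# Assumption: ':' never occurs in the real input, so it can stand in for '\'.
die "placeholder ':' already present\n" if $input =~ /:/;
(my $work = $input) =~ tr{\\}{:};

my @params;                       # each entry: [name, bracketed arguments]
my $rest = $work;
while ($rest =~ /\S/) {
    my ($args, $remainder, $name) = extract_bracketed($rest, '()', '[^()]+');
    die "extraction failed: $@\n" unless defined $args;
    tr{:}{\\} for $name, $args;   # put the original backslashes back
    push @params, [$name, $args];
    ($rest = $remainder) =~ s/^\s*,\s*//;   # drop the separating comma
}

printf "%s => %s\n", @$_ for @params;
```

This prints `Param1 => (TYPE,\abc\)` and `Param2 => (TYPE,\abc\)`. Stripping the leading comma before each call keeps `'[^()]+'` from pulling it into the next prefix.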

        Cheers,

      Aha. Wonderful. That works. Thanks!!!!!

      I would have thought '.*?' matches as little as possible (because of the "?") and so wouldn't include the parentheses, though. Anyway, I guess I still don't fully understand regexes, but I'm glad my code works now. :-)

      Thanks again! Emma
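On Emma's last question: her intuition about `.*?` is right, since it does match as little as possible, and that turns out to be the problem. As far as I can tell from Text::Balanced's source, the prefix pattern is matched once, anchored with `\G` at the current position, and the module does not go back and retry a longer prefix when no `(` follows. A lazy `.*?` is therefore satisfied by the empty string, the next character is `P` rather than `(`, and the extraction fails. The sketch below imitates that single anchored match with plain Perl regexes (it does not call Text::Balanced itself) and reports how much each candidate prefix consumes.

```perl
use strict;
use warnings;

my $string = "Param1(x,y,z), Param2(1,2,3), Param3(a,b,c)";

# Match each candidate prefix once, anchored at position 0, with no later
# retry; this imitates a one-shot prefix match.
my %prefix_len;
for my $pat ('.*?', '.*', '[^()]+') {
    pos($string) = 0;
    $string =~ m/\G$pat/gc;                  # succeeds for all three patterns
    my $len = pos($string);
    $prefix_len{$pat} = $len;
    printf "%-7s consumes %2d chars, next char: '%s'\n",
        $pat, $len, substr($string, $len, 1);
}
```

The lazy `.*?` consumes nothing (the next character is `P`), the greedy `.*` swallows all 43 characters (leaving no bracket to extract), and `[^()]+` consumes exactly `Param1` and stops at `(`.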