in reply to Re: Text::Balanced question
in thread Text::Balanced question

Now I just ran into another problem. Some of my strings contain backslashes. This seems to break everything...

If I use:

$string = 'Param1(TYPE,\abc\),Param2(TYPE,\abc\)';
things break :-(

I tried using double quotation marks instead, but then Perl interprets the backslashes as escape sequences, which changes the string.

Any ideas?

Thanks again.

Emma

Re^3: Text::Balanced question
by ammon (Sexton) on Oct 21, 2006 at 03:41 UTC
    You are (semi)hosed, as far as extract_bracketed() is concerned. The problem with backslashes is that they're used to escape things. That lets you represent non-printable characters, such as \n or \t, in a printable manner, as well as allowing one to say things like:

        my $s = "This string \" has an escaped quote";

    If you then print $s, you get:

        This string " has an escaped quote

    If you are trying to match balanced quotes on that string, you need to skip over the escaped quote inside. This is one of the reasons responders to your original post suggested that parsing balanced thingies is difficult to do with regular expressions.

    Looking at the source code in Text::Balanced, there is, indeed, a line that always eats the next character following a backslash:

        next if $$textref =~ m/\G\\./gcs;
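
    To see why that matters for your input, here is a minimal sketch of the same idea. It is not the Text::Balanced code; unclosed_parens() is a hypothetical helper that only counts how many '(' are still open after a scan that, like the line above, skips whatever follows a backslash:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Balanced): scan $text and return
# the number of '(' left unclosed, skipping any character that follows a
# backslash, the same way the quoted line does.
sub unclosed_parens {
    my ($text) = @_;
    my $depth = 0;
    pos($text) = 0;
    while (pos($text) < length $text) {
        next if $text =~ m/\G\\./gcs;        # backslash eats the next character
        if    ($text =~ m/\G\(/gc) { $depth++ }
        elsif ($text =~ m/\G\)/gc) { $depth-- }
        else  { $text =~ m/\G./gcs }         # any other character: just advance
    }
    return $depth;
}

print unclosed_parens('(TYPE,\abc\)'), "\n";   # prints 1: the "\)" was skipped
print unclosed_parens('(TYPE,:abc:)'), "\n";   # prints 0: balanced
```

    In your string every closing parenthesis is preceded by a backslash, so none of them is ever seen as a closer.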

    Your sample suggests that the input is using backslashes as some form of quoting operator, rather than as an escape character. If that assumption holds, you might try normalizing your input to change the backslashes into something else (and then back again after you're done parsing). For example:

        $string = 'Param1(TYPE,\abc\),Param2(TYPE,\abc\)';
        $string =~ tr{\\}{:};
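
    A rough sketch of the full round trip, assuming ':' never appears in your real data (if it can, pick a different stand-in). The prefix pattern '[^(]*' and the variable names $work and @groups are my own choices for illustration:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $string = 'Param1(TYPE,\abc\),Param2(TYPE,\abc\)';

# Assumption: ':' does not occur in the input, so it is a safe stand-in.
(my $work = $string) =~ tr{\\}{:};

my @groups;
while (length $work) {
    # Skip everything up to the next '(' and extract the balanced group.
    my ($group, $rest) = extract_bracketed($work, '()', '[^(]*');
    last unless defined $group && length $group;
    $group =~ tr{:}{\\};          # put the backslashes back
    push @groups, $group;
    $work = $rest;
}

print "$_\n" for @groups;
```

    tr maps one character to one character. If you want a multi-character stand-in, use s{\\}{::}g going in and s{::}{\\}g coming back out.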

    Cheers,

      Thanks a million! Got it to work by replacing the backslashes with "::". :-)

      Emma