This question may well be more about programming in general than Perl specifically, but since it's a problem that is presenting itself to me in Perl (and since the Monks have been so helpful in the past) I'm going to ask it here.
I'm writing a script that converts a language spec into XML. The spec is written in Curl, which has this type of layout:

{Curl is all about {braces}}

Since I'm turning this stuff into XML, I'm using the fantastic Text::Balanced module to balance the braces so that the above can be turned into

<Curl> is all about <braces/></Curl>

OK, so far so good, right? I'm having no problems achieving this conversion. The problem comes when I have something like

{Curl is all about {code {braces}}}

which should become

<Curl> is all about {braces}</Curl>

So, as you see, the code tag works in much the same way as HTML. Still cool, right? It gets a little more complicated. The code

{Curl is all about {code {braces}{escape {braces}}}}

should become

<Curl> is all about {braces}<braces/></Curl>
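To pin down the rules those three examples imply, here is a minimal sketch of a hand-written recursive-descent converter. It is a different technique from the Text::Balanced approach in this post, not a fix for it. The names `curl_to_xml`, `parse_body`, and `parse_tag` are hypothetical. It assumes that tag names consist of word characters and hyphens, that the "code"-like tags are `code`, `ctext`, `example`, and `pre`, that such a tag emits its contents with no wrapper element (as the examples show), and that the single whitespace character after `code` or `escape` is dropped.

```perl
use strict;
use warnings;

# Tags whose bodies are copied through literally (list assumed from the post).
my %LITERAL = map { $_ => 1 } qw(code ctext example pre);

# Parse text up to (not including) the next unmatched '}' or end of input.
# $literal true  => nested {...} groups are kept as plain text.
# $literal false => nested {...} groups become XML elements.
sub parse_body {
    my ($src, $pos, $literal) = @_;
    my $out = '';
    while ($$pos < length($$src)) {
        my $ch = substr($$src, $$pos, 1);
        last if $ch eq '}';                    # the caller consumes the '}'
        if ($ch eq '{') { $out .= parse_tag($src, $pos, $literal) }
        else            { $out .= $ch; $$pos++ }
    }
    return $out;
}

# Parse one {name ...} group starting at the '{' found at $$pos.
sub parse_tag {
    my ($src, $pos, $literal) = @_;
    $$pos++;                                   # step over '{'
    my $name = '';
    pos($$src) = $$pos;
    if ($$src =~ /\G([\w-]+)/gc) {             # assumed tag-name syntax
        $name = $1;
        $$pos = pos($$src);
    }
    my $kind = lc $name;
    my $result;
    if ($kind eq 'escape') {
        # Switch back to normal conversion; no wrapper element (assumption).
        $$pos++ if substr($$src, $$pos, 1) =~ /\s/;
        $result = parse_body($src, $pos, 0);
    }
    elsif ($literal) {
        # Inside a code-like tag: reproduce the group verbatim, braces included.
        $result = '{' . $name . parse_body($src, $pos, 1) . '}';
    }
    elsif ($LITERAL{$kind}) {
        # Code-like tag: contents are literal; no wrapper element (assumption).
        $$pos++ if substr($$src, $$pos, 1) =~ /\s/;
        $result = parse_body($src, $pos, 1);
    }
    else {
        my $inner = parse_body($src, $pos, 0);
        $result = length($inner) ? "<$name>$inner</$name>" : "<$name/>";
    }
    die "unbalanced '{' in input\n"
        unless $$pos < length($$src) && substr($$src, $$pos, 1) eq '}';
    $$pos++;                                   # step over '}'
    return $result;
}

# Convert a whole string of Curl-style markup.
sub curl_to_xml {
    my ($text) = @_;
    my $pos = 0;
    my $out = parse_body(\$text, \$pos, 0);
    die "unbalanced '}' at offset $pos\n" if $pos < length($text);
    return $out;
}

print curl_to_xml('{Curl is all about {braces}}'), "\n";
print curl_to_xml('{Curl is all about {code {braces}}}'), "\n";
print curl_to_xml('{Curl is all about {code {braces}{escape {braces}}}}'), "\n";
```

The three `print` lines feed in the three inputs shown above. Because the "literal or not" state is passed down the recursion, no text has to be substituted back into an outer string afterwards.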

OK, that's as complicated as it gets, aside from the fact that there are a few other tags that act the same as "code," but that's no biggie. Anyway, here's the code I implemented to hopefully make this work:

while($next = (extract_bracketed($text, '{}', '[^{}]*' ))[0]) #this is general.
{
    $holder = $next;
    if($bext = (extract_bracketed($next, '{}', '(?s).*?(?=\{ctext|\{code|\{example|\{pre)' ))[0]) #this handles "code" and the like.
    {
        $bolder = $bext;
        while($cext = (extract_bracketed($bext, '{}', '(?s).*?(?=\{escape)' ))[0]) #this is for escaped "code" and the like.
        {
            $colder = $cext;
            $cext =~ s/\{([^ \s|\}]*?)\}/<$1\/>/gix;
            $cext =~ s/\{([\w|-]*)(.*)\}/<$1>$2<\/$1>/osi;
            $bext =~ s/$colder/$cext/sgi;
        }
        $bext =~ s/\{pre(.*)\}/\<pre\>$1<\/pre>/gosix;
        $bext =~ s/\{ctext(.*)\}/\<ctext\>$1<\/ctext>/gosix;
        $bext =~ s/\{code(.*)\}/\<code\>$1<\/code>/gosix;
        $bext =~ s/\{example(.*)\}/\<example\>$1<\/example>/gosix;
        $bext =~ s/\}/ebrac/g;
        $bext =~ s/\{/obrac/g;
        $next =~ s/$bolder/$bext/sgi;
    }
    $next =~ s/\{([^ \s|\}]*?)\}/<$1\/>/gix;
    $next =~ s/\{([\w|-]*)(.*)\}/<$1>$2<\/$1>/osi;
    $text =~ s/$holder/$next/sgi;
}

So this code (in my mind anyway) slurps up entire blocks of balanced code and then looks within for a "code"ish tag, looks within that for an "escape" tag, makes the conversions, then sort of backs out of the sub-block and does the necessary replacements. Hopefully my code is easier to follow than that last sentence :-)
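One detail of the "backing out" replacements may be worth checking; this is a possibility rather than a confirmed diagnosis. A variable interpolated into the pattern side of `s///` is treated as a regex, not as plain text. Characters such as `(`, `)`, `+`, `.`, and `|`, or a brace group like `{2}` after a character, then carry their regex meaning. The sketch below uses a hypothetical helper, `replace_literal`, and made-up sample text to show how `\Q...\E` changes that.

```perl
use strict;
use warnings;

# Replace every literal occurrence of $find in $text with $replace.
# \Q...\E quotes regex metacharacters so $find is matched as plain text.
sub replace_literal {
    my ($text, $find, $replace) = @_;
    $text =~ s/\Q$find\E/$replace/g;
    return $text;
}

# Hypothetical sample data, not taken from the real spec.
my $text = 'call f(x)+1 now';
my $find = 'f(x)+1';

# Unquoted: the pattern means "f, then one or more x, then 1",
# which does not occur in $text, so nothing is replaced.
(my $unquoted = $text) =~ s/$find/Y/g;

# Quoted: the same string is matched character for character.
my $quoted = replace_literal($text, $find, 'Y');

print "unquoted: $unquoted\n";
print "quoted:   $quoted\n";
```

If the extracted blocks can contain such characters, applying the same quoting to `$holder`, `$bolder`, and `$colder` is one thing to try.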
So here's the thing: This code works 100% on balanced blocks, 100% (I think) on "code" blocks, and about 50% on escape blocks. I've been poking at it for a day trying to make it work, but to no avail. What am I doing wrong? A regex that's mad at me? Is the algorithm flawed? As you see, this is a question that's applicable to programming in general, or at least I suspect it is, as long as it's not just a regex or something. So anyway, this question is sort of a shot in the dark, but if any of the more experienced programmers (that would be all of you) sees something worrisome in my code, I'd appreciate a helping hand. Thanks for your time and consideration :-)

In reply to Best way to escape code blocks with Text::Balanced? by tshabet
