This question may well be more about programming in general than Perl specifically, but since it's a problem that is presenting itself to me in Perl (and since the Monks have been so helpful in the past) I'm going to ask it here.
I'm writing a script that converts a language spec into XML. The spec is written in Curl, which has this type of layout:
{Curl is all about {braces}}
Since I'm turning this stuff into XML, I'm using the fantastic Text::Balanced module to balance the braces so that the above can be turned into
<Curl> is all about <braces/></Curl>
OK, so far so good, right? Now I'm having no problems achieving this conversion. The problem comes when I have something like
{Curl is all about {code {braces}}}
which should become
<Curl> is all about {braces}</Curl>
So, as you see, the code tag works in much the same way as HTML. Still cool, right? It gets a little more complicated. The code
{Curl is all about {code {braces}{escape {braces}}}}
should become
<Curl> is all about {braces}<braces/></Curl>
OK, that's as complicated as it gets, aside from the fact that there are a few other tags that act the same as "code," but that's no biggie. Anyway, here's the code I implemented to hopefully make this work:
    while($next = (extract_bracketed($text, '{}', '[^{}]*' ))[0]) #this is general.
    {
        $holder = $next;
        if($bext = (extract_bracketed($next, '{}', '(?s).*?(?=\{ctext|\{code|\{example|\{pre)' ))[0]) #this handles "code" and the like.
        {
            $bolder = $bext;
            while($cext = (extract_bracketed($bext, '{}', '(?s).*?(?=\{escape)' ))[0]) #this is for escaped "code" and the like.
            {
                $colder = $cext;
                $cext =~ s/\{([^ \s|\}]*?)\}/<$1\/>/gix;
                $cext =~ s/\{([\w|-]*)(.*)\}/<$1>$2<\/$1>/osi;
                $bext =~ s/$colder/$cext/sgi;
            }
            $bext =~ s/\{pre(.*)\}/\<pre\>$1<\/pre>/gosix;
            $bext =~ s/\{ctext(.*)\}/\<ctext\>$1<\/ctext>/gosix;
            $bext =~ s/\{code(.*)\}/\<code\>$1<\/code>/gosix;
            $bext =~ s/\{example(.*)\}/\<example\>$1<\/example>/gosix;
            $bext =~ s/\}/ebrac/g;
            $bext =~ s/\{/obrac/g;
            $next =~ s/$bolder/$bext/sgi;
        }
        $next =~ s/\{([^ \s|\}]*?)\}/<$1\/>/gix;
        $next =~ s/\{([\w|-]*)(.*)\}/<$1>$2<\/$1>/osi;
        $text =~ s/$holder/$next/sgi;
    }
So this code (in my mind anyway) slurps up entire blocks of balanced code and then looks within for a "code"ish tag, looks within that for an "escape" tag, makes the conversions, then sort of backs out of the sub-block and does the necessary replacements. Hopefully my code is easier to follow than that last sentence :-)
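For comparison, the same rules can also be written as one recursive pass over the text instead of nested extract-and-substitute steps. This is only a minimal sketch under some assumptions of mine, not a tested replacement: the names curl_to_xml, parse_body and skip_separator are made up for illustration, the input is assumed to be balanced, "code"ish tags are wrapped in <code>...</code> and friends the way the substitutions above do it, and the single whitespace character after a "code"ish or "escape" tag name is dropped.

```perl
use strict;
use warnings;

# Tags whose contents are copied literally (taken from the regexes above).
my %VERBATIM = map { $_ => 1 } qw(code ctext example pre);

# Hypothetical entry point: convert one Curl string to XML-ish text.
sub curl_to_xml {
    my ($text) = @_;
    pos($text) = 0;
    return parse_body(\$text, 0);
}

# Skip one whitespace character after a tag name, if there is one.
sub skip_separator {
    my ($ref) = @_;
    return scalar($$ref =~ /\G\s/gc);
}

# Consume text up to the '}' that closes the current block (or end of input).
# In verbatim mode braces are copied through, except for {escape ...}.
sub parse_body {
    my ($ref, $verbatim) = @_;
    my $out = '';
    while (1) {
        if ($$ref =~ /\G([^{}]+)/gc) {          # plain text
            $out .= $1;
        }
        elsif ($$ref =~ /\G\{([\w-]*)/gc) {     # opening brace plus tag name
            my $name = $1;
            if ($name eq 'escape') {            # back to normal conversion
                skip_separator($ref);
                $out .= parse_body($ref, 0);
            }
            elsif ($verbatim) {                 # keep the braces as they are
                $out .= '{' . $name . parse_body($ref, 1) . '}';
            }
            elsif ($VERBATIM{$name}) {          # "code"ish tag: literal body
                skip_separator($ref);
                $out .= "<$name>" . parse_body($ref, 1) . "</$name>";
            }
            else {                              # ordinary tag
                my $body = parse_body($ref, 0);
                $out .= length $body ? "<$name>$body</$name>" : "<$name/>";
            }
        }
        else {
            my $closed = $$ref =~ /\G\}/gc;     # eat '}' if present, else end of input
            return $out;
        }
    }
}

print curl_to_xml('{Curl is all about {code {braces}{escape {braces}}}}'), "\n";
```

Because the "am I inside a code block?" state is just an argument passed down the recursion, nothing has to be swapped out for placeholders like obrac/ebrac and nothing has to be substituted back into the parent string afterwards.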
So here's the thing: this code works 100% on balanced blocks, 100% (I think) on "code" blocks, and about 50% on escape blocks. I've been poking at it for a day trying to make it work, but to no avail. What am I doing wrong? A regex that's mad at me? Is the algorithm flawed? As you see, this is a question that's applicable to programming in general, or at least I suspect it is, as long as it's not just a regex or something. So anyway, this question is sort of a shot in the dark, but if any of the more experienced programmers (that would be all of you) sees something worrisome in my code, I'd appreciate a helping hand. Thanks for your time and consideration :-)

In reply to Best way to escape code blocks with Text::Balanced? by tshabet
