Hi All, So I'm using the excellent CPAN module Text::Balanced, which is the coolest and most useful module ever. Anyway, I'm using it to parse out balanced blocks of code, such that

{foo is awesome and so is {bar}}

is a match (since the braces are balanced) but

{foo is not {balanced}

is not. Pretty elementary, yes? OK, so here's my problem. I'm using this module to convert some code written in Curl into XML. Curl uses || to start a line comment, like this:

{center foo}
||this is comment

My problem is that comments often contain braces, and of course I don't want to end up with a situation where

|| an open brace looks like this {
{beginning of a block
{some function}
end of a block}

does not match. An extra { in a comment leaves everything after it unmatched to the end of the input, since a matching } won't be found unless there happens to be a stray } in another comment. So my current (not so) brilliant thinking is that I will use regexes like this:

$text =~ s/\|\|(.*?)\n/\<\!\-\-$1\-\-\>\n/g; #turn line comment form into XML comment form
$text =~ s/<\!\-\-(.*?)\}(.*?)\-\->/<\!\-\-$1 endbrace $2\-\->/gxs; #escape the end braces in comment so they needn't be balanced
$text =~ s/<\!\-\-(.*?)\{(.*?)\-\->/<\!\-\-$1 openbrace $2\-\->/gxs; #ditto for open braces

So far so good, right? OK, so I run these regexes, then run the Text::Balanced routine, and once that is done I search for "endbrace" and "openbrace" and replace them with the original braces. This makes sense in my head, but it does not seem to work in practice. I was hoping that, for example,


||{paragraph The union of zero or more  {glossary citation="type", types }
|| may be denoted using  {ONE-OF }.  For example, the
|| {glossary type expression }  {ctext  {one-of int float } }  {glossary
|| citation="evaluate", evaluates } to a non- {glossary
|| representational type } that can be used to  {glossary declare } a  {glossary
|| variable } that can hold either an  {INT } or a  {FLOAT }. }

would simply become the same block of text with <!-- in place of each || and --> at the end of each line. The change of comment indicators actually seems fine, but the brace replacement is going wrong. Running the above code through my program I get


<!-- { paragraph The union of zero or more  <glossary citation="type", types  } -->
<!-- may be denoted using   { ONE-OF  } .  For example, the-->
<!--  { glossary type expression  }   <ctext>  <one-of> int float </one-of> </ctext>  <glossary-->>
<!-- citation="evaluate", evaluates  }  to a non-  { glossary-->
<!-- representational type  }  that can be used to   { glossary declare </glossary--> a  <glossary-->>
<!-- variable  }  that can hold either an   { INT </glossary--> or a  <FLOAT> </FLOAT>. > -->

The <> in place of {} comes from Text::Balanced recognizing balanced braces, which indicates that my regexes are not catching every occurrence. Specifically, they seem to match only once per line of comment. Am I making some dumb mistake in my regexes? Is there a better way to get Text::Balanced to ignore braces in || comments? Any suggestions/pointers much appreciated. Thanks!
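
For reference, here is a minimal sketch of one way the masking step might be done in a single pass. It rests on the reading that each s///g match above consumes a whole comment, so at most one brace per comment gets replaced before the search moves on. The sub names mask_braces, mask_comment_braces and unmask_comment_braces are hypothetical, and the sketch assumes that || never appears outside a comment and that the words openbrace and endbrace never occur in the real text.

```perl
use strict;
use warnings;

# Hypothetical helper: mask every brace in one comment body and wrap the
# result in XML comment delimiters.
sub mask_braces {
    my ($body) = @_;
    $body =~ s/\{/ openbrace /g;    # all open braces, not just the first
    $body =~ s/\}/ endbrace /g;     # all close braces
    return "<!--$body-->";
}

# Convert each "||" line comment and mask its braces in the same pass.
# /m lets $ match at each line end, /e evaluates the replacement as code.
sub mask_comment_braces {
    my ($text) = @_;
    $text =~ s/\|\|(.*)$/mask_braces($1)/gme;
    return $text;
}

# Reverse step, meant to run after the balanced-block processing is done.
sub unmask_comment_braces {
    my ($text) = @_;
    $text =~ s/ openbrace /{/g;
    $text =~ s/ endbrace /}/g;
    return $text;
}

my $curl = <<'END';
|| an open brace looks like this {
{beginning of a block
{some function}
end of a block}
END

print mask_comment_braces($curl);
```

Because the inner substitutions run on the captured comment body alone, they cannot reach past the end of the comment into real code, which the /s patterns above can do when a comment contains no brace at all.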

In reply to Problem with skipping comment by tshabet
