
Greetings, Perl people,

I'm writing a parser for generic lightweight markup that is agnostic to the different flavours (Textile, reST, MediaWiki, Creole etc.), so it relies heavily on plugins. The Block Parser works fine by now, but the Inline Parser (for typography, links, footnotes etc.) is giving me headaches because of Text::Balanced's animosity towards regexp metacharacters.

How can I match a string delimited by a double asterisk ('**') with Text::Balanced?

Suppose I have this string:

my $str = '**bold words**';

and I want to use extract_tagged to extract the 'bold words' part.

My first try:

warn Dumper extract_tagged( 
    $str,
    '**',
    '**',
);

Now Perl complains about nested quantifiers, because Text::Balanced creates a regex that starts with /\G**. So I try escaping the '*'s:

warn Dumper extract_tagged( 
    $str,
    '\*\*',
    '\*\*',
);

That leads to a 'quantifier follows nothing' regexp error. Okay, double-escape it, methinks, but that gives the same 'quantifier follows nothing' error in the regexp Text::Balanced creates.

When I try it with three or more '\'s, there are neither errors nor results, and I am thoroughly confused by all the escaping of escape characters. I played around with quotemeta, but that didn't work either, so I turn to you:

Is there a way to match a string delimited by multiple regexp metacharacters like '*', and if so, what would be the least confusing way to implement it?
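
For comparison, here is a minimal sketch of what I am after, written with a plain regex instead of Text::Balanced. It assumes the delimiters never nest and that the first closing delimiter ends the span, which is weaker than what extract_tagged offers. The helper name extract_delimited_span is made up for illustration and is not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Balanced): returns the text between
# the first $open and the next $close, or undef if there is no match.
sub extract_delimited_span {
    my ($text, $open, $close) = @_;

    # quotemeta escapes every regexp metacharacter, so '**' is matched literally.
    my ($o, $c) = (quotemeta $open, quotemeta $close);

    # Non-greedy match up to the first closing delimiter; nesting is not handled.
    return $text =~ /$o(.*?)$c/s ? $1 : undef;
}

my $str   = '**bold words**';
my $inner = extract_delimited_span($str, '**', '**');
print defined $inner ? "$inner\n" : "no match\n";   # prints "bold words"
```

Since the delimiters are passed as plain strings and escaped in one place, a plugin would not have to care about regexp escaping at all; whether the same can be achieved with extract_tagged is exactly my question.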

Thanks in advance and merry kwanzaa
kba

In reply to Extracting multiple-asterisk-delimited substring with Text::Balanced by kba
