
Hi folks. Been so long I can't even remember my old username/email, but I find myself dusting off my swiss army knife to analyze some data again, and I'm *so close* to good, but stuck.

I'm trying to parse out what are essentially variable-data placeholders in a file, delimited by %'s and sitting outside of the xml tags, so you might see somexml....>%avariable%<somexml...

The problem arises when the user puts multiple fields in a given location, such as >%first%%second%<. Once I noticed that issue and adjusted my regex, the best I could get was capturing the second variable and skipping the first. I'm trying *not* to capture the bounding characters, just the text within. Here's a sample of a portion of the data that will be parsed:

<span color="#231f20" whatever="%DoNotMatch%" textOverprint="false">%PN1%</span>
<span color="#231f20" textOverprint="false">%DIMMM%%DIMINCH%</span>

I'm attempting to pull PN1, DIMMM, and DIMINCH from this text block. Here's the closest I've gotten:

my @matches = ($data =~ m/>(?:%([^%]+)%)+</g );

In this scenario, I'm getting PN1, DIMINCH. It's matching the full >%DIMMM%%DIMINCH%< string, but only capturing the second portion. I'm unable to figure out how to repeat the delimiting characters as well as the match target itself, without capturing the delimiting characters. Any help would be appreciated.
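
For anyone who wants to reproduce this, here's a minimal self-contained sketch. It assumes the sample above is the entire input; the heredoc just stands in for my slurped file, and the real data is larger.

```perl
use strict;
use warnings;

# Stand-in for the slurped file: the sample from above as one string.
my $data = <<'END';
<span color="#231f20" whatever="%DoNotMatch%" textOverprint="false">%PN1%</span>
<span color="#231f20" textOverprint="false">%DIMMM%%DIMINCH%</span>
END

# The capture group sits inside the repeated (?:...)+ group, so each
# overall match hands back only the capture from its final repetition.
my @matches = ( $data =~ m/>(?:%([^%]+)%)+</g );

print join( ', ', @matches ), "\n";    # prints: PN1, DIMINCH
```

The attribute value %DoNotMatch% is correctly skipped, since it isn't preceded by >, but DIMMM is lost because the group that captured it gets overwritten by DIMINCH within the same match.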

edit: Based on replies, here's some more info. I'm parsing all of the lines within a file. I can't guarantee line breaks, so the whole sample could sit within a single 'line', and I'm currently slurping the file into one string. Additionally, there are other instances of %blah% within the xml (inside attribute values, for example), so I can't just match on %...% alone. I do need the bounding >% and %< overall, to avoid matching those pieces.

In reply to Trouble capturing multiple groupings in regex by reverendphil
