Hi folks. Been so long I can't even remember my old username/email, but I find myself dusting off my swiss army knife to analyze some data again, and I'm *so close* to good, but stuck.
I'm trying to parse out what are essentially variable-data placeholders in a file, delimited by % signs and sitting outside of some xml tags, so you might see somexml....>%avariable%<somexml...
The problem arises when the user puts multiple fields in a given location, such as >%first%%second%<, and once I noticed that issue and adjusted my regex, the best I could get was capturing the second variable and skipping the first. I'm trying *not* to capture the bounding characters, just the text within. Here's a sample of a portion of data that will be parsed:
<span color="#231f20" whatever="%DoNotMatch%" textOverprint="false">%PN1%</span> <span color="#231f20" textOverprint="false">%DIMMM%%DIMINCH%</span>
I'm attempting to pull PN1, DIMMM, and DIMINCH from this text block. Here's the closest I've gotten:
my @matches = ($data =~ m/>(?:%([^%]+)%)+</g );
In this scenario, I'm getting PN1, DIMINCH. It's matching the full >%DIMMM%%DIMINCH%< string, but only capturing the second portion. I can't figure out how to repeat the delimiting characters along with the match target itself without also capturing the delimiting characters. Any help would be appreciated.
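For anyone who wants to reproduce this, here is a minimal sketch of the behaviour, using a trimmed copy of the sample (the shortened attributes are just for illustration). As far as I understand it, a capture group inside a quantified group is overwritten on every repetition, so only the last %...% of each run survives:

```perl
use strict;
use warnings;

# Trimmed sample: one single-field span and one double-field span.
my $data = '<span whatever="%DoNotMatch%">%PN1%</span> <span>%DIMMM%%DIMINCH%</span>';

# The (?:...)+ repeats, but the inner capture group keeps only
# the text from its final iteration within each overall match.
my @matches = ( $data =~ m/>(?:%([^%]+)%)+</g );

print join( ', ', @matches ), "\n";    # prints: PN1, DIMINCH
```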
edit: Based on replies, here's some more info. I'm parsing all of the lines within a file. I can't guarantee line breaks, so the whole sample could sit within a single 'line', and I'm currently slurping the file into one string. Additionally, there are other instances of %blah% within the xml (inside attributes, for example), so I can't just match on %...% alone; I do need the bounding >% and %< overall, to avoid matching those pieces.
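Given those constraints, one possible approach (a sketch only, not necessarily the best way) is to do it in two passes: first capture each whole run of %...% fields that sits directly between > and <, then pull the individual names out of that run. The `[^%<>]` class is my own assumption that field names never contain angle brackets, and `$run` is just an illustrative variable name:

```perl
use strict;
use warnings;

# Sample data as one slurped string (line breaks are not guaranteed).
my $data = '<span color="#231f20" whatever="%DoNotMatch%" textOverprint="false">%PN1%</span> '
         . '<span color="#231f20" textOverprint="false">%DIMMM%%DIMINCH%</span>';

my @matches;

# Pass 1: capture the whole run of fields bounded by > and <.
# Assumption: field names contain no %, < or > characters.
while ( $data =~ m/>((?:%[^%<>]+%)+)</g ) {
    my $run = $1;    # e.g. "%DIMMM%%DIMINCH%"

    # Pass 2: extract every field name from the run, without the % delimiters.
    push @matches, $run =~ m/%([^%]+)%/g;
}

print join( ', ', @matches ), "\n";    # prints: PN1, DIMMM, DIMINCH
```

Because the outer pattern still requires the > and < around the run, the %DoNotMatch% inside the attribute is never considered.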
In reply to Trouble capturing multiple groupings in regex by reverendphil