A handy rule to know about Perl's regular expressions is that the leftmost match always wins: the engine returns the match that starts earliest in the string, not the overall shortest or longest match. If you want a good treatise on how regular expressions work, I recommend the "Owl Book" (Jeffrey Friedl's Mastering Regular Expressions). Personally, I found the 1st edition quite good for Perl. The second edition adds Unicode, and the third edition didn't add much beyond that, so in my opinion an older edition, if you can find one, will help you just as well.
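A minimal Perl sketch of the leftmost rule, using made-up strings. In both cases a different match exists further to the right (one shorter, one longer), and the engine never considers it, because an attempt at an earlier starting position has already succeeded.

```perl
use strict;
use warnings;

# Non-greedy pattern: the shortest possible match is the "ab" near the end,
# but the attempt starting at position 0 succeeds first, with "aab".
my ($lazy_left) = 'aab ab' =~ /(a+?b)/;
print "lazy:   $lazy_left\n";     # aab

# Greedy pattern: the longest possible match is the "aaab" near the end,
# but the attempt starting at position 0 succeeds first, with "ab".
my ($greedy_left) = 'ab aaab' =~ /(a+b)/;
print "greedy: $greedy_left\n";   # ab
```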
You aren't putting any bounds on what it throws away when looking for your tag. Here is a simpler example, using this data:
a=1,b=2,a=2,a=3,b=4
If you search using /b(.+?)4/, you will match b=2,a=2,a=3,b=4. The match starts at the first b, and the .+? expression doesn't care what it passes over (including the a= pairs) until it finds a 4.
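The example can be checked with a short Perl script. The second pattern, which swaps . for the character class [^,] so the lazy quantifier cannot cross a comma, is an illustrative addition rather than something from the thread.

```perl
use strict;
use warnings;

my $data = 'a=1,b=2,a=2,a=3,b=4';

# Unbounded: the match starts at the first "b", and .+? keeps growing,
# across commas and into other pairs, until it reaches a "4".
my ($unbounded) = $data =~ /(b.+?4)/;
print "unbounded: $unbounded\n";   # b=2,a=2,a=3,b=4

# Bounded: [^,] cannot cross a comma, so the attempt starting at the first
# "b" fails and the engine moves on to the next "b".
my ($bounded) = $data =~ /(b[^,]+?4)/;
print "bounded:   $bounded\n";     # b=4
```

Limiting what the quantifier is allowed to consume is the regex-only way of putting bounds on what it throws away.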
In your code above, you match the first ELEMENT tag, but your .+? expressions only care about the part number; that is why they eat multiple ELEMENT tags on their way to it.
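A small sketch of this failure mode. The original code and data aren't shown in this reply, so the <ELEMENT>, <PART> and <NAME> layout and the part numbers below are hypothetical. Lazy .*? quantifiers are used instead of .+? so that the fields can sit directly next to each other.

```perl
use strict;
use warnings;

# Hypothetical data: two records laid out back to back.
my $data = '<ELEMENT><PART>100</PART><NAME>bolt</NAME></ELEMENT>'
         . '<ELEMENT><PART>200</PART><NAME>nut</NAME></ELEMENT>';

# Look up the name for part 200. A match is possible starting at the first
# <ELEMENT>, so that is where it begins, and the first .*? swallows all of
# record 100 on its way to <PART>200</PART>.
my ($span, $name) =
    $data =~ m{(<ELEMENT>.*?<PART>200</PART>.*?<NAME>(.*?)</NAME>)};

print "name: $name\n";   # nut
print "span: $span\n";   # also contains part 100 and "bolt"
```

Here the captured name happens to be right, but the span covers both records, so any capture placed before <PART>200</PART> would pick up data from record 100.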
While you could do your match using very convoluted and fragile regexes, it is better to do it programmatically. Since you only want to match within one record of your data, the first step should be splitting the input into records (demonstrated in another response). Then your regexes will be limited to just the one record you are searching.
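One way to sketch the split-then-search approach in Perl. This is not the code from the other response mentioned above; the <ELEMENT>/<PART>/<NAME> layout and the %name_for lookup table are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical data: three records laid out back to back.
my $data = '<ELEMENT><PART>100</PART><NAME>bolt</NAME></ELEMENT>'
         . '<ELEMENT><PART>200</PART><NAME>nut</NAME></ELEMENT>'
         . '<ELEMENT><PART>300</PART><NAME>washer</NAME></ELEMENT>';

# Step 1: break the input into records. Each capture ends at the first
# </ELEMENT> after its <ELEMENT>, so it holds exactly one record.
my @records = $data =~ m{<ELEMENT>(.*?)</ELEMENT>}gs;

# Step 2: search each record on its own; the field regexes can no longer
# run past the end of the record they are looking at.
my %name_for;
for my $rec (@records) {
    my ($part) = $rec =~ m{<PART>(.*?)</PART>};
    my ($name) = $rec =~ m{<NAME>(.*?)</NAME>};
    $name_for{$part} = $name if defined $part && defined $name;
}

print "part 200 is: $name_for{200}\n";   # nut
```

The per-field regexes stay simple because each one only ever sees a single record.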
-pete
"Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
I was under the incorrect impression that if I put in that question mark, then it would shrink the match down to just the minimum chunk of data that matches the pattern
No, you were correct. It finds the minimum chunk of data that matches /.+/ and is followed by something that matches the rest of the pattern, so that the whole pattern still matches. What the ? does not change is where the match starts: that is still the leftmost position at which the whole pattern can match.
Without the non-greedy modifier, it finds the maximum chunk of data that matches /.+/ and is followed by something that matches the rest of the pattern, so that the whole pattern still matches.
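A short Perl comparison, on made-up strings, of the greedy and non-greedy forms, plus a case showing that the minimum is measured from the leftmost starting point rather than searched for across the whole string.

```perl
use strict;
use warnings;

my $text = '<b>one</b> and <b>two</b>';

# Greedy: .+ first grabs the rest of the string, then backs off only as far
# as needed for </b> to match, so it stops at the LAST </b>.
my ($greedy) = $text =~ m{<b>(.+)</b>};
print "greedy: $greedy\n";   # one</b> and <b>two

# Non-greedy: .+? grows one character at a time and stops at the FIRST
# </b> that lets the rest of the pattern match.
my ($lazy) = $text =~ m{<b>(.+?)</b>};
print "lazy:   $lazy\n";     # one

# The ? only affects where the match ends. "a2b" later in the string is
# shorter, but the attempt starting at position 0 already succeeds.
my ($from_left) = 'a11 b a2b' =~ /(a.+?b)/;
print "left:   $from_left\n";   # a11 b
```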