How to avoid the greedy matching failures?.

perlsen has asked for the wisdom of the Perl Monks concerning the following question:

Can someone explain how to match nested elements that share the same tag?
I have run into a problem when matching the inner-level data: greedy matching
fails.
In the example below, I need to match the "<ta>...</ta>" tags in the following order:
within the <root> tag, the innermost <ta> element should be processed first,
then the <ta> element enclosing it, and so on outward.

outputs:

    <ta>text</ta>
    <ta>the sample <ta>text</ta> explanation</ta>
    <ta>this is <ta>the sample <ta>text</ta> explanation</ta> needded</ta>

for the inputs:

    "<root><ta>this is <ta>the sample <ta>text</ta> explanation</ta> needded</ta></root>"

Thanks in advance; suggestions are most welcome.

20050206: Unconsidered by Corion, was considered by jbrugger: change the title to something like: match the contents of nested alike (same) tags; Keep/Edit/Delete: 9/15/0


Re: How to avoid the greedy matching failures?. by murugu (Curate) on Feb 03, 2005 at 05:13 UTC
Hi Perlsen, Regexp::Common::balanced helps you in this regard.
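
To illustrate what "balanced" matching means here, the following is a minimal sketch that does not use Regexp::Common at all. It swaps in the recursive-pattern syntax of core Perl 5.10 and later, which postdates this thread. The names `$balanced_ta` and `elements_inner_first` are made up for the example, and the pattern assumes that a <ta> element contains only plain text and further <ta> elements.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # recursive patterns such as (?&name) need Perl 5.10 or later

# One balanced <ta> element: opening tag, then any mix of plain text
# and nested balanced <ta> elements, then the closing tag.
my $balanced_ta = qr{
    (?<ta>
        <ta>
        (?: [^<]++      # a run of text with no tag in it
          | (?&ta)      # or one complete nested element
        )*
        </ta>
    )
}x;

# Hypothetical helper: list every <ta> element, inner ones first.
sub elements_inner_first {
    my ($text) = @_;
    my @found;
    while ($text =~ /$balanced_ta/g) {
        my $element = $+{ta};
        # Strip the outer <ta> (4 chars) and </ta> (5 chars), then look inside.
        my $inner = substr($element, 4, length($element) - 9);
        push @found, elements_inner_first($inner), $element;
    }
    return @found;
}

my $test = '<root><ta>this is <ta>the sample <ta>text</ta> explanation</ta> needded</ta></root>';
print "$_\n" for elements_inner_first($test);
```

Because each element is matched as a balanced unit, sibling elements at the same level are reported separately instead of being merged into one match.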
Re: How to avoid the greedy matching failures?. by jbrugger (Parson) on Feb 03, 2005 at 05:03 UTC
Mind the question mark in the regex to make it non-greedy. However, I wonder why you didn't search; this question has been answered before (Re^3: Pattern matching, How will my regular expression match?, Re: Regexp help, multiple lines).

Whoops! Now it's my fault, I did not read properly. I'll be back with another answer...

Update: OK, final answer with one regexp...

    #!/usr/bin/perl -w
    use strict;

    my $test = "<root><ta>this is <ta>the sample <ta>text</ta> explanation</ta> needded</ta></root>";
    my @data;

    # Greedy .* runs from the first <ta> to the last </ta>, so each pass
    # captures the outermost element ($1) and its contents ($2).
    while ($test =~ m/(<ta>(.*)<\/ta>)/gs) {
        unshift(@data, $1);   # outer elements end up last in @data
        ($test) = $2;         # next pass searches inside this element
    }

    foreach (@data) {
        print $_."\n";
    }
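
The loop above depends on the elements forming a single chain of nesting. Given siblings such as "<ta>a</ta><ta>b</ta>", the greedy .* runs from the first <ta> to the last </ta> and captures both elements as one. A possible way around that, sketched here with a hypothetical helper named `ta_elements`, is to scan the tags in order and keep a stack of the offsets where elements were opened. This is only an illustration and assumes the tags are written exactly as <ta> and </ta>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every <ta>...</ta> element found in $text. An element is
# recorded when its closing tag is seen, so inner elements come out
# before the element that encloses them. A stray </ta> is ignored.
sub ta_elements {
    my ($text) = @_;
    my (@open, @found);
    while ($text =~ m{<(/?)ta>}g) {
        my ($tag_start, $tag_end) = ($-[0], $+[0]);
        if ($1 eq '/') {
            next unless @open;          # closing tag with nothing open
            my $start = pop @open;      # offset of the matching <ta>
            push @found, substr($text, $start, $tag_end - $start);
        }
        else {
            push @open, $tag_start;     # remember where this element begins
        }
    }
    return @found;
}

my $test = '<root><ta>this is <ta>the sample <ta>text</ta> explanation</ta> needded</ta></root>';
print "$_\n" for ta_elements($test);
# Prints:
# <ta>text</ta>
# <ta>the sample <ta>text</ta> explanation</ta>
# <ta>this is <ta>the sample <ta>text</ta> explanation</ta> needded</ta>
```

For anything beyond a fixed tag like this (attributes, comments, CDATA), a real XML parser is the safer choice.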
Re: How to avoid the greedy matching failures?. by BUU (Prior) on Feb 03, 2005 at 06:44 UTC
This looks a lot like XML, so perhaps you want to use one of the prewritten XML libraries, such as XML::Simple or XML::LibXML, which are all on CPAN.
Re: How to avoid the greedy matching failures?. by blazar (Canon) on Feb 03, 2005 at 10:32 UTC
The short answer is that, as is well known, regexen are not currently the best tool for parsing *ML text. (No, I'm not talking about SML, OCaml et similia!) However, non-greedy pattern matching is described in perldoc perlre.
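
As a small illustration of why neither quantifier copes with nesting on its own, the sketch below compares greedy .* with non-greedy .*? on two made-up inputs. The helper names `greedy_match` and `lazy_match` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Greedy: .* takes as much as it can, so the match ends at the LAST </ta>.
sub greedy_match {
    my ($text) = @_;
    my ($match) = $text =~ m{(<ta>.*</ta>)}s;
    return $match;
}

# Non-greedy: .*? takes as little as it can, so the match ends at the FIRST </ta>.
sub lazy_match {
    my ($text) = @_;
    my ($match) = $text =~ m{(<ta>.*?</ta>)}s;
    return $match;
}

my $nested   = '<ta>outer <ta>inner</ta> tail</ta>';
my $siblings = '<ta>a</ta> <ta>b</ta>';

print "nested,   greedy: ", greedy_match($nested),   "\n";  # the whole element
print "nested,   lazy:   ", lazy_match($nested),     "\n";  # <ta>outer <ta>inner</ta> (unbalanced)
print "siblings, greedy: ", greedy_match($siblings), "\n";  # <ta>a</ta> <ta>b</ta> (two elements as one)
print "siblings, lazy:   ", lazy_match($siblings),   "\n";  # <ta>a</ta>
```

Each form gets one of the two inputs right and the other wrong, because neither counts how many <ta> tags are still open.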