perlsen has asked for the wisdom of the Perl Monks concerning the following question:

Can someone explain how to match nested elements that use the same tag?
I ran into a problem matching the inner-level data: greedy matching
fails.
In the example below I need to match the "<ta>...</ta>" tags in the following order:
within the <root> tag, the innermost <ta> element should be processed first,
then the <ta> that encloses it, and so on outward.

outputs:

<ta>text</ta>
<ta>the sample <ta>text</ta> explanation</ta>
<ta>this is <ta>the sample <ta>text</ta> explanation</ta> needded</ta>

for the inputs:

"<root><ta>this is <ta>the sample <ta>text</ta> explanation</ta> needded</ta></root>"

Thanks in advance; suggestions are welcome.

20050206: Unconsidered by Corion, was considered by jbrugger: change the title to something like: match the contents of nested alike (same) tags; Keep/Edit/Delete: 9/15/0

Replies are listed 'Best First'.
Re: How to avoid the greedy matching failures?.
by murugu (Curate) on Feb 03, 2005 at 05:13 UTC
Re: How to avoid the greedy matching failures?.
by jbrugger (Parson) on Feb 03, 2005 at 05:03 UTC
    Mind the question mark in the regex; it makes the match non-greedy.
    However, I wonder why you didn't search; this question has been answered before.
    (Re^3: Pattern matching, How will my regular expression match?, Re: Regexp help, multiple lines )

    Whoops! Now it's my fault: I did not read properly. I'll be back with another answer...

    *** Update ***
    ok, final answer with one regexp...
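    #!/usr/bin/perl -w
    use strict;

    my $test = "<root><ta>this is <ta>the sample <ta>text</ta> explanation</ta> needded</ta></root>";
    my @data;

    # Greedy .* grabs the outermost <ta>...</ta>; matching again on its
    # contents steps one nesting level inward on each pass.
    while ($test =~ m/(<ta>(.*)<\/ta>)/gs) {
        unshift(@data, $1);   # prepend, so the innermost element ends up first
        ($test) = $2;
    }

    foreach (@data) {
        print $_ . "\n";
    }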
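
    The loop above follows a single chain of nesting. If sibling <ta> elements can occur at the same level, one possible alternative is to pair opening and closing tags with a stack, which reports each element as soon as it closes (so children always come before their parents). This is only a sketch: the helper name nested_ta is made up for illustration, and it assumes the tags are plain <ta> and </ta> with no attributes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns every <ta>...</ta>
# element found in $text, children before parents, using a stack of
# opening-tag offsets.
sub nested_ta {
    my ($text) = @_;
    my (@open, @found);
    while ($text =~ m{(</?ta>)}g) {
        if ($1 eq '<ta>') {
            # Remember the offset where this opening tag starts.
            push @open, pos($text) - length($1);
        }
        elsif (@open) {
            # A closing tag pairs with the most recent unmatched opening tag.
            my $start = pop @open;
            push @found, substr($text, $start, pos($text) - $start);
        }
    }
    return @found;
}

my $test = '<root><ta>this is <ta>the sample <ta>text</ta> explanation</ta> needded</ta></root>';
print "$_\n" for nested_ta($test);
```

    For the sample input this prints the three elements in the requested order, innermost first. Unmatched closing tags are silently skipped, which may or may not be what you want.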
Re: How to avoid the greedy matching failures?.
by BUU (Prior) on Feb 03, 2005 at 06:44 UTC
    This looks a lot like XML, so perhaps you want to use one of the prewritten XML libraries, such as XML::Simple or XML::LibXML, which are all on CPAN.
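
    As a rough illustration of the library route, here is a sketch using XML::LibXML. It assumes the module is installed from CPAN and that the input is well-formed XML; the helper name ta_innermost_first is invented for this example. It collects all <ta> nodes and sorts them so the most deeply nested come first.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns the serialized <ta> elements of $xml,
# most deeply nested first.
sub ta_innermost_first {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);

    # Pair each <ta> node with its depth (the number of <ta> ancestors),
    # then sort by depth, deepest first.
    my @by_depth = sort { $b->[0] <=> $a->[0] }
                   map  { [ $_->findvalue('count(ancestor::ta)'), $_ ] }
                   $doc->findnodes('//ta');

    return map { $_->[1]->toString } @by_depth;
}

my $test = '<root><ta>this is <ta>the sample <ta>text</ta> explanation</ta> needded</ta></root>';
print "$_\n" for ta_innermost_first($test);
```

    Unlike a regex, the parser will complain if the tags are not balanced, which is usually an advantage.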
Re: How to avoid the greedy matching failures?.
by blazar (Canon) on Feb 03, 2005 at 10:32 UTC
    The short answer is that (it is well known that) regexen are not currently the best tool to parse *ML text. (No, I'm not talking about SML, OCAML et similia!)

    However, non-greedy pattern matching is described in perldoc perlre.
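
    To make the difference concrete, here is a small sketch run against the sample string from the question. It shows that neither the greedy nor the non-greedy quantifier respects nesting on its own. The third pattern is one possible workaround (an assumption on my part, not something from perlre's non-greedy section): it forbids any <ta> or </ta> inside the match, so it can only match an innermost element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $test = '<root><ta>this is <ta>the sample <ta>text</ta> explanation</ta> needded</ta></root>';

# Greedy: .* runs to the LAST </ta>, so the capture is the outermost content.
my ($greedy) = $test =~ m{<ta>(.*)</ta>}s;

# Non-greedy: .*? stops at the FIRST </ta>, which closes the innermost
# element, so the capture starts in the outer element and is unbalanced.
my ($lazy) = $test =~ m{<ta>(.*?)</ta>}s;

# Tempered: each character is consumed only if no <ta> or </ta> starts
# there, so the match cannot contain a nested tag.
my ($inner) = $test =~ m{<ta>((?:(?!</?ta>).)*)</ta>}s;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "inner:  $inner\n";
```

    The greedy capture is "this is <ta>the sample <ta>text</ta> explanation</ta> needded", the non-greedy one is "this is <ta>the sample <ta>text", and the tempered one is "text".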