in reply to Not able to Matching nested item

Using a real XML parser is a better way to go.

In your code, I think the problem is the use of [^>]+?> in a couple of places. Try using .*?> instead.
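Assuming the pattern is being used to match a PI, here is a small standalone comparison of where the two forms can behave differently. The sample strings and the sub names `neg_class_match` / `lazy_dot_match` are made up for illustration. `[^>]+?` needs at least one character and can never step over a `>`, while `.*?` may match nothing and can cross a `>` if that is what it takes to reach `?>`.

```perl
use strict;
use warnings;

# Hypothetical helpers: does each PI pattern match the given string?
sub neg_class_match { my ($s) = @_; return $s =~ m{<\?[^>]+?\?>} ? 1 : 0 }
sub lazy_dot_match  { my ($s) = @_; return $s =~ m{<\?.*?\?>}s   ? 1 : 0 }

# Made-up samples: a '>' inside an attribute value, an empty PI body,
# and an ordinary PI that both patterns should accept.
for my $s ('<?CLG.MDFO ID="a>b"?>', '<??>', '<?no_smark?>') {
    printf "%-24s [^>]+? => %d   .*? => %d\n",
        $s, neg_class_match($s), lazy_dot_match($s);
}
```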

Update: the above might be the issue, but after further reflection I am not completely sure that will fix it. But read on...

What you are trying to do is similar to matching nested parentheses, and that's notoriously difficult to do with regular expressions. Moreover, when you run into a CLG.MDFO processing instruction (PI), you need to look at the element tag that follows it, and when you run into a CLG.MDFC PI, you need to parse the element tag that precedes it.
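For plain parentheses, Perl 5.10 and later can actually do it, but only through a non-regular extension: `(?1)` recurses into capture group 1. A minimal sketch (the sub name `is_balanced` is hypothetical). It works only because `(` and `)` are fixed, distinct characters; with these PIs, which element a CLG.MDFO or CLG.MDFC belongs to depends on the neighbouring tags, which is why a stack is easier here.

```perl
use strict;
use warnings;

# Matches strings whose parentheses are balanced; other characters are allowed.
# (?1) re-enters capture group 1 (Perl 5.10+); ++ is a possessive quantifier.
my $balanced = qr/^( (?: [^()]++ | \( (?1) \) )* )\z/x;

sub is_balanced { my ($s) = @_; return $s =~ $balanced ? 1 : 0 }

print is_balanced('(a(b)c)'), "\n";
```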

I would try this approach, which uses simpler regexes and maintains a stack of the parsed PIs:

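```perl
my @stack;    # element names opened by CLG.MDFO PIs

# Step through $file one processing instruction (<?...?>) at a time.
# The non-greedy (.*?) stops at the next PI; /c keeps pos($file) when
# a match fails, so a failed inner match can't restart the outer loop.
while ($file =~ m{\G(.*?)<\?(.*?)\?>}gcs) {
    my $pre = $1;                    # text since the previous PI
    my $pi  = $2;                    # PI body, e.g. CLG.MDFO ID="..."
    my @args = split(' ', $pi);      # hopefully this always works
    my $pi_cmd = uc($args[0] // '');

    if ($pi_cmd eq 'CLG.MDFO') {
        # parse the next element tag and remember its name
        if ($file =~ m{\G\s*<\s*([^>\s/]+)[^>]*>}gc) {
            push(@stack, $1);
        }
    }
    elsif ($pi_cmd eq 'CLG.MDFC') {
        # parse the previous element tag (opening or closing, e.g. </ARTICLE>)
        if ($pre =~ m{<\s*/?\s*([^>\s/]+)[^>]*>\s*\z}) {
            my $element = $1;
            my $open    = pop(@stack);
            unless (defined $open && $element eq $open) {
                warn "CLG.MDFC after <$element> does not match the open CLG.MDFO\n";
            }
        }
    }
}
if (@stack) {
    warn "unterminated CLG.MDFO for: @stack\n";
}
```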
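The inner match after a CLG.MDFO also runs `m//g` against `$file`. In Perl, a failed `m//g` resets `pos()` to undef unless `/c` is also given, so without `/c` a missing element tag would send the outer `while` back to the start of `$file`, where it would reach the same PI again and loop forever. A tiny demonstration (the sub name and sample string are made up):

```perl
use strict;
use warnings;

# Returns pos() after one successful and one failed \G match.
sub pos_after_failed_match {
    my ($use_c) = @_;
    my $text = 'aaa<b>';
    my $ok = ($text =~ m{\Ga}g);            # succeeds; pos($text) is now 1
    $ok = $use_c ? ($text =~ m{\G<}gc)      # fails; /c keeps pos($text) at 1
                 : ($text =~ m{\G<}g);      # fails; pos($text) is reset
    return pos($text);
}

printf "with /c: %s, without /c: %s\n",
    pos_after_failed_match(1) // 'undef', pos_after_failed_match(0) // 'undef';
```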
Update: to handle this case:

```
</ARTICLE> <?no_smark?> <?CLG.MDFC ID="C001001M00 ...
```

you can modify the above code as follows:
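```perl
my $pre_element;    # last element tag seen immediately before a PI
my @stack;
while ($file =~ m{\G(.*?)<\?(.*?)\?>}gcs) {    # same PI regex as above
    my ($pre, $pi) = ($1, $2);
    my @args   = split(' ', $pi);
    my $pi_cmd = uc($args[0] // '');
    if ($pi_cmd eq 'CLG.MDFO') {
        # same as above: remember the name of the next element tag
        push(@stack, $1) if $file =~ m{\G\s*<\s*([^>\s/]+)[^>]*>}gc;
    }
    else {
        # every other PI (e.g. <?no_smark?>) records the tag just before it
        if ($pre =~ m{<\s*/?\s*([^>\s/]+)[^>]*>\s*\z}) {
            $pre_element = $1;
        }
        if ($pi_cmd eq 'CLG.MDFC') {
            my $open = pop(@stack);
            unless (defined $open && defined $pre_element
                    && $pre_element eq $open) {
                warn "CLG.MDFC does not match the open CLG.MDFO\n";
            }
        }
    }
}
warn "unterminated CLG.MDFO for: @stack\n" if @stack;
```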
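So encountering <?no_smark?> sets $pre_element from the element tag just before it (here </ARTICLE>), and the following CLG.MDFC PI then checks that element against the top of the stack.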
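Putting the original loop and the update together, here is a self-contained sketch you can run against small samples. It returns the problems as a list instead of calling `warn`; the sub name `check_mdf_pairs`, the message texts, and the sample document are made up for illustration, and it assumes element names contain no whitespace, `/` or `>`.

```perl
use strict;
use warnings;

# Hypothetical helper: checks CLG.MDFO / CLG.MDFC pairing in $file and
# returns a list of problem descriptions (empty list if all is well).
sub check_mdf_pairs {
    my ($file) = @_;
    my (@stack, @problems, $pre_element);

    while ($file =~ m{\G(.*?)<\?(.*?)\?>}gcs) {
        my ($pre, $pi) = ($1, $2);
        my @args   = split(' ', $pi);
        my $pi_cmd = uc($args[0] // '');

        if ($pi_cmd eq 'CLG.MDFO') {
            # The element tag right after the PI is the one being marked.
            if ($file =~ m{\G\s*<\s*([^>\s/]+)[^>]*>}gc) {
                push @stack, $1;
            }
        }
        else {
            # Remember the element tag (opening or closing) just before this PI.
            if ($pre =~ m{<\s*/?\s*([^>\s/]+)[^>]*>\s*\z}) {
                $pre_element = $1;
            }
            if ($pi_cmd eq 'CLG.MDFC') {
                my $open = pop @stack;
                if (!defined $open) {
                    push @problems, 'CLG.MDFC without a matching CLG.MDFO';
                }
                elsif (!defined $pre_element || $pre_element ne $open) {
                    push @problems, 'CLG.MDFC closes '
                        . ($pre_element // '(no element)')
                        . ", expected $open";
                }
            }
        }
    }
    push @problems, "unterminated CLG.MDFO: @stack" if @stack;
    return @problems;
}

# The no_smark case from above; this prints nothing.
my $sample = q{<?CLG.MDFO ID="x"?><ARTICLE>text</ARTICLE> <?no_smark?> <?CLG.MDFC ID="x"?>};
print "$_\n" for check_mdf_pairs($sample);
```

Returning a list rather than warning directly makes it easy to try a few made-up fragments (a mismatched close, a missing close) before running it over a real file.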