in reply to Not able to Matching nested item
In your code, I think the problem is the use of [^>]+?> in a couple of places. Try using .*?> instead.
Update: the above might be the issue, but after further reflection I am not completely sure that will fix it. But read on...
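To illustrate how the two patterns mentioned at the top differ, here is a small sketch. The input string and the surrounding pattern are made up for the demonstration (they are not taken from your code); the point is only that [^>]+? can never step over a '>', while .*? may backtrack past one when the rest of the pattern requires it.

```perl
use strict;
use warnings;

# Hypothetical input: the PI's attribute value contains a '>'.
my $text = '<?CLG.MDFO ID="a>b"?><ARTICLE>';

# [^>]+? cannot cross a '>', so the PI part must end at the first '>'
# and the rest of the pattern has nothing to line up with.
my $strict = $text =~ m{<\?CLG\.MDFO[^>]+?>\s*<ARTICLE>} ? 1 : 0;

# .*? may backtrack past the first '>' until the remainder of the
# pattern matches (/s lets '.' match newlines as well).
my $loose = $text =~ m{<\?CLG\.MDFO.*?>\s*<ARTICLE>}s ? 1 : 0;

print "strict=$strict loose=$loose\n";   # prints "strict=0 loose=1"
```

Whether this is what bites you depends on whether your PIs or tags can actually contain a '>' before the closing delimiter.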
What you are trying to do is similar to matching nested parentheses, and that's notoriously difficult to do with regular expressions. Moreover, if you run into a CLG.MDFO PI, you need to look at the next element tag, and if you run into a CLG.MDFC PI, you need to parse the preceding element tag.
I would try this approach, which uses simpler regexes and maintains a stack of the parsed PIs:
    my @stack;
    # /c keeps pos() intact when a match fails, so a failed nested \G match
    # cannot reset the scan and restart the loop from the beginning.
    while ($file =~ m{\G(.*?)<\?(.*?)\?>}gcms) {
        my $pre = $1;
        my $pi  = $2;
        my @args = split(' ', $pi);    # hopefully this always works
        my $pi_cmd = uc($args[0]);
        if ($pi_cmd eq 'CLG.MDFO') {
            # parse next element tag
            if ($file =~ m{\G\s*<\s*([^>\s]+)(.*?)>}gcms) {
                my $element = $1;
                push(@stack, $element);
            }
        } elsif ($pi_cmd eq 'CLG.MDFC') {
            # parse previous element tag (the optional / allows a closing tag)
            if ($pre =~ m{<\s*/?([^>\s]+)([^>]*)>\s*\z}ms) {
                my $element = $1;
                unless ($element eq pop(@stack)) {
                    # ...emit mismatch warning ...
                }
            }
        }
    }
    if (@stack) {
        # ...emit unterminated CLG.MDFO warning...
    }

Update: To handle this case:

    </ARTICLE> <?no_smark?> <?CLG.MDFC ID="C001001M00 ...

you can modify the above code as follows:

    my $pre_element;
    my @stack;
    while ($file =~ m{...regex for a pi...}) {
        my $pre = $1;
        ...
        if ($pi_cmd eq 'CLG.MDFO') {
            ...same as above...
        } else {
            if ($pre =~ m{...regex for element tag...\z}) {
                $pre_element = $1;
            }
            if ($pi_cmd eq 'CLG.MDFC') {
                unless ($pre_element eq pop(@stack)) {
                    ...
                }
            }
        }
    }

So encountering <? no_smark ?> will set $pre_element for the following CLG.MDFC PI.
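For something you can run as-is, here is a self-contained sketch of the stack approach described in this post. The sub name check_pis, the warning texts, and the sample input (including the ID value) are my own placeholders, and the tag regexes are deliberately simple, so treat it as a starting point rather than a finished parser.

```perl
use strict;
use warnings;

# Returns a list of warning strings; an empty list means every CLG.MDFO
# was closed by a CLG.MDFC following a tag of the same element name.
sub check_pis {
    my ($file) = @_;
    my (@stack, @warnings, $pre_element);

    # /c keeps pos() when a match fails, so the nested \G match is safe.
    while ($file =~ m{\G(.*?)<\?(.*?)\?>}gcs) {
        my ($pre, $pi) = ($1, $2);
        my ($pi_cmd) = split ' ', $pi;
        $pi_cmd = uc($pi_cmd // '');

        if ($pi_cmd eq 'CLG.MDFO') {
            # The element tag right after the PI is the one being opened.
            if ($file =~ m{\G\s*<\s*([^>\s/]+)[^>]*>}gc) {
                push @stack, $1;
            }
        }
        else {
            # Remember the tag (opening or closing) that ends just before
            # this PI; other PIs such as no_smark pass it along.
            if ($pre =~ m{<\s*/?\s*([^>\s/]+)[^>]*>\s*\z}) {
                $pre_element = $1;
            }
            if ($pi_cmd eq 'CLG.MDFC') {
                my $open = pop @stack;
                if (!defined $open) {
                    push @warnings, 'CLG.MDFC without CLG.MDFO';
                }
                elsif (!defined $pre_element || $pre_element ne $open) {
                    push @warnings, "mismatch: opened $open, closed "
                        . ($pre_element // '(none)');
                }
            }
        }
    }
    push @warnings, "unterminated CLG.MDFO for $_" for @stack;
    return @warnings;
}

# Hypothetical sample input with a no_smark PI between the tag and the PI.
my $sample = '<?CLG.MDFO ID="X1"?><ARTICLE>text</ARTICLE> '
           . '<?no_smark?> <?CLG.MDFC ID="X1"?>';
my @found = check_pis($sample);
print @found ? join("\n", @found) . "\n" : "no problems found\n";
```

Note that $pre_element is never cleared here, so a stale value could be compared if a CLG.MDFC has no element tag anywhere before it; you may want to reset it after each CLG.MDFC.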