if ($XML_process_line =~ /^(\d{1,10})([\%|\<].{1,1000}\>)/){
I don't think it does what you want. First of all, this: [\%|\<]. Inside [], a | is just a literal character, so that class matches %, | or <. I think you want one or the other: a character class, [%<], or an alternation, (?:%|<).
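Here's a quick way to see the difference for yourself. It's only a sketch; the variable names are mine, not from your script.

```perl
use strict;
use warnings;

# Hypothetical names, for illustration only.
my $class_with_pipe = qr/^[\%|\<]$/;   # the class as written in the original regex
my $class_plain     = qr/^[%<]$/;      # percent sign or less-than only

# Try each candidate character against both classes.
for my $char ('%', '<', '|') {
    printf "%-3s with-pipe: %-3s plain: %s\n",
        "'$char'",
        ($char =~ $class_with_pipe ? 'yes' : 'no'),
        ($char =~ $class_plain     ? 'yes' : 'no');
}
```

The only character the two classes disagree on is the pipe: the first accepts it, the second doesn't.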
Second, and more important, this: .{1,1000}\>. Perl's quantifiers are greedy. That means that if you do this: "<one></one>" =~ /<(.{1,1000})>/; print $1;, you're going to get one></one printed, because .{1,1000} matches as many characters as it can and only backs off far enough to find a final >.
Instead, you'd want to use [%<][^>]{1,1000}>, which uses a negated character class: [^>] can't match a >, so the match stops at the first one.
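To compare the two side by side, something like this should do. It's a minimal sketch using the same "<one></one>" string; the variable names are just for illustration.

```perl
use strict;
use warnings;

my $input = '<one></one>';

# Greedy dot: grabs as much as possible, then backs off to the last '>'.
my ($greedy)  = $input =~ /<(.{1,1000})>/;

# Negated class: cannot cross a '>', so it stops at the first one.
my ($negated) = $input =~ /<([^>]{1,1000})>/;

print "greedy:  $greedy\n";    # one></one
print "negated: $negated\n";   # one
```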
That being said, this is hard to do and even harder to do right, so you should use a module. I would suggest XML::Twig, but there are plenty of others as well.
elusion : http://matt.diephouse.com
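Update: I also noticed that you use two variables for your line. You assign to $XML_line, but use your regex on $XML_process_line. Remember to use -w and use strict; strict refuses to compile code that uses a variable you never declared, which catches this kind of mistake.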
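As a rough illustration of what strict buys you, here's a sketch that compiles a small snippet with the same kind of name mismatch inside a string eval. The snippet is made up for the demo and assumes the variable was declared with my; it isn't your actual code.

```perl
use strict;
use warnings;

# A made-up snippet: declares $XML_line but tests $XML_process_line.
my $snippet = q{
    use strict;
    my $XML_line = '42<item>';
    if ($XML_process_line =~ /^(\d{1,10})/) { print "matched\n" }
    1;
};

# Under strict the undeclared variable is a compile-time error,
# so eval returns undef and the message lands in $@.
my $ok    = eval $snippet;
my $error = $@;
print "strict says: $error" unless $ok;
```

Without strict the snippet would compile, and the regex would quietly be tested against an empty variable.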
In reply to Re: Removing duplicate subtrees from XML
by elusion
in thread Removing duplicate subtrees from XML
by matth