Do not use regexes for this job; you will get bitten by all the edge cases.

One possible solution uses the XML::XPath module and an XPath expression to traverse the XML file. The expression used in the code breaks down as follows:

- / selects the document root.
- descendant-or-self:: selects the context node (here the root, which was set using /) and all nodes below it.
- The node test comment() is true for any comment node.

/descendant-or-self::comment() therefore selects all comment nodes in or under the root node, i.e. all comment nodes in the whole file.

use strict;
use warnings;
use XML::XPath;
my $xml = q|<?xml version="1.0" ?>
<xml><!-- A single line comment -->
<class_list>
<student>
<name>Robert</name>
<!-- A comment deeply inside the file -->
<grade>A+</grade>
</student>
<!-- Here starts a multi line comment
<student>
<name>Lenard</name>
<grade>A-</grade>
</student>
-->
</class_list>
</xml>
|;
my $xp = XML::XPath->new(xml => $xml);
my $nodeset = $xp->find('/descendant-or-self::comment()');
foreach my $node ($nodeset->get_nodelist) {
    print "FOUND\n",
          $node->getValue,
          "\n";
}

Output:

FOUND
A single line comment
FOUND
A comment deeply inside the file
FOUND
Here starts a multi line comment
<student>
<name>Lenard</name>
<grade>A-</grade>
</student>
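
As a side note, XPath 1.0 defines // as an abbreviation for /descendant-or-self::node()/, so the shorter expression //comment() should select the same comment nodes. The following is only a minimal sketch, assuming XML::XPath accepts the abbreviated syntax; the sample document and the variable names @long and @short are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample document, used only for this comparison.
my $xml = q|<?xml version="1.0" ?>
<root><!-- first -->
<item><!-- second --></item>
</root>
|;

my $xp = XML::XPath->new(xml => $xml);

# findnodes returns a list of nodes when called in list context.
# Long form, as used in the answer above:
my @long  = map { $_->getValue } $xp->findnodes('/descendant-or-self::comment()');
# Abbreviated form (assumed to be equivalent):
my @short = map { $_->getValue } $xp->findnodes('//comment()');

print "long:  ", join('|', @long),  "\n";
print "short: ", join('|', @short), "\n";
```

If both lines print the same comment texts, either form can be used; the long form simply makes the axis explicit.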

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James