Perl pattern matching

Rahul205 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Perl pattern matching by merlyn (Sage) on Apr 13, 2009 at 15:53 UTC
See the first example of HTML::Filter. It's a comment stripper, so all you need to do is reverse that.

-- Randal L. Schwartz, Perl hacker
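One way to "reverse" a comment stripper is to keep only the comments instead of dropping them. The sketch below does this without subclassing HTML::Filter: it attaches a comment handler directly to HTML::Parser, the module HTML::Filter is built on. It is a minimal sketch, not the HTML::Filter documentation example. The subroutine name `extract_comments` and the sample markup are hypothetical, and it assumes HTML::Parser 3.x from CPAN is installed.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the text of every comment in a markup string.
# Where a comment stripper discards comments and passes everything else
# through, this keeps the comments and ignores everything else.
sub extract_comments {
    my ($markup) = @_;
    my @comments;
    my $parser = HTML::Parser->new(
        api_version => 3,
        # The 'tokens' argspec passes an array ref holding the comment
        # text(s) without the surrounding <!-- and --> delimiters.
        comment_h => [ sub { push @comments, @{ $_[0] } }, 'tokens' ],
    );
    $parser->parse($markup);
    $parser->eof;    # flush any buffered input
    return @comments;
}

my $html = <<'END_HTML';
<p>Hello</p>
<!-- first comment -->
<div><!-- second
comment --></div>
END_HTML

print "FOUND:$_\n" for extract_comments($html);
```

Because the parser tokenizes the markup, the comment boundaries are found by the parser rather than by a hand-written regex.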
Re: Perl pattern matching by CountZero (Bishop) on Apr 13, 2009 at 17:35 UTC
Do not use regexes for this job. You will get bitten by all the edge cases.

One possible solution uses the XML::XPath module and XPath expressions to traverse the XML file:

- `/` selects the document root.
- `descendant-or-self::` selects the context node and all nodes below it (here the root node, which was set using `/`).
- The node test `comment()` is true for any comment node.

`/descendant-or-self::comment()` therefore selects all comment nodes in or under the root node, i.e. all comment nodes in the whole file.

```perl
use strict;
use warnings;
use XML::XPath;

my $xml = q|<?xml version="1.0" ?>
<xml><!-- A single line comment -->
  <class_list>
    <student>
      <name>Robert</name>
      <!-- A comment deeply inside the file -->
      <grade>A+</grade>
    </student>
    <!-- Here starts a multi line comment
    <student>
      <name>Lenard</name>
      <grade>A-</grade>
    </student>
    -->
  </class_list>
</xml>
|;

my $xp      = XML::XPath->new(xml => $xml);
my $nodeset = $xp->find('/descendant-or-self::comment()');

foreach my $node ($nodeset->get_nodelist) {
    print "FOUND\n", $node->getValue, "\n";
}
```

Output:

```
FOUND
 A single line comment
FOUND
 A comment deeply inside the file
FOUND
 Here starts a multi line comment
    <student>
      <name>Lenard</name>
      <grade>A-</grade>
    </student>
```

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
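The XPath approach can be varied. In XPath 1.0 the abbreviation `//comment()` expands to `/descendant-or-self::node()/child::comment()` and selects the same comment nodes as the longer form. Narrowing the path restricts the search to part of the document. The sketch below assumes XML::XPath (with its XML::Parser dependency) is installed. The helper name `comments_matching` and the sample XML are hypothetical.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical helper: return the text of every comment node matched by $expr.
sub comments_matching {
    my ($xml, $expr) = @_;
    my $xp = XML::XPath->new(xml => $xml);
    # In list context, findnodes returns the matching nodes themselves.
    return map { $_->getValue } $xp->findnodes($expr);
}

my $xml = q|<?xml version="1.0" ?>
<xml><!-- top -->
  <class_list>
    <student><!-- inside student --></student>
    <!-- after student -->
  </class_list>
</xml>
|;

# Abbreviated form: same comment nodes as '/descendant-or-self::comment()'.
my @all = comments_matching($xml, '//comment()');

# Only comments that are direct children of a <student> element.
my @student = comments_matching($xml, '//student/comment()');

print scalar(@all), " comments in total\n";
print scalar(@student), " comment(s) directly inside <student>\n";
```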