PerlMonks
Re: Regexps to change HTML tags/attributes
by Ovid (Cardinal) on Aug 27, 2003 at 16:00 UTC ( [id://287074]=note )
As a general rule, don't use regular expressions to parse HTML. You typically want a parser. Here's a short example that will remove all anchor tags (beginning and ending) and also change font sizes (though you should really use CSS) and delete the "alt" attribute of images (which you also shouldn't do, but it's here as an example):
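A minimal sketch of such a script, assuming HTML::TokeParser::Simple 2.1 or later is installed. The sample markup, the new font size of 2, and the `transform_html` name are illustrative assumptions rather than anything from the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple 2.1;

# Hypothetical sample input, used only for illustration.
my $sample = <<'END_HTML';
<p><a href="http://example.com/">A link</a> and
<font size="7">big text</font>
<img src="logo.png" alt="Logo"></p>
END_HTML

# Walk the token stream and rebuild the document one token at a time.
sub transform_html {
    my ($input) = @_;

    # A scalar reference tells the parser to read from a string.
    my $parser = HTML::TokeParser::Simple->new( \$input );
    my $html   = '';

    while ( my $token = $parser->get_token ) {

        # Skip <a ...> and </a>; the link text between them is kept.
        next if $token->is_tag('a');

        if ( $token->is_start_tag('font') ) {
            $token->set_attr( 'size', 2 );    # assumed target size
        }
        elsif ( $token->is_start_tag('img') ) {
            $token->delete_attr('alt');
        }

        $html .= $token->as_is;
    }
    return $html;
}

print transform_html($sample);
```

Text, comments, and every tag that is not touched pass through `as_is` unchanged, so only the three targeted changes show up in the output.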
As a side note, if you want your HTML "cleaned up" a little bit, prior to the $html .= $token->as_is; line, add: $token->rewrite_tag; That will preserve and double-quote the attribute values, automatically lowercase the tag name and attribute names (as they properly should be), and preserve an ending forward slash if it's used in a self-closing tag:
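A small sketch of that cleanup pass, again assuming HTML::TokeParser::Simple 2.1 or later. The `clean_html` name and the sample markup are made up for illustration, and the exact spacing of the rewritten tags may differ between module versions:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple 2.1;

# Rebuild the document, normalising each tag on the way through.
sub clean_html {
    my ($input) = @_;
    my $parser  = HTML::TokeParser::Simple->new( \$input );
    my $html    = '';

    while ( my $token = $parser->get_token ) {
        $token->rewrite_tag;       # lowercase names, double-quote values
        $html .= $token->as_is;
    }
    return $html;
}

# Hypothetical messy input: uppercase names and single-quoted values.
print clean_html(qq{<IMG SRC='logo.png' /><P CLASS='intro'>Hello</P>\n});
```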
This method is automatically called on tags that have attributes added, changed, or deleted. In other words, this is a very common task, and HTML::TokeParser::Simple version 2.1 does all of that for you and then some.

Cheers,
New address of my CGI Course.