The problem with embedded languages (like PHP, CF, etc.) that use tag-like syntax is that:
- They can be used with non-HTML documents, which can confuse "normal" HTML parsers.
- Even for HTML documents, the structure of the HTML document doesn't necessarily match the structure of the embedded language's pseudo-tag tree. That is, a pseudo-tag can sit inside an HTML tag, or it can cross the boundaries of HTML tags.
- HTML tags can be generated by pseudo-tags. In this case the input document can often look seriously broken to a "normal" HTML parser.
A proper parser for an embedded language should ignore all HTML markup (or any other markup, or any text which looks like markup). It should take into account only its own pseudo-tags. Is it possible to make
HTML::Parser ignore everything except pseudo-tags? I don't think so, but I could be wrong.
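
To illustrate what I mean by "ignore everything except pseudo-tags", here is a rough sketch in Python (not HTML::Parser, and not a real PHP parser). It assumes PHP-style `<?php ... ?>` blocks as the only pseudo-tag syntax; the names `PSEUDO_TAG` and `split_template` are made up for this example. Everything outside a pseudo-tag is treated as opaque text, so it doesn't matter whether it is well-formed HTML or not.

```python
import re

# Assumption: the only pseudo-tag syntax is a PHP-style <?php ... ?> block.
# A real parser would also have to handle "?>" inside string literals,
# comments, short tags, and so on; this sketch deliberately does not.
PSEUDO_TAG = re.compile(r"<\?php(.*?)\?>", re.DOTALL)


def split_template(source):
    """Split source into ("text", ...) and ("code", ...) chunks.

    Surrounding markup is never interpreted; it is passed through as text.
    """
    chunks = []
    pos = 0
    for match in PSEUDO_TAG.finditer(source):
        if match.start() > pos:
            # Opaque text before the pseudo-tag (HTML or anything else).
            chunks.append(("text", source[pos:match.start()]))
        chunks.append(("code", match.group(1).strip()))
        pos = match.end()
    if pos < len(source):
        # Trailing opaque text after the last pseudo-tag.
        chunks.append(("text", source[pos:]))
    return chunks


# A pseudo-tag sitting inside an HTML attribute, which would confuse
# a "normal" HTML parser.
sample = '<a href="<?php echo $url ?>">link</a>'
for kind, value in split_template(sample):
    print(kind, repr(value))
```

The point of the design is that the scanner never builds an HTML tree at all, so a pseudo-tag inside an attribute or across tag boundaries is no harder to handle than one between tags.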
--
Ilya Martynov
(http://martynov.org/)