szabgab has asked for the wisdom of the Perl Monks concerning the following question:

Using HTML::Parser, it is unclear to me how I am supposed to notice that an element whose end tag is missing has in fact ended. In some cases I get an explicit end event, but in other cases I don't.

See the example code:

use strict;
use warnings;
use HTML::Parser ();

sub event_handler {
    my ($event, $elem) = @_;
    print "$event $elem\n";
}

my $p = HTML::Parser->new(api_version => 3);
$p->handler( start => \&event_handler, "event, tagname");
$p->handler( end   => \&event_handler, "event, tagname");

$p->parse('<head><title>abc</title></head>');
$p->eof;
print "----\n";

$p->parse('<head><title>abc</head>');
$p->eof;
print "----\n";

$p->parse('<ul><li>abc</li><li>def</ul>');
$p->eof;
exit;

The result of which is
start head
start title
end title
end head
----
start head
start title
end title
end head
----
start ul
start li
end li
start li
end ul

That is, the missing </title> tag explicitly generated an end event, while the missing </li> did not.
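One possible workaround, sketched here as an assumption rather than as anything HTML::Parser provides, is to keep a stack of open elements in the handlers and synthesize an "implied-end" event whenever an element is closed by something other than its own end tag. The names %closed_by_start, @open, @events, on_start and on_end are hypothetical, and the table only covers <li>; a real one would have to follow the HTML rules for optional end tags.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical table: element => start tags that implicitly close it.
# Only <li> is listed; this is an illustration, not a complete rule set.
my %closed_by_start = ( li => { li => 1 } );

my @open;      # stack of currently open elements
my @events;    # events as we reconstruct them

sub on_start {
    my ($tag) = @_;
    # A new <li> closes a still-open <li> sitting on top of the stack.
    if (@open && $closed_by_start{ $open[-1] }{$tag}) {
        push @events, "implied-end " . pop @open;
    }
    push @open, $tag;
    push @events, "start $tag";
}

sub on_end {
    my ($tag) = @_;
    # Ignore a stray end tag that matches nothing currently open.
    return unless grep { $_ eq $tag } @open;
    # Everything above the matching element is closed implicitly.
    while (@open) {
        my $top = pop @open;
        if ($top eq $tag) {
            push @events, "end $tag";
            last;
        }
        push @events, "implied-end $top";
    }
}

my $p = HTML::Parser->new(api_version => 3);
$p->handler(start => \&on_start, "tagname");
$p->handler(end   => \&on_end,   "tagname");

$p->parse('<ul><li>abc</li><li>def</ul>');
$p->eof;

print "$_\n" for @events;
```

With this bookkeeping, the second <li> is reported as "implied-end li" just before "end ul". Elements still open at end of input are not handled in this sketch; they would need to be flushed from @open after calling eof.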

Re: HTML::Parser explicit calls for missing end tags
by Anonymous Monk on Aug 20, 2010 at 10:10 UTC
    The closing li tag is optional, but this could be considered a bug. You might want to look at how HTML::TreeBuilder handles this.
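
To see what the reply is pointing at, here is a minimal sketch that feeds the same fragment to HTML::TreeBuilder and lists the <li> elements it builds. It assumes the HTML-Tree distribution is installed; the variable names $tree and @items are only for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder ();

# Parse the fragment from the question; the second </li> is missing.
my $tree = HTML::TreeBuilder->new_from_content('<ul><li>abc</li><li>def</ul>');

# Collect the text of every <li> element found in the resulting tree.
my @items = map { $_->as_text } $tree->look_down(_tag => 'li');

print scalar(@items), " li elements: @items\n";

# Free the tree explicitly, as the HTML::Element documentation recommends.
$tree->delete;
```

The tree builder treats the unclosed <li> as a complete element, so both items appear as siblings under the <ul>. It works on a finished tree rather than on a stream of events, which may or may not suit the original use case.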