Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I ran into a baffling problem today, and I'm hoping someone else has some enlightenment to share.

I developed a module that is a subclass of HTML::Parser. It works like a charm on my Mac (OS X 1.2). When I move it to a Linux server, nothing gets returned: no error messages, no contents. While debugging why it fails in one place and not the other, I found that the start, end and text handlers are not being called. The start_document handler does seem to get called. The Linux server that is failing is running HTML::Parser 3.25. My Mac was running 3.28, and I just upgraded it to 3.34; the script works fine with both. Both my Mac and the Linux server are running Perl 5.6.1. I also tried another Linux server I had access to, running version 3.26, and it worked fine there.

What's most baffling is that not a single error message is generated to give me a hint.

I don't see anything in the change history that would indicate a bug causing this occurrence. Obviously, upgrading to the latest version is the logical choice, but I'm doing this work for a client and may not have the option of upgrading. I thought it wise to look into it completely, since I'm concerned that this issue will rear its ugly head later.

All thoughts appreciated. Thanks.

Replies are listed 'Best First'.
Re: HTML::Parser - no contents. no error.
by castaway (Parson) on Oct 29, 2003 at 08:58 UTC
    I have seen a similar problem when using an HTML::Parser that was compiled for 5.8.0 under Perl 5.8.1. Make sure your HTML::Parser was actually compiled for the version of Perl you're running it with. (Upgrading is probably a good idea, though :)

    C.

      If this is indeed the problem, the cause of it is discussed in the first point on this P5P summary.
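
For anyone wanting to check the point above, here is a minimal sketch that prints which perl is running and which copy of a module it actually loads. The helper name `module_report` is made up for this example and is not part of any module; a load failure is reported rather than hidden, which may help when an XS module was built against a different perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Hypothetical helper (an assumption for this sketch, not a library call):
# report the version and on-disk location of the module this perl loads.
sub module_report {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # HTML::Parser -> HTML/Parser.pm
    eval { require $file; 1 }
        or return "$module: could not be loaded ($@)";
    my $version = $module->VERSION;
    $version = 'unknown' unless defined $version;
    return "$module $version loaded from $INC{$file}";
}

# Perl version and architecture of the interpreter running this script.
printf "perl %vd (%s)\n", $^V, $Config{archname};
print module_report('HTML::Parser'), "\n";
```

Running this on both the working and the failing machine shows whether the two are really using the versions you think they are, and whether the module lives in an architecture-specific directory that belongs to the perl in use.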
Re: HTML::Parser - no contents. no error.
by PodMaster (Abbot) on Oct 29, 2003 at 08:37 UTC
    Here are my thoughts
    1. Where is your minimal example which demonstrates the problem please? (too mysterious)
    2. Grab a copy of Devel::Trace
    3. Grab a copy of Devel::TraceMethods
    update: when you grab a copy of those Devel:: modules, be sure to give the docs a read and use them, mmkay ;)

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

      Thanks for the advice thus far. I applied Devel::Trace and Devel::TraceMethods. It seems that the parse method inherited from HTML::Parser is simply doing nothing: none of the handlers are being called. Here is the relevant section of the Trace output. Text::Styler::Parser is a subclass of HTML::Parser. It's meant to produce text from HTML.

      >> test.pl:3: $Devel::Trace::TRACE=0;
      >> test.pl:26: print $styler->text_style($html);
      >> Text/Styler.pm:21: my($self,$text) = @_;
      >> Text/Styler.pm:23: my $styler = new Text::Styler::Parser;
      >> Text/Styler/Parser.pm:21: my $proto = shift;
      >> Text/Styler/Parser.pm:22: my $class = ref( $proto ) || $proto;
      >> Text/Styler/Parser.pm:23: my $parser = HTML::Parser->new( api_version => 3 );
      >> /usr/local/lib/site_perl/i386-linux/HTML/Parser.pm:23: my $class = shift;
      >> /usr/local/lib/site_perl/i386-linux/HTML/Parser.pm:24: my $self = bless {}, $class;
      >> /usr/local/lib/site_perl/i386-linux/HTML/Parser.pm:25: return $self->init(@_);
      >> /usr/local/lib/site_perl/i386-linux/HTML/Parser.pm:31: my $self = shift;
      >> /usr/local/lib/site_perl/i386-linux/HTML/Parser.pm:32: $self->_alloc_pstate;
      >> /usr/local/lib/site_perl/i386-linux/HTML/Parser.pm:34: my %arg = @_;
      >> /usr/local/lib/site_perl/i386-linux/HTML/Parser.pm:35: my $api_version = delete $arg{api_version} || (@_ ? 3 : 2);
      >> /usr/local/lib/site_perl/i386-linux/HTML/Parser.pm:36: if ($api_version >= 4) {
      >> /usr/local/lib/site_perl/i386-linux/HTML/Parser.pm:42: if ($api_version < 3) {
      >> /usr/local/lib/site_perl/i386-linux/HTML/Parser.pm:65: if (my $h = delete $arg{handlers}) {
      >> /usr/local/lib/site_perl/i386-linux/HTML/Parser.pm:73: while (my($option, $val) = each %arg) {
      >> /usr/local/lib/site_perl/i386-linux/HTML/Parser.pm:86: return $self;
      >> Text/Styler/Parser.pm:25: $parser->handler(start => \&start_handler, 'self, tagname, attr' );
      >> Text/Styler/Parser.pm:26: $parser->handler(end => \&end_handler, 'self, tagname' );
      >> Text/Styler/Parser.pm:27: $parser->handler(text => \&text_handler, 'self, dtext' );
      >> Text/Styler/Parser.pm:28: $parser->report_tags( @report_tags );
      >> Text/Styler/Parser.pm:29: $parser->unbroken_text(1);
      >> Text/Styler/Parser.pm:31: return bless $parser, $class;
      >> Text/Styler.pm:24: $styler->parse($text);
      >> Text/Styler.pm:27: $self->text_wrap( $styler->contents );

      TraceMethods reports something similar.

      parseText::Styler::Parser=HASH(0x80f5f7c)<p>Little <q>help</q> and love key the i to i'll tune get me help i i.</p> <p>Walk friends sad, you. <br />My by up the my little, a a, and my i, with <a href="http://tima.mplode.com/">you high</a> you're. By from when walk would what my help from my, by, when. Your up, be own with would is me up your you're friends does to of by and my how ears. You if little with out i own, worry you, of are, little, not get, i sing sad key. Stand help get with get be what are walk my out a of song, on by would i a a. Away think think, because me not . Tune you with me a sing the how i'll song sang sang on your, out, i. Little, help help and love key the i to i'll tune get me help i i, of. The <b>sad, you does by</b> up the my little, a.</p> <pre>xxxx foo foo</pre> at Text/Styler.pm line 24

      Here is that same section when run on my Mac.

      parseText::Styler::Parser=HASH(0xc63d4)<p>Little <q>help</q> and love key the i to i'll tune get me help i i.</p> <p>Walk friends sad, you. <br />My by up the my little, a a, and my i, with <a href="http://tima.mplode.com/">you high</a> you're. By from when walk would what my help from my, by, when. Your up, be own with would is me up your you're friends does to of by and my how ears. You if little with out i own, worry you, of are, little, not get, i sing sad key. Stand help get with get be what are walk my out a of song, on by would i a a. Away think think, because me not . Tune you with me a sing the how i'll song sang sang on your, out, i. Little, help help and love key the i to i'll tune get me help i i, of. The <b>sad, you does by</b> up the my little, a.</p> <pre>xxxx foo foo</pre> at Text/Styler.pm line 24
      start_document_handlerText::Styler::Parser=HASH(0xc63d4) at Text/Styler.pm line 24
      start_handlerText::Styler::Parser=HASH(0xc63d4)pHASH(0xc72e4) at Text/Styler.pm line 24
      text_handlerText::Styler::Parser=HASH(0xc63d4)Little at Text/Styler.pm line 24
      current_elementText::Styler::Parser=HASH(0xc63d4) at Text/Styler/Parser.pm line 71
      start_handlerText::Styler::Parser=HASH(0xc63d4)qHASH(0xc7350) at Text/Styler.pm line 24
      text_handlerText::Styler::Parser=HASH(0xc63d4)help at Text/Styler.pm line 24
      current_elementText::Styler::Parser=HASH(0xc63d4) at Text/Styler/Parser.pm line 71
      end_handlerText::Styler::Parser=HASH(0xc63d4)q at Text/Styler.pm line 24
      text_handlerText::Styler::Parser=HASH(0xc63d4) and love key the i to i'll tune get me help i i. at Text/Styler.pm line 24
      current_elementText::Styler::Parser=HASH(0xc63d4) at Text/Styler/Parser.pm line 71
      end_handlerText::Styler::Parser=HASH(0xc63d4)p at Text/Styler.pm line 24
      text_handlerText::Styler::Parser=HASH(0xc63d4) at Text/Styler.pm line 24
      start_handlerText::Styler::Parser=HASH(0xc63d4)pHASH(0xc7338) at Text/Styler.pm line 24
      text_handlerText::Styler::Parser=HASH(0xc63d4)Walk friends sad, you. at Text/Styler.pm line 24
      current_elementText::Styler::Parser=HASH(0xc63d4) at Text/Styler/Parser.pm line 71
      start_handlerText::Styler::Parser=HASH(0xc63d4)brHASH(0xc7314) at Text/Styler.pm line 24
      text_handlerText::Styler::Parser=HASH(0xc63d4)My by up the my little, a a, and my i, with at Text/Styler.pm line 24
      current_elementText::Styler::Parser=HASH(0xc63d4) at Text/Styler/Parser.pm line 71
      start_handlerText::Styler::Parser=HASH(0xc63d4)aHASH(0xc738c) at Text/Styler.pm line 24
      text_handlerText::Styler::Parser=HASH(0xc63d4)you high at Text/Styler.pm line 24
      current_elementText::Styler::Parser=HASH(0xc63d4) at Text/Styler/Parser.pm line 71
      end_handlerText::Styler::Parser=HASH(0xc63d4)a at Text/Styler.pm line 24
      text_handlerText::Styler::Parser=HASH(0xc63d4) you're. By from when walk would what my help from my, by, when. Your up, be own with would is me up your you're friends does to of by and my how ears. You if little with out i own, worry you, of are, little, not get, i sing sad key. Stand help get with get be what are walk my out a of song, on by would i a a. Away think think, because me not . Tune you with me a sing the how i'll song sang sang on your, out, i. Little, help help and love key the i to i'll tune get me help i i, of. The at Text/Styler.pm line 24
      current_elementText::Styler::Parser=HASH(0xc63d4) at Text/Styler/Parser.pm line 71
      start_handlerText::Styler::Parser=HASH(0xc63d4)bHASH(0xc7398) at Text/Styler.pm line 24
      text_handlerText::Styler::Parser=HASH(0xc63d4)sad, you does by at Text/Styler.pm line 24
      current_elementText::Styler::Parser=HASH(0xc63d4) at Text/Styler/Parser.pm line 71
      end_handlerText::Styler::Parser=HASH(0xc63d4)b at Text/Styler.pm line 24
      text_handlerText::Styler::Parser=HASH(0xc63d4) up the my little, a. at Text/Styler.pm line 24
      current_elementText::Styler::Parser=HASH(0xc63d4) at Text/Styler/Parser.pm line 71
      end_handlerText::Styler::Parser=HASH(0xc63d4)p at Text/Styler.pm line 24
      text_handlerText::Styler::Parser=HASH(0xc63d4) at Text/Styler.pm line 24
      current_elementText::Styler::Parser=HASH(0xc63d4) at Text/Styler/Parser.pm line 71
      start_handlerText::Styler::Parser=HASH(0xc63d4)preHASH(0xc735c) at Text/Styler.pm line 24
      text_handlerText::Styler::Parser=HASH(0xc63d4)xxxx foo foo at Text/Styler.pm line 24
      current_elementText::Styler::Parser=HASH(0xc63d4) at Text/Styler/Parser.pm line 71
      end_handlerText::Styler::Parser=HASH(0xc63d4)pre at Text/Styler.pm line 24

      As I mentioned, what I find odd is that there is not an error message in sight. The box on which this is failing has been running the same version of Perl (5.6.1) for quite some time. I've written other HTML::Parser modules on this box before and not had an issue. Before I request to have this module updated on the system, any additional thoughts are appreciated. Thanks for the wisdom thus far.

Re: HTML::Parser - no contents. no error.
by Anonymous Monk on Oct 29, 2003 at 16:37 UTC

    I did some more experimenting on the box where this is not operating properly. If I do not set report_tags, the handlers are called as advertised. So the workaround is to manage which tags to process in my handlers instead of relying on HTML::Parser's own filtering. It's sort of a shame, but ...

    Is this a known bug, or perhaps something specific to this server? The change log doesn't seem to indicate anything, and I've yet to turn up a reference in my Google searches.

    Thoughts?
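
The workaround described above could look roughly like the following sketch: the tag filter lives in the handler instead of in report_tags. The wrapper name `only_wanted` is invented for this example, and it assumes an argspec that passes the tag name first ('tagname, attr'), whereas the module in this thread uses 'self, tagname, attr'.

```perl
use strict;
use warnings;

# Tags to process; the list mirrors the one used in the module under discussion.
my %wanted = map { $_ => 1 } qw( a b strong i em q p pre br );

# Hypothetical wrapper (an assumption for this sketch): call $handler only
# for wanted tags. Assumes the tag name is the first handler argument.
sub only_wanted {
    my ($handler) = @_;
    return sub {
        my ($tagname) = @_;
        return unless $wanted{ lc $tagname };
        return $handler->(@_);
    };
}

my @seen;
my $start = only_wanted( sub { push @seen, $_[0] } );

# If HTML::Parser is available, register the wrapped handler as usual.
if ( eval { require HTML::Parser; 1 } ) {
    my $p = HTML::Parser->new( api_version => 3 );
    $p->handler( start => $start, 'tagname, attr' );
    $p->parse('<div><p>Hi <b>there</b><span>x</span></p></div>');
    $p->eof;
    print "start tags handled: @seen\n";
}
```

The same wrapper can be applied to the end handler. Text events carry no tag name, so filtering those still depends on whatever element stack the module keeps.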

      Here are some more thoughts
      • Does HTML::Parser pass its own testsuite on the troubled machine?
      • Why don't you post some code (with sample input) already?
        Through testing I have come to the conclusion that report_tags is not cumulative: it reports only the tags you ask for and ignores all else. Also, ignore_tags takes precedence over report_tags (that is, if a tag is ignored and you also ask to report it, it won't be reported).
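
A small script along these lines can be used to repeat that test on any machine. It is only a sketch: `reported_tags` is a helper invented here, and the script simply prints which start tags reach the handler under each configuration, so the output on a given HTML::Parser build can be compared with the conclusion above.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for this sketch): parse $html and return
# the start tags that reach the handler after $configure adjusts the parser.
sub reported_tags {
    my ($html, $configure) = @_;
    require HTML::Parser;
    my @seen;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ sub { push @seen, shift }, 'tagname' ],
    );
    $configure->($p);
    $p->parse($html);
    $p->eof;
    return @seen;
}

my $html = '<p><a href="#">x</a> <b>y</b> <i>z</i></p>';

if ( eval { require HTML::Parser; 1 } ) {
    # Two report_tags calls in a row: is the second list added or substituted?
    my @replaced = reported_tags( $html, sub {
        my $p = shift;
        $p->report_tags('a');
        $p->report_tags('b');
        return;
    } );
    print "report a, then report b: @replaced\n";

    # The same tag both reported and ignored: which setting wins?
    my @ignored = reported_tags( $html, sub {
        my $p = shift;
        $p->report_tags(qw( a b ));
        $p->ignore_tags('b');
        return;
    } );
    print "report a b, ignore b:    @ignored\n";
}
```

If the conclusion above holds, the first line should list only b and the second only a. An empty list on the failing server would point at report_tags itself rather than at the handlers.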


        I don't know if HTML::Parser passes its own test suite. I don't have the rights to run it, and the administrator is hard to reach. I'm working on that. I did notice, when I ran perl -V, that they must have recompiled Perl back in August.

        As for code, I've been hesitant because there is quite a bit. I did cut it down to its relevant parts. The report_tags call is commented out. Here it is:

        #!/usr/bin/perl -w
        use Text::Styler;
        # use Devel::TraceMethods qw ( HTML::Parser Text::Styler::Parser );

        my $styler = Text::Styler->new( { left_margin=>5, right_margin=>2, columns=>70 } );
        my $html = <<OUTPUT;
        <p>Little <q>help</q> and love key the i to i'll tune get me help i i.</p>
        <p>Walk friends sad, you. <br />My by up the my little, a a, and my i, with <a href="http://www.foo.com/">you high</a> you're. By from when walk would what my help from my, by, when. Your up, be own with would is me up your you're friends does to of by and my how ears. You if little with out i own, worry you, of are, little, not get, i sing sad key. Stand help get with get be what are walk my out a of song, on by would i a a. Away think think, because me not . Tune you with me a sing the how i'll song sang sang on your, out, i. Little, help help and love key the i to i'll tune get me help i i, of.
        The <b>sad, you does by</b> up the my little, a.</p>
        <pre>xxxx foo foo</pre>
        OUTPUT
        print $styler->text_style($html)."\n";

        ----

        package Text::Styler;

        use strict;
        use Text::Wrap;

        use vars qw( $VERSION );
        $VERSION = 0.1;

        sub new {
            my $class = shift;
            my $self = bless { }, $class;
            if ( my $a = shift ) { # getting set wrong when hash ref is handed in.
                $self->left_margin( $a->{left_margin} || 0 );
                $self->right_margin( $a->{right_margin} || 0 );
                $self->columns( $a->{columns} || 60 );
            }
            return $self;
        }

        sub text_style {
            my($self,$text) = @_;
            use Text::Styler::Parser;
            my $styler = new Text::Styler::Parser;
            $styler->parse($text);
            $self->text_wrap( $styler->contents );
        }

        ----

        package Text::Styler::Parser;

        use strict;
        use HTML::Parser;
        our @ISA = qw( HTML::Parser );

        use vars qw( $VERSION );
        $VERSION = 0.1;

        my %inline_tags = ( 'a' => '[', 'b' => '*', 'strong' => '*', 'i' => '\\', 'em' => '\\', 'q' => '"' );
        my %block_tags = ( 'p' => "\n", 'pre' => '' );
        my %empty_tags = ( 'br' => '1' );
        my @report_tags = (keys %inline_tags, keys %block_tags, keys %empty_tags);
        my $tag_symbols = { %inline_tags, %block_tags }; # not too concerned about saving memory right now
        my $tag_handlers= { 'a' => \&hdlr_hyperlink, 'p' => \&hdlr_paragraph, 'br' => \&hdlr_br, 'pre' => \&hdlr_pre };

        sub new {
            my $proto = shift;
            my $class = ref( $proto ) || $proto;
            my $parser = HTML::Parser->new( api_version => 3 );
            $parser->handler(start_document => \&start_document_handler, 'self' );
            $parser->handler(start => \&start_handler, 'self, tagname, attr' );
            $parser->handler(end => \&end_handler, 'self, tagname' );
            $parser->handler(text => \&text_handler, 'self, dtext' );
            #$parser->report_tags( qw( a b strong i em q p pre br ) );
            #$parser->report_tags( qw( @report_tags ) );
            $parser->unbroken_text(1);
            return bless $parser, $class;
        }

        sub contents { $_[0]->{_output}; }

        sub start_document_handler { $_[0]->{_stack}=undef; $_[0]->{_output}=undef; }

        sub start_handler {
            my $self = shift;
            my $tag = shift;
            my $attr = shift;
            push( @{ $self->{_stack} }, [ $tag, $tag_symbols->{$tag} ] );
            unless( defined( $tag_handlers->{$tag} ) &&
                    $tag_handlers->{$tag}->($self,'1',$tag_symbols->{$tag},$attr) ) {
                $self->{_output} .= length( $tag_symbols->{$tag} ) ?
                    ' '.$tag_symbols->{$tag} : $tag_symbols->{$tag}; # default start handler
            }
        }

        sub end_handler {
            my $self = shift;
            my $tag = shift;
            unless( defined( $tag_handlers->{$tag} ) &&
                    $tag_handlers->{$tag}->($self,'-1',$tag_symbols->{$tag}) ) {
                if ( defined( $block_tags{$tag} ) ) {
                    $self->{_output} .= "\n\n";
                } elsif ( defined( $inline_tags{$tag} ) ) {
                    $self->{_output} .= length( $tag_symbols->{$tag} ) ?
                        $tag_symbols->{$tag}.' ' : $tag_symbols->{$tag}; # default end handler
                } # empty tags can only have start tag routines
            }
            pop( @{ $self->{_stack} } );
        }

        sub text_handler {
            my $self = shift;
            my $text = shift;
            if ( $self->{_stack}->[0] ) { # filters out text outside a tag.
                $self->{_current_text}=$text; # a kludge.
                my $tag = $self->current_element;
                unless( defined($tag) && defined( $tag_handlers->{$tag} ) &&
                        defined( $tag_symbols->{$tag}) &&
                        $tag_handlers->{$tag}->($self,'0',$tag_symbols->{$tag}) ) {
                    $self->{_current_text}=~s/\r//sg;
                    $self->{_current_text}=~s/^[\s\t]*//gm;
                    $self->{_current_text}=~s/[\s\t]*$//gm;
                    $self->{_current_text}=~s/\n/ /gs;
                }
                $self->{_output} .= $self->{_current_text};
                $self->{_current_text} = undef;
            }
        }

        #--- tag handlers

        sub hdlr_hyperlink {
            my ($self, $mode, $symbol, $attr) = @_;
            if ($mode == 1) {
                $self->{_output}.=' ';
                $self->{_current_link} .= $attr->{'href'};
                # add alt storage here.
                return 1;
            } elsif ($mode == -1) {
                $self->{_output} .= ' ['.$self->{_current_link}.'] ';
                $self->{_current_link}=undef;
                # blank alt storage here.
                return 1;
            }
            return 0;
        }

        sub hdlr_paragraph {
            my ($self, $mode, $symbol, $attr) = @_;
            if ( $mode == 1) { # pass through.
                return 1;
            }
            return 0;
        }

        sub hdlr_br {
            my ($self, $mode, $symbol, $attr) = @_;
            if ($mode == 1) { $self->{_output}.="\n"; }
            return 1; # This handles it or we ignore it.
        }

        sub hdlr_pre {
            my ($self, $mode, $tag, $attr) = @_;
            if ( $mode==0 ) { # pass through.
                return 1;
            }
            return 0;
        }