in reply to "Use of uninitialized value in subroutine entry" warning with HTML::TableExtract

How do I fix it?

Stop using the -w switch (#!/usr/bin/perl -w) and put use warnings; in your script instead; see perllexwarn.
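Here is a minimal sketch of why this helps. It assumes a package that, like HTML::TableExtract, never enables warnings itself; the package name Legacy and the sub concat_undef are hypothetical stand-ins. The -w switch sets the global $^W, which reaches any code compiled without its own warnings pragma. use warnings is lexical and covers only the file or block it appears in.

```perl
#!/usr/bin/perl
# Hypothetical stand-in for a third-party module: it is compiled before
# "use warnings" below, so no lexical warnings pragma covers it.
package Legacy;
sub concat_undef { my $x; return "value: " . $x }    # stringifies undef

package main;
use strict;
use warnings;    # lexical: covers only the code from here to end of file

my @caught;
$SIG{__WARN__} = sub { push @caught, $_[0] };    # collect warnings

{
    local $^W = 0;                 # as if the script ran without -w
    my $s = Legacy::concat_undef();    # silent: Legacy is outside our scope
}
my $lexical_only = @caught;

{
    local $^W = 1;                 # roughly what the -w switch does, globally
    my $s = Legacy::concat_undef();    # now Legacy's use of undef warns
}
my $with_global_w = @caught - $lexical_only;

print "use warnings only: $lexical_only warning(s)\n";
print "with \$^W set (like -w): $with_global_w warning(s)\n";
```

Run as shown, the first count should be 0 and the second 1. The warning inside the module is still triggered. It is only reported when warnings are switched on globally.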


Re^2: "Use of uninitialized value in subroutine entry" warning with HTML::TableExtract
by OfficeLinebacker (Chaplain) on Aug 05, 2011 at 14:08 UTC
    LOL, thanks! Changing the first three lines to

        #!/usr/bin/perl
        use strict;
        use warnings;

    did the trick.

    I like computer programming because it's like Legos for the mind.
      In my template I have #!/usr/bin/perl --; that way I'm immune to line-ending problems (dos2unix/unix2dos).

      That's the stupid "lalala I can't hear you" method. The problem is still there. Did you bother looking at line 1229 of C:/Perl64/site/lib/HTML/TableExtract.pm?

      The current version 2.10 (dated 15 Jul 2006) at cpan.org has the following code around that line:

      sub strip {
          my $self = shift;
          $self->parse(shift);
          $self->eof;
          $self->{_htes_tidbit};
      }

      Line 1229 is the call to the parse method. My guess is that strip is called as a method without arguments or with an undefined argument.

      In line 1196, you find "package HTML::TableExtract::StripHTML;", and in line 1201, you find @ISA = qw(HTML::Parser);. So, there is a helper class HTML::TableExtract::StripHTML inheriting from HTML::Parser. Does HTML::Parser implement a strip method that may be called without a defined argument? Or is the strip method called by HTML::TableExtract?

      HTML::Parser does not document a strip method. But a simple and stupid search inside TableExtract.pm for the exact word "strip" shows two matches: the sub starting in line 1227, and a strip method call in line 625, "$target = $stripper->strip($item);". And lo and behold, the preceding line 624 creates an instance of HTML::TableExtract::StripHTML: "my $stripper = HTML::TableExtract::StripHTML->new;".

      So, strip is always called with an argument ($item), but that argument may be undefined. Looking up a few lines, you can see that $item is initialised in line 622 by dereferencing $ref as a scalar. Scalars can be undef, and no code nearby checks for that condition. Looking up a few more lines, you can see that $ref is initialised in line 617 by my $ref = $self->{grid}[$r][$c];.

      My guess (from the variable names and the ROW: label in line 612) is that this part of the code iterates over a 2D array in $self->{'grid'} that represents one or more HTML tables. But HTML tables may have empty cells (<td></td>), missing cells (at the end of table rows), and cells spanning rows and/or columns (rowspan and colspan attributes). Perhaps the HTML::TableExtract code can't handle one or more of these conditions, so the array element ends up undefined.
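To make the suspected chain concrete, here is a hypothetical sketch that mimics lines 617 and 622 as described above. A 2D array of scalar references stands in for $self->{grid}; its real layout inside HTML::TableExtract is an assumption. One cell holds undef, as an empty or spanned cell might. A defined check shows which cells would pass undef to strip, and from there to HTML::Parser's parse.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for $self->{grid}: one scalar reference per cell.
# $cell_c plays the role of an empty or spanned cell (an assumption).
my ($cell_a, $cell_b, $cell_c) = ('Name', 'Total', undef);
my @grid = (
    [ \$cell_a, \$cell_b ],
    [ \$cell_c, \$cell_a ],
);

my @undefined;
for my $r (0 .. $#grid) {
    for my $c (0 .. $#{ $grid[$r] }) {
        my $ref  = $grid[$r][$c];    # like line 617
        my $item = $$ref;            # like line 622: may be undef
        # The module would now call $stripper->strip($item); checking
        # first records which cells would hand undef to parse.
        push @undefined, "[$r][$c]" unless defined $item;
    }
}

print "cells holding undef: @undefined\n";    # cells holding undef: [1][0]
```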

      Look at the HTML you feed to HTML::TableExtract. Is it invalid HTML? (Use the W3C validator to find out.) Does it contain empty table cells? Does it contain cells spanning rows and/or columns?

      If the input is not valid HTML, fix it. If missing cells cause the problem, fix the input. If empty or spanning cells cause the problem, search for an existing bug report, and report a new bug if the problem is not yet known.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        That's the stupid "lalala I can't hear you" method.

        Nope, it is your basic separation of concerns.

        OfficeLinebacker did not write HTML::TableExtract, so there is no reason for him to turn on warnings for a module he did not write, regardless of any potential bugs in HTML::TableExtract.

        If the author of HTML::TableExtract wanted warnings, surely he would have added use warnings;

        If the input is not valid HTML, fix it.

        I doubt OfficeLinebacker has any control over third-party HTML, but it doesn't matter: HTML::TableExtract gets him the data he's after, regardless of any warnings.

        I must admit I just care whether it works, and anon is right: I'm not parsing HTML that I wrote myself. In any case, it works great now, huzzah!
