
That's the stupid "lalala I can't hear you" method. The problem is still there. Did you bother looking at line 1229 of C:/Perl64/site/lib/HTML/TableExtract.pm?

The current version 2.10 (dated 15 Jul 2006) at cpan.org has the following code around that line:

  sub strip {
    my $self = shift;
    $self->parse(shift);
    $self->eof;
    $self->{_htes_tidbit};
  }

Line 1229 is the call to the parse method. My guess is that strip is called as a method without arguments, or with an undefined argument.

In line 1196, you find "package HTML::TableExtract::StripHTML;", and in line 1201, you find @ISA = qw(HTML::Parser);. So, there is a helper class HTML::TableExtract::StripHTML inheriting from HTML::Parser. Does HTML::Parser implement a strip method that may be called without a defined argument? Or is the strip method called by HTML::TableExtract?

HTML::Parser does not document a strip method. But a simple and stupid search inside TableExtract.pm for the exact word "strip" shows two matches: the sub starting in line 1227, and a strip method call in line 625, "$target = $stripper->strip($item);". And lo and behold, the previous line 624 creates an instance of HTML::TableExtract::StripHTML: "my $stripper = HTML::TableExtract::StripHTML->new;".
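To see how an undefined $item turns into a warning inside strip, here is a minimal, self-contained model of that call chain. MiniParser and MiniStrip are hypothetical stand-ins, not the real modules: MiniStrip::strip copies the shape of the strip sub quoted above and passes its single argument straight to parse. The real parse in HTML::Parser is implemented in XS, which is most likely why the real message says "in subroutine entry"; this pure-Perl stand-in reports "in concatenation" instead.

```perl
use strict;
use warnings;

# Hypothetical minimal base class standing in for HTML::Parser.
package MiniParser;
sub new   { my $class = shift; return bless { buf => '' }, $class }
sub parse { my ($self, $chunk) = @_; $self->{buf} .= $chunk; return $self }
sub eof   { my $self = shift; $self->{text} = $self->{buf}; return $self }

# Same shape as HTML::TableExtract::StripHTML: a subclass whose strip()
# hands its only argument to the inherited parse() without checking it.
package MiniStrip;
our @ISA = ('MiniParser');
sub strip {
    my $self = shift;
    $self->parse(shift);
    $self->eof;
    return $self->{text};
}

package main;

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $item;                                  # undef, like an empty grid cell
my $stripper = MiniStrip->new;             # cf. line 624 of TableExtract.pm
my $target   = $stripper->strip($item);    # cf. line 625

print scalar(@warnings), " warning(s) caught: $warnings[0]";
```

The call itself succeeds and returns an empty string; the only symptom is the "uninitialized value" warning raised deep inside the parser, just like in the original report.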

So, strip is always called with an argument ($item), but that argument may be undefined. Looking up a few lines, you can see that $item is initialised in line 622 by dereferencing $ref as a scalar. Scalars can be undef, and no code around it checks for that condition. Looking up a few more lines, you can see that $ref is initialised in line 617 by "my $ref = $self->{grid}[$r][$c];".

My guess (from variable names and the ROW: label in line 612) is that this part of the code iterates over a 2D array in $self->{'grid'} that represents one or more HTML tables. But HTML tables may have empty cells (<td></td>), missing cells (at the end of table rows), and cells spanning rows and/or columns (rowspan and colspan attributes). Perhaps the HTML::TableExtract code can't handle one or more of these conditions, and the array element becomes undefined.
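A sketch of the missing check, assuming a grid shaped as described above (each cell a reference to a scalar). The grid contents and strip_tags are hypothetical: strip_tags is a crude regex stand-in for HTML::TableExtract::StripHTML, and the guard is not code from HTML::TableExtract 2.10. It only illustrates that testing for defined before stripping keeps undef away from the parser.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the stripping step. The real code uses
# HTML::TableExtract::StripHTML->strip(), which feeds HTML::Parser;
# a regex that removes tags is enough for this illustration.
sub strip_tags {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;
    return $text;
}

# Hypothetical grid shaped like $self->{grid}: each cell is a scalar
# reference, and one of them points to undef (think of a cell that is
# missing or covered by rowspan/colspan).
my @grid = (
    [ \'<b>Name</b>', \'<i>Age</i>' ],
    [ \'Alice',       \undef        ],
);

my @rows;
for my $r (0 .. $#grid) {
    for my $c (0 .. $#{ $grid[$r] }) {
        my $ref  = $grid[$r][$c];
        my $item = defined $ref ? $$ref : undef;
        # The check the original loop lacks: never pass undef on to strip.
        $rows[$r][$c] = defined $item ? strip_tags($item) : '';
    }
}

print join('|', @$_), "\n" for @rows;
```

Running it prints "Name|Age" and "Alice|" without any warning; the undefined cell simply becomes an empty string. Whether an empty string is the right result for a spanned cell is a design decision for the module author, which is one more reason to report it as a bug rather than patch the installed copy.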

Look at the HTML you feed to HTML::TableExtract. Is it invalid HTML? (Use the W3C validator to find out.) Does it contain empty table cells? Does it contain cells spanning rows and/or columns?

If the input is not valid HTML, fix it. If missing cells cause the problem, fix the input. If empty or spanning cells cause the problem, search for an existing bug report, and report a new bug if the problem is not yet known.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

In reply to Re^3: "Use of uninitialized value in subroutine entry" warning with HTML::TableExtract by afoken
in thread "Use of uninitialized value in subroutine entry" warning with HTML::TableExtract by OfficeLinebacker
