OfficeLinebacker has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, esteemed monks!

After a long time away, I'm using Perl again. I'm getting an odd warning from my program that I can't figure out, and I don't see anything about it in the other HTML::TableExtract threads. Basically, whenever I enable keep_html in my constructor, I get a series of these warnings. I've looked through the module code, but I don't really get what's going on. How do I fix it? Without further ado, my code:

#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use Readonly;
use HTML::TreeBuilder;
use HTML::TableExtract;
use HTML::Encoding 'encoding_from_http_message';
use Encode;

Readonly::Scalar my $url     => 'https://ebidmarketplace.com/publicVenSolList.asp';
Readonly::Scalar my $params1 => '?selAgency=&selbuyercd=&mDueDateFrom=&mDueDateTo=&docno=&fiscalyr=&chgordseq=&Agency=&buyerCode=&';
Readonly::Scalar my $params2 => 'agencyName=&buyerName=&selShowRows=9999&selsortby=POST_DATE&mStatus=0&changeind=01&curPage=1';
Readonly::Scalar my $fullurl => $url.$params1.$params2;

# POST /publicVenSolList.asp selAgency=&selbuyercd=&mDueDateFrom=08%2F04%2F2011&mDueDateTo=09%2F03%2F2011&docno=&fiscalyr=&chgordseq=&Agency=&buyerCode=&agencyName=&buyerName=&selShowRows=500&selsortby=DUE_DATE&mStatus=0&changeind=01&curPage=1

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;

my $response = $ua->get($fullurl);

if ($response->is_success) {
    #print $response->decoded_content;  # or whatever
}
else {
    die $response->status_line;
}

#my $te = HTML::TableExtract->new(keep_html => 1, keep_headers => 1, slice_columns=> 0, strip_html_on_match => 1, headers => ["Solicitation#"], debug => 9);
my $te = HTML::TableExtract->new(keep_headers => 1, slice_columns=> 0, keep_html => 1, headers => ["Solicitation#"]);
#my $te = HTML::TableExtract->new();

print "before parse\n";
$te->parse($response->decoded_content);
print "after parse\n";

# Examine all matching tables
foreach my $ts ($te->tables) {
    print "Table found at ", join(',', $ts->coords), ":\n";
    print "Table (", join(',', $ts->coords), "):\n";
    my $i=0;
    foreach my $row ($ts->rows) {
        $i++;
        if ($row->[0]){
            #print join(',', @$row), "\n";
        }else{
            print "Row variable is empty at row $i\n";
        }
    }
}
Output:
Use of uninitialized value in subroutine entry at C:/Perl64/site/lib/HTML/TableExtract.pm line 1229.
(the same warning is repeated many more times)
before parse
after parse
Table found at 1,5:
Table (1,5):
I can make the warning go away by taking out the "keep_html => 1," part of the constructor. The program does work. Thanks!

I like computer programming because it's like Legos for the mind.

Re: "Use of uninitialized value in subroutine entry" warning with HTML::TableExtract
by Anonymous Monk on Aug 05, 2011 at 13:40 UTC
      LOL thanks! changing the first three lines to
      #!/usr/bin/perl
      use strict;
      use warnings;
      did the trick.

        In my template I have the shebang line "#!/usr/bin/perl --"; that way I'm immune to line-ending problems (dos2unix/unix2dos).

        That's the stupid "lalala I can't hear you" method. The problem is still there. Did you bother looking at line 1229 of C:/Perl64/site/lib/HTML/TableExtract.pm?
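        Why did switching to "use warnings" make the message go away? The -w switch sets the global $^W flag, which also turns on warnings inside modules that do not enable warnings themselves, whereas "use warnings" is lexically scoped and only covers the file or block it appears in. Presumably TableExtract.pm does not enable lexical warnings, so the undefined value still reaches the parser; you simply no longer hear about it. Here is a minimal single-file sketch of that effect. The names library_code and count_warnings are made up for illustration, and library_code merely stands in for module code compiled without the pragma.

```perl
#!/usr/bin/perl
use strict;

# Hypothetical stand-in for module code (such as TableExtract.pm) that is
# compiled WITHOUT "use warnings". Whether it warns is decided at run time
# by the global $^W flag, which is what "perl -w" sets.
sub library_code {
    my ($item) = @_;
    return "[$item]";    # interpolating undef warns only if $^W is on
}

use warnings;    # lexical: affects only the code from here to end of file

# Count how many warnings one call to library_code(undef) produces.
sub count_warnings {
    my ($global_w) = @_;
    my $count = 0;
    local $SIG{__WARN__} = sub { $count++ };
    local $^W = $global_w;    # simulate running with or without -w
    my $s = library_code(undef);
    return $count;
}

printf "global flag on  (like '#!/usr/bin/perl -w'): %d warning(s)\n", count_warnings(1);
printf "global flag off (only 'use warnings')      : %d warning(s)\n", count_warnings(0);
```

        With the global flag on, the call emits one warning; with only "use warnings" in the main file, it emits none, although the undefined value is passed either way.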

        The current version 2.10 (dated 15 Jul 2006) at cpan.org has the following code around that line:

        sub strip {
            my $self = shift;
            $self->parse(shift);
            $self->eof;
            $self->{_htes_tidbit};
        }

        Line 1229 is the call to the parse method. My guess is that strip is called as a method without arguments or with an undefined argument.
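        The wording "in subroutine entry" is itself a hint: Perl uses it when code implemented in C (XS) uses an undefined argument as a string, and it reports the line of the Perl code that made the call. That would fit the warning pointing at line 1229, assuming HTML::Parser's parse method is implemented in XS. The sketch below reproduces the message with Digest::MD5's md5_hex, chosen only because it is a core XS function; it has nothing to do with HTML::TableExtract.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);    # core module; md5_hex is written in C (XS)

# Collect warnings instead of printing them, so the wording can be inspected.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $undefined;                      # stands in for an undefined cell value
my $digest = md5_hex($undefined);   # undef handed to a C function

print @warnings ? $warnings[0] : "no warning\n";
```

        The printed warning names this script and the line of the md5_hex call, not a line inside Digest::MD5, just as the original warning names line 1229 of TableExtract.pm.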

        In line 1196, you find "package HTML::TableExtract::StripHTML;", and in line 1201, you find @ISA = qw(HTML::Parser);. So, there is a helper class HTML::TableExtract::StripHTML inheriting from HTML::Parser. Does HTML::Parser implement a strip method that may be called without a defined argument? Or is the strip method called by HTML::TableExtract?

        HTML::Parser does not document a strip method. But a simple and stupid search inside TableExtract.pm for the exact word "strip" shows two matches, the sub starting in line 1227, and a strip method call in line 625, "$target = $stripper->strip($item);". And lo and behold, the previous line 624 creates an instance of HTML::TableExtract::StripHTML: "my $stripper = HTML::TableExtract::StripHTML->new;".

        So, strip is always called with an argument ($item), but that argument may become undefined. Looking up a few lines, you can see that $item is initialised by dereferencing $ref as a scalar, in line 622. Scalars can be undef, and no code around checks that condition. Looking up a few more lines, you can see that $ref is initialised from my $ref = $self->{grid}[$r][$c]; in line 617.
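        In code, the missing check would look something like this sketch: dereference each cell reference and only hand the content to the stripper when it is defined. strip_tags is a hypothetical, deliberately naive stand-in for HTML::TableExtract::StripHTML->strip, and the shape of @grid is only assumed to resemble $self->{grid}; this is not the module's actual code or a patch for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for HTML::TableExtract::StripHTML->strip:
# removes tags with a naive regex (illustration only, not a real parser).
sub strip_tags {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;
    return $text;
}

# Assumed grid of scalar references, loosely modelled on $self->{grid}[$r][$c].
my $cell_a = '<b>A</b>';
my $cell_b;                        # the referenced scalar is undef
my @grid = ( [ \$cell_a, \$cell_b ] );

my @out;
for my $row (@grid) {
    for my $ref (@$row) {
        my $item = $$ref;          # dereference, as around line 622
        # Guard: only strip defined content; pass undef through unchanged.
        push @out, defined($item) ? strip_tags($item) : undef;
    }
}
print join(',', map { defined($_) ? $_ : '(undef)' } @out), "\n";   # A,(undef)
```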

        My guess (from the variable names and the ROW: label in line 612) is that this part of the code iterates over a 2D array in $self->{'grid'} that represents one or more HTML tables. But HTML tables may have empty cells (<td></td>), missing cells (at the end of table rows), and cells spanning rows and/or columns (rowspan and colspan attributes). Perhaps the HTML::TableExtract code can't handle one or more of these conditions, and the array element becomes undefined.
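        A short illustration of the "missing cells" case: if the rows of a grid have different lengths and the code walks every row up to the width of the widest one, the slots past the end of the short rows come back undef. This uses plain Perl arrays, not HTML::TableExtract's grid-building code, and the two-row table in the comment is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Rows as they might come from a table whose second row is short, e.g.
# <tr><td>1</td><td>2</td><td>3</td></tr><tr><td>4</td></tr>
my @grid = ( [1, 2, 3], [4] );

# Walk every row up to the width of the widest row, as code that
# assumes a rectangular grid would.
my $width = max map { scalar @$_ } @grid;
my @missing;
for my $r (0 .. $#grid) {
    for my $c (0 .. $width - 1) {
        push @missing, "$r,$c" unless defined $grid[$r][$c];
    }
}
print "undefined cells: @missing\n";   # undefined cells: 1,1 1,2
```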

        Look at the HTML you feed to HTML::TableExtract. Is it invalid HTML? (Use the W3C validator to find out.) Does it contain empty table cells? Does it contain cells spanning rows and/or columns?

        If the input is not valid HTML, fix it. If missing cells cause the problem, fix the input. If empty or spanning cells cause the problem, search for an existing bug report, and file a new one if the problem is not yet known.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)