kevind0718 has asked for the wisdom of the Perl Monks concerning the following question:
Hello Wise Perl Monks:
Here I am again asking for your kind assistance.
For a home "non-commercial" project I am attempting to scrape data from here:
http://www.pro-football-reference.com/boxscores/
My code is below.
Of the HTML returned from the website, I want to parse this table:
    <table class="sortable stats_table float_left margin_right" id="game_info">
    <tr class='thead'><th colspan=2>Game Info</th></tr>
    <tr class="">
      <td align="" ><b>Stadium</b></td>
      <td align="" >Hubert H. Humphrey Metrodome (dome)</td>
    </tr>
    <tr class="">
      <td align="" ><b>Start Time</b></td>
      <td align="" >12:00pm</td>
    </tr>
    <tr class="">
      <td align="" ><b>Surface</b></td>
      <td align="" >astroturf</td>
    </tr>
    <tr class="">
      <td align="" ><b>Weather</b></td>
      <td align="" >72 degrees, no wind</td>
    </tr>
    <tr class="">
      <td align="" ><b>Vegas Line</b></td>
      <td align="" >San Francisco 49ers <a href='/play-index/tgl_finder.cgi?request=1&match=season&year_min=1985&year_max=1985&game_type=R&game_num_min=0&game_num_max=99&week_num_min=0&week_num_max=99&game_day_of_week=&game_time=&time_zone=&game_location=&game_result=&overtime=&league_id=&team_id=&opp_id=&conference_game=&division_game=&tm_is_playoff=&opp_is_playoff=&tm_is_winning=&opp_is_winning=&tm_scored_first=&tm_led=&tm_trailed=&c1stat=favored_by&c1comp=eq&c1val=11'>-11.0</a></td>
    </tr>
    <tr class="">
      <td align="" ><b>Over/Under</b></td>
      <td align="" >46.0 <b>(over)</b></td>
    </tr>
    </table>

When I do, I get this error:

    Can't call method "content" on an undefined value at C:/Perl64/site/lib/HTML/FormatText/WithLinks/AndTables.pm line 217.
     at C:/Perl64/site/lib/HTML/FormatText/WithLinks/AndTables.pm line 217
        HTML::FormatText::WithLinks::AndTables::_format_tables('HTML::FormatText::WithLinks::AndTables=HASH(0x4325450)', 'HTML::TreeBuilder=HASH(0x4326a80)') called at C:/Perl64/site/lib/HTML/FormatText/WithLinks/AndTables.pm line 101
        HTML::FormatText::WithLinks::AndTables::parse('HTML::FormatText::WithLinks::AndTables=HASH(0x4325450)', '<table class="sortable stats_table float_left margin_right" ...') called at C:/Perl64/site/lib/HTML/FormatText/WithLinks/AndTables.pm line 83
        HTML::FormatText::WithLinks::AndTables::convert('HTML::FormatText::WithLinks::AndTables', '<table class="sortable stats_table float_left margin_right" ...') called at C:/Users/kbd0718/workspace/testPerl/testGetProFootballBox.pl line 82

I have gotten HTML::FormatText::WithLinks to work for a couple of other tables within websites, but in this case it fails. The Perl code in HTML::FormatText::WithLinks is beyond me; I cannot debug through it. I am hoping that one of you wise monks would take a crack at it and either tell me what I am doing wrong or suggest a bug fix.
    use strict;
    use warnings;
    use Data::Dumper;
    use HTML::FormatText::WithLinks::AndTables;
    use IO::File;
    use LWP::Simple;

    my %teamCodes;
    $teamCodes{"ATL"} = "atl";      ## Atlanta Falcons
    $teamCodes{"CHI"} = "chi";      ## Chicago Bears
    $teamCodes{"CIN"} = "cin";      ## Cincinnati Bengals
    $teamCodes{"CLE"} = "cle";      ## Cleveland Browns
    $teamCodes{"BUF"} = "buf";      ## Buffalo Bills
    $teamCodes{"DAL"} = "dal";      ## Dallas Cowboys
    $teamCodes{"DEN"} = "den";      ## Denver Broncos
    $teamCodes{"DET"} = "det";      ## Detroit Lions
    $teamCodes{"GNB"} = "gnb";      ## Green Bay Packers
    $teamCodes{"HOO"} = "hoo|oti";  ## Houston Oilers
    $teamCodes{"IND"} = "ind|clt";  ## Indianapolis Colts
    $teamCodes{"NYJ"} = "nyj";      ## New York Jets
    $teamCodes{"KAN"} = "kan";      ## Kansas City Chiefs
    $teamCodes{"LAM"} = "lam|ram";  ## Los Angeles Rams
    $teamCodes{"LAD"} = "lad|rai";  ## Los Angeles Raiders
    $teamCodes{"MIA"} = "mia";      ## Miami Dolphins
    $teamCodes{"MIN"} = "min";      ## Minnesota Vikings
    $teamCodes{"NYG"} = "nyg";      ## New York Giants
    $teamCodes{"NWE"} = "nwe";      ## New England Patriots
    $teamCodes{"NOR"} = "nor";      ## New Orleans Saints
    $teamCodes{"PHI"} = "phi";      ## Philadelphia Eagles
    $teamCodes{"PIT"} = "pit";      ## Pittsburgh Steelers
    $teamCodes{"SEA"} = "sea";      ## Seattle Seahawks
    $teamCodes{"SDG"} = "sdg";      ## San Diego Chargers
    $teamCodes{"SFO"} = "sfo";      ## San Francisco 49ers
    $teamCodes{"SLC"} = "slc|crd";  ## St. Louis Cardinals
    $teamCodes{"TAM"} = "tam";      ## Tampa Bay Buccaneers
    $teamCodes{"WAS"} = "was";      ## Washington Redskins

    my $date1 = "198509080";
    my $date2 = "198509090";
    my $tKey;
    my $link;
    my $abbriv;
    my $urlBase = "http://www.pro-football-reference.com/boxscores/";
    my $webPageText;
    my @teamCode;
    my $delimiter = quotemeta("|");
    my $startGameInfo;
    my $startGameInfoTbl;
    my $endGameInfoTbl;
    my $gameInfoTbl;

    while ( ($tKey, $abbriv) = each %teamCodes ) {
        @teamCode = split( /$delimiter/, $abbriv );
        print "$teamCode[0] \n";
    }

    while ( ($tKey, $abbriv) = each %teamCodes ) {
        @teamCode = split( /$delimiter/, $abbriv );
        $link = $urlBase . $date1 . $teamCode[0] . ".htm";
        print $link;
        $webPageText = get( $link ) or print "failed on retrieve of web page\n";
        if ( index( $webPageText, "File Not Found" ) > 0 ) {
            print " failed on retrieve of web page\n";
        }
        else {
            print "\n$webPageText\n\n";
            if ( $startGameInfo = index( $webPageText, "Game Info" ) ) {
                $startGameInfoTbl = rindex( $webPageText, "<table class=", $startGameInfo );
                $endGameInfoTbl   = index( $webPageText, "</table>", $startGameInfo );
                $gameInfoTbl      = substr( $webPageText, $startGameInfoTbl, $endGameInfoTbl - $startGameInfoTbl + 9 );
                print $gameInfoTbl;
                my $converted = HTML::FormatText::WithLinks::AndTables->convert( $gameInfoTbl );
                my @lines     = split /\n+/, $converted;
                my $arraySize = @lines;
                print "\narray size = $arraySize\n";
            }
        }
    }
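
A side note on the script above, offered as a sketch rather than a diagnosis of the AndTables failure: Perl's index and rindex return -1 when the search string is not found, so a test like `if ( $startGameInfo = index(...) )` is true for a miss and false only for a match at position 0. The example below shows one way to slice the table out with explicit -1 checks and then read the cells directly with HTML::TreeBuilder, which already appears in the stack trace and so is presumably installed. The helper names `extract_table` and `table_rows`, the shortened sample HTML, and the choice to match `<table` rather than `<table class=` are assumptions made for illustration; none of them come from the original post.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the <table>...</table> text that encloses
# $marker, or undef if any of the three searches fails.
sub extract_table {
    my ($html, $marker) = @_;
    return unless defined $html;            # e.g. get() returned undef

    my $hit = index($html, $marker);
    return if $hit < 0;                     # index gives -1 on a miss

    my $start = rindex($html, '<table', $hit);
    my $end   = index($html, '</table>', $hit);
    return if $start < 0 || $end < 0;

    # Include the closing tag itself in the slice.
    return substr($html, $start, $end - $start + length('</table>'));
}

# Hypothetical helper: one array ref of trimmed cell texts per <tr>,
# accepting both <td> and <th> cells.
sub table_rows {
    my ($table_html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($table_html);
    my @rows;
    for my $tr ($tree->look_down(_tag => 'tr')) {
        my @cells = map { $_->as_trimmed_text }
                    $tr->look_down(_tag => qr/^t[dh]$/);
        push @rows, \@cells if @cells;
    }
    $tree->delete;                          # release the parse tree
    return @rows;
}

# Shortened stand-in for the page text; not the real page.
my $sample = '<p>intro</p>'
           . '<table class="stats_table" id="game_info">'
           . '<tr class="thead"><th colspan=2>Game Info</th></tr>'
           . '<tr><td><b>Surface</b></td><td>astroturf</td></tr>'
           . '</table><p>outro</p>';

my $table = extract_table($sample, 'Game Info');
if (defined $table) {
    print join(' | ', @$_), "\n" for table_rows($table);
}
else {
    print "Game Info table not found\n";
}
```

In the scraping loop, `extract_table($webPageText, "Game Info")` would stand in for the index/rindex/substr sequence, and a team page without a Game Info table would be skipped instead of being passed to the formatter. Whether this also avoids the error at line 217 is not something the post establishes.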
Re: problem HTML::FormatText::WithLinks::AndTables
by tobyink (Canon) on Mar 10, 2013 at 06:57 UTC
by kevind0718 (Scribe) on Mar 10, 2013 at 17:11 UTC
by tobyink (Canon) on Mar 10, 2013 at 22:08 UTC
by kevind0718 (Scribe) on Mar 12, 2013 at 00:48 UTC
by tobyink (Canon) on Mar 12, 2013 at 08:02 UTC