TsuDohNihm has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to scrape the TV schedule from the C-SPAN website at http://inside.c-spanarchives.org:8080/cspan/schedule.csp. I am a C-SPAN junkie (which may seem weird to all the CPAN junkies here). Here is what I have been struggling with so far:
    #!/usr/bin/perl -w
    use warnings;
    use strict;
    use HTML::TreeBuilder;

    my $html = <<'EOHTML';
    <table cellspacing="0" cellpadding="2" border="0" width="100%">
    <tr>
    <td width="20%" align="right" valign="top" bgcolor="#CCCCCC">
    <font face="arial, helvetica" size="2"><b>02:44 AM EDT</b></font><br>
    <font face="arial, helvetica" size="1">0:42 (est.)</font><br></td>
    <td valign="top"><font face="arial, helvetica" size="1">Speech</font><br>
    <font face="arial, helvetica" size="2"><a href="/cspan/cspan.csp?command=dprogram&record=142524675">U.S.-Japan Relations</a><br>
    Asia Society, Washington Center<br></font>
    <font face="arial, helvetica" size="2" color="#CC0000">Ryozo Kato</font>
    <font face="arial, helvetica" size="1">, Japan</font></td>
    </tr>
    </table>
    EOHTML

    my $tree = HTML::TreeBuilder->new_from_content($html);
    $tree->parse_content($html);

    my $c = $tree->look_down( "_tag", "table", "width", "100%" );
    my @trimmed_text = map ( ref($c) ? $c->as_trimmed_text : $c, $c->content_list );
    print "@trimmed_text\n";
But I am looking to put the data above into a hash of this form:

    my %h_cspan = (
        time   => '02:44 AM EDT',
        length => '0:42 (est.)',
        type   => 'Speech',
        title  => 'U.S.-Japan Relations',
        org    => 'Asia Society, Washington Center',
    );

Is this even possible with the HTML::Element look_down method? Any help is appreciated.
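For illustration, here is a minimal sketch of one way look_down could be used to fill such a hash. It is not a tested answer from this thread, and it assumes every schedule row has exactly the layout of the sample: two cells, with the time and length in the first cell's two font elements, and the type, linked title, and organization in the second cell. The variable names ($table, $left, $right, @left_fonts, @right_fonts) are mine. Note also that the map in the posted code tests ref($c), the table element, for every child; $_ would be needed to look at each child in turn.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Same sample row as in the question.
my $html = <<'EOHTML';
<table cellspacing="0" cellpadding="2" border="0" width="100%">
<tr>
<td width="20%" align="right" valign="top" bgcolor="#CCCCCC">
<font face="arial, helvetica" size="2"><b>02:44 AM EDT</b></font><br>
<font face="arial, helvetica" size="1">0:42 (est.)</font><br></td>
<td valign="top"><font face="arial, helvetica" size="1">Speech</font><br>
<font face="arial, helvetica" size="2"><a href="/cspan/cspan.csp?command=dprogram&record=142524675">U.S.-Japan Relations</a><br>
Asia Society, Washington Center<br></font>
<font face="arial, helvetica" size="2" color="#CC0000">Ryozo Kato</font>
<font face="arial, helvetica" size="1">, Japan</font></td>
</tr>
</table>
EOHTML

# new_from_content already parses the string; no second parse is needed.
my $tree = HTML::TreeBuilder->new_from_content($html);

my $table = $tree->look_down( _tag => 'table', width => '100%' )
    or die "schedule table not found\n";

# In list context look_down returns every match in document order.
# ASSUMPTION: the row has exactly two cells (time info, program info).
my ( $left, $right ) = $table->look_down( _tag => 'td' );

my @left_fonts  = $left->look_down( _tag => 'font' );
my @right_fonts = $right->look_down( _tag => 'font' );

# In scalar context look_down returns only the first match.
my $link = $right->look_down( _tag => 'a' );

# The organization is a bare text node inside the second font element;
# content_list returns plain strings for text and objects for elements.
my ($org) = grep { !ref } $right_fonts[1]->content_list;
$org =~ s/^\s+|\s+$//g;

my %h_cspan = (
    time   => $left_fonts[0]->as_trimmed_text,
    length => $left_fonts[1]->as_trimmed_text,
    type   => $right_fonts[0]->as_trimmed_text,
    title  => $link->as_trimmed_text,
    org    => $org,
);

printf "%-6s => %s\n", $_, $h_cspan{$_} for qw(time length type title org);

$tree->delete;    # free the tree explicitly
```

Positional indexing like $left_fonts[0] is fragile: if the real page omits a cell or adds a font element, the fields will shift, so each lookup would need a defined-ness check before being used on the live schedule.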
Replies are listed 'Best First'.

Re: Putting HTML::Element content_list into a hash
by wfsp (Abbot) on Apr 22, 2006 at 09:32 UTC

Re: Putting HTML::Element content_list into a hash
by bobf (Monsignor) on Apr 22, 2006 at 21:26 UTC