Greetings Monks,

I am trying to scrape the TV schedule from the C-SPAN website at http://inside.c-spanarchives.org:8080/cspan/schedule.csp. I am a C-SPAN junkie (which may seem weird to all the CPAN junkies here). Here is what I have been struggling with so far:

#!/usr/bin/perl
use warnings;
use strict;
use HTML::TreeBuilder;

my $html = <<'EOHTML';
<table cellspacing="0" cellpadding="2" border="0" width="100%">
<tr>
<td width="20%" align="right" valign="top" bgcolor="#CCCCCC">
<font face="arial, helvetica" size="2"><b>02:44 AM EDT</b></font><br>
<font face="arial, helvetica" size="1">0:42 (est.)</font><br></td>
<td valign="top"><font face="arial, helvetica" size="1">Speech</font><br>
<font face="arial, helvetica" size="2"><a href="/cspan/cspan.csp?command=dprogram&amp;record=142524675">U.S.-Japan Relations</a><br>
Asia Society, Washington Center<br></font>
<font face="arial, helvetica" size="2" color="#CC0000">Ryozo Kato</font>
<font face="arial, helvetica" size="1">, Japan</font></td>
</tr>
</table>
EOHTML

# new_from_content already parses the string, so no separate parse_content call is needed
my $tree = HTML::TreeBuilder->new_from_content($html);
my $c = $tree->look_down( "_tag", "table", "width", "100%" );

# map aliases $_ to each child in turn: elements are references, plain text is a string
my @trimmed_text = map { ref($_) ? $_->as_trimmed_text : $_ } $c->content_list;
print "@trimmed_text\n";
This prints:
02:44 AM EDT0:42 (est.)SpeechU.S.-Japan Relations Asia Society, Washington Center Ryozo Kato , Japan

But I am looking to put the data above into a hash of this form:

%h_cspan = (
    time   => '02:44 AM EDT',
    length => '0:42 (est.)',
    type   => 'Speech',
    title  => 'U.S.-Japan Relations',
    org    => 'Asia Society, Washington Center',
);
Is this even possible with the HTML::Element look_down method? Any help is appreciated.
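
One possible approach, offered as a minimal sketch rather than a definitive answer: instead of flattening the whole table with as_trimmed_text, call look_down on each cell and pick out the individual font and a elements. The helper name row_to_hash is hypothetical, the font attributes in the sample are trimmed for brevity, and the sketch assumes every schedule row has the same two-cell layout as the sample above; the live page may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample row from the post above (face attributes dropped for brevity).
my $html = <<'EOHTML';
<table cellspacing="0" cellpadding="2" border="0" width="100%">
<tr>
<td width="20%" align="right" valign="top" bgcolor="#CCCCCC">
<font size="2"><b>02:44 AM EDT</b></font><br>
<font size="1">0:42 (est.)</font><br></td>
<td valign="top"><font size="1">Speech</font><br>
<font size="2"><a href="/cspan/cspan.csp?command=dprogram&amp;record=142524675">U.S.-Japan Relations</a><br>
Asia Society, Washington Center<br></font>
<font size="2" color="#CC0000">Ryozo Kato</font>
<font size="1">, Japan</font></td>
</tr>
</table>
EOHTML

# Hypothetical helper: turn one schedule <tr> element into a flat key/value list.
# Assumes the left cell holds time and length, and the right cell holds
# type, title (the link) and org (the bare text after the link).
sub row_to_hash {
    my ($row) = @_;

    # In list context look_down returns every matching descendant.
    my ( $left, $right ) = $row->look_down( _tag => 'td' );
    return unless $left && $right;

    my @left_fonts  = $left->look_down( _tag => 'font' );
    my @right_fonts = $right->look_down( _tag => 'font' );
    return unless @left_fonts >= 2 && @right_fonts >= 2;

    # In scalar context look_down returns only the first match.
    my $link = $right->look_down( _tag => 'a' );

    # The organisation is a plain text child (not an element) of the second font.
    my ($org) = grep { !ref($_) && /\S/ } $right_fonts[1]->content_list;
    $org = '' unless defined $org;
    $org =~ s/^\s+|\s+$//g;

    return (
        time   => $left_fonts[0]->as_trimmed_text,
        length => $left_fonts[1]->as_trimmed_text,
        type   => $right_fonts[0]->as_trimmed_text,
        title  => ( $link ? $link->as_trimmed_text : '' ),
        org    => $org,
    );
}

my $tree  = HTML::TreeBuilder->new_from_content($html);
my $table = $tree->look_down( _tag => 'table', width => '100%' )
    or die "schedule table not found\n";
my ($row) = $table->look_down( _tag => 'tr' );

my %h_cspan = row_to_hash($row);
print "$_ => $h_cspan{$_}\n" for sort keys %h_cspan;

$tree->delete;    # free the tree explicitly
```

For a full schedule page, the same helper could be called once per tr and the resulting hashes pushed onto an array. Relying on the order of the font tags is fragile, so checking the size or color attributes in look_down may be more robust if the markup varies.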


In reply to Putting HTML::Element content_list into a hash by TsuDohNihm
