Does it have to be HTML::TreeBuilder? If not, this HTML::TokeParser::Simple version might be useful.
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use HTML::TokeParser::Simple;

my $html = <<'EOHTML';
<table cellspacing="0" cellpadding="2" border="0" width="100%">
<tr>
<td width="20%" align="right" valign="top" bgcolor="#CCCCCC">
<font face="arial, helvetica" size="2"><b>02:44 AM EDT</b></font><br>
<font face="arial, helvetica" size="1">0:42 (est.)</font><br></td>
<td valign="top"><font face="arial, helvetica" size="1">Speech</font><br>
<font face="arial, helvetica" size="2"><a href="/cspan/cspan.csp?command=dprogram&amp;record=142524675">U.S.-Japan Relations</a><br>
Asia Society, Washington Center<br></font>
<font face="arial, helvetica" size="2" color="#CC0000">Ryozo Kato</font>
<font face="arial, helvetica" size="1">, Japan</font></td>
</tr>
</table>
EOHTML

my $tp = HTML::TokeParser::Simple->new(\$html)
    or die "Couldn't parse string: $!";

my $start;
my @scraped;
while (my $t = $tp->get_token) {
    # Ignore tokens until the opening <table> tag has been seen.
    $start++, next if $t->is_start_tag('table');
    next unless $start;
    # Collect the whitespace-trimmed text up to the next <br> tag;
    # each chunk becomes one field.
    my $text = $tp->get_trimmed_text('br');
    push @scraped, $text;
}

my @keys = qw(time length type title org1 org2);
my %h_cpan = map { $keys[$_] => $scraped[$_] } (0 .. $#keys);

#for (0..$#keys){
#    $h_cpan{$keys[$_]} = $scraped[$_];
#}

print Dumper \%h_cpan;
```
Output:
```
---------- Capture Output ----------
> "C:\Perl\bin\perl.exe" _new.pl
$VAR1 = {
          'org2' => 'Ryozo Kato , Japan',
          'length' => '0:42 (est.)',
          'time' => '02:44 AM EDT',
          'org1' => 'Asia Society, Washington Center',
          'title' => 'U.S.-Japan Relations',
          'type' => 'Speech'
        };

> Terminated with exit code 0.
```
A couple of points.
The org information comes out in two parts (org1 and org2). Also, there is probably a more elegant way of loading the array into the hash.
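On the second point, one candidate (my suggestion, not tested against the scraper itself) is a hash slice: `@h_cpan{@keys} = @scraped;` gives each key the value at the same position in @scraped, replacing both the map and the commented-out for loop. The sketch below hard-codes the six values from the output above in place of the scraping loop, so it runs without HTML::TokeParser::Simple.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Stand-in for what the scraping loop pushes onto @scraped
# (values copied from the Dumper output above).
my @scraped = (
    '02:44 AM EDT',
    '0:42 (est.)',
    'Speech',
    'U.S.-Japan Relations',
    'Asia Society, Washington Center',
    'Ryozo Kato , Japan',
);

my @keys = qw(time length type title org1 org2);

# Hash slice: $scraped[0] goes to $h_cpan{time}, $scraped[1] to
# $h_cpan{length}, and so on, in a single list assignment.
my %h_cpan;
@h_cpan{@keys} = @scraped;

print Dumper \%h_cpan;
```

If @scraped ends up with more entries than there are keys, the extras are simply discarded by the list assignment; if it has fewer, the remaining keys get undef, just as with the map version.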

Hope that helps.

Update: Changed the for loop to map.


In reply to Re: Putting HTML::Element content_list into a hash by wfsp
in thread Putting HTML::Element content_list into a hash by TsuDohNihm