Hello Monks,

I'm trying to replicate the example from the CPAN documentation for HTML::TreeBuilder, and I'm falling short. I looked at the Yahoo site, which I still use for news and for an internet identity that I find useful, so I'd rather Yahoo never disappeared.

Q1) First, I'd like a diagnosis of the errors I post after the source. The example is from https://metacpan.org/pod/HTML::Tree::Scanning

use strict;
use HTML::TreeBuilder 2.97;
use LWP::UserAgent;

sub get_headlines {
  my $url = $_[0] || die "What URL?";

  my $response = LWP::UserAgent->new->request(
    HTTP::Request->new( GET => $url )
  );
  unless($response->is_success) {
    warn "Couldn't get $url: ", $response->status_line, "\n";
    return;
  }

  my $tree = HTML::TreeBuilder->new();
  $tree->parse($response->content);
  $tree->eof;

  my @out;
  foreach my $link (
    $tree->look_down(   # !
      '_tag', 'a',
      sub {
        return unless $_[0]->attr('href');
        my @c = $_[0]->content_list;
        @c == 1 and ref $c[0] and $c[0]->tag eq 'b';
      }
    )
  ) {
    push @out, [ $link->attr('href'), $link->as_text ];
  }

  warn "Odd, fewer than 6 stories in $url!" if @out < 6;
  $tree->delete;
  return @out;
}

foreach my $section (qw[tc sc hl wl en]) {
  my @links = get_headlines(
    "http://dailynews.yahoo.com/h/$section/"
  );
  print
    $section, ": ", scalar(@links), " stories\n",
    map(("  ", $_->[0], " : ", $_->[1], "\n"), @links),
    "\n";
}

The terminal output suggests the script is requesting URLs that no longer exist:

C:\cygwin64\home\Fred\pages2\hunt>perl lib2.pl
Couldn't get http://dailynews.yahoo.com/h/tc/: 500 Can't connect to dailynews.yahoo.com:80 (Bad hostname)
tc: 0 stories
Couldn't get http://dailynews.yahoo.com/h/sc/: 500 Can't connect to dailynews.yahoo.com:80 (Bad hostname)
sc: 0 stories
Couldn't get http://dailynews.yahoo.com/h/hl/: 500 Can't connect to dailynews.yahoo.com:80 (Bad hostname)
hl: 0 stories
Couldn't get http://dailynews.yahoo.com/h/wl/: 500 Can't connect to dailynews.yahoo.com:80 (Bad hostname)
wl: 0 stories
Couldn't get http://dailynews.yahoo.com/h/en/: 500 Can't connect to dailynews.yahoo.com:80 (Bad hostname)
en: 0 stories

Q3) My next question is about syntax. What is this creature: $_->[0]
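
To isolate the construct from the rest of the script, here is a minimal sketch of how I believe it is being used. The sample URLs and titles are made up for illustration; the only thing taken from the script is that each element of @links is an anonymous array of the form [ href, text ], as built by the push in get_headlines.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for what get_headlines returns:
# a list of array references, each holding [ href, link text ].
my @links = (
    [ 'http://example.com/a', 'First story'  ],
    [ 'http://example.com/b', 'Second story' ],
);

# Inside map, $_ is set to each element of @links in turn.
# Each element is an array reference, so $_->[0] fetches the first
# item of that inner array (the href) and $_->[1] the second (the text).
my @lines = map { "  $_->[0] : $_->[1]\n" } @links;

print @lines;
```

If that reading is right, the map in the original print statement is doing the same thing, only it returns the pieces as a flat list instead of building one string per link.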

Q4) What is a clean, contemporary update for this example?

Thank you for your comment,


In reply to using HTML::TreeBuilder effectively by Aldebaran
