in reply to using HTML::TreeBuilder effectively

your title is misleading


Re^2: using HTML::TreeBuilder effectively
by skaryzgik (Novice) on Sep 16, 2015 at 20:57 UTC
    It seems to me that someone who doesn't understand the error message could easily not realize the problem isn't with the usage of HTML::TreeBuilder.

    If the current title is misleading, is there another that might be better?

      It seems to me that someone who doesn't understand the error message could easily not realize the problem isn't with the usage of HTML::TreeBuilder.

      If the current title is misleading, is there another that might be better?

      Maybe "How I spent my summer vacation?"

      Maybe the error message, i.e. "Couldn't get http://dailynews.yahoo.com/h/tc/: 500 Can't connect to dailynews.yahoo.com:80 (Bad hostname)"?

      Sure, it's possible the OP doesn't understand the message... but the OP seems to have done fine with titles for "getting content of an https website" and "Using example script correctly for opening cpan module", but not for "creating a useful browser from automation".
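      The 500 error quoted above is generated by LWP::UserAgent itself when it cannot resolve or reach the host, so HTML::TreeBuilder never receives a page to parse. Here is a minimal sketch, assuming LWP::UserAgent is installed, of telling such client-side failures apart from genuine server errors. fetch_or_explain is a hypothetical helper, and it relies on LWP marking locally generated responses with a "Client-Warning: Internal response" header.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use LWP::UserAgent;

# Hypothetical helper: fetch $url and, on failure, report whether the
# error response came from the remote server or was generated locally by
# LWP (DNS lookup failure, refused connection, timeout). LWP tags locally
# generated responses with 'Client-Warning: Internal response'.
sub fetch_or_explain {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new( timeout => 10 );
    my $res = $ua->get($url);
    return ( 'ok', $res ) if $res->is_success;

    my $warning = $res->header('Client-Warning') // '';
    my $where   = $warning eq 'Internal response'
      ? 'client side (network/DNS), before any HTML is parsed'
      : 'server side';
    return ( "failed on the $where: " . $res->status_line, $res );
}

# '.invalid' is a reserved top-level domain (RFC 2606), so this lookup
# is expected to fail the same way dailynews.yahoo.com did.
my ($msg) = fetch_or_explain('http://dailynews.invalid/');
say $msg;
```

      If the message says "client side", no amount of HTML::TreeBuilder tuning will help; the URL itself needs fixing.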

Re^2: using HTML::TreeBuilder effectively
by Aldebaran (Curate) on Sep 17, 2015 at 07:21 UTC

    I have to admit that I'm curious how you think I should have worded the subject for this thread. It also seems to be the case that the script in the original post has outlived the assumptions it relied on to produce useful output. It has interesting syntax, and I'd like to be able to say that I have that part mastered by now, but I do not.

    My Q2 may have been pinned on that script, but I'd prefer not to speak about it again until we obtain output as described in the subject of the original post.

    How does one make Yahoo able to find its own news?

    C:\cygwin64\home\Fred\pages2\hunt>perl lib6.pl
    GET https://search.yahoo.com/search [s]
      p=                             (text)
      <NONAME>=Search                (submit)
      fr=sfp                         (hidden readonly)
      fr2=                           (hidden readonly)
      iscqry=                        (hidden readonly)
    search string is Yahoo News

    C:\cygwin64\home\Fred\pages2\hunt>type lib6.pl
    #! /usr/bin/perl
    use warnings;
    use strict;
    use 5.01;

    # create a new browser
    use WWW::Mechanize;
    my $browser = WWW::Mechanize->new();

    # tell it to get the main page
    $browser->get('https://search.yahoo.com/');

    # make sure $link is defined
    if ( defined $browser ) {
        $browser->dump_forms;
        my $brand         = 'Yahoo';
        my $collection    = 'News';
        my $search_string = "$brand $collection";
        say "search string is $search_string";
        my $url = $browser->uri;
        system( 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe', $url );
    }
    else {
        use 5.01;
        $browser->back;
        say "tja";
        my $url = $browser->uri;
        system( 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe', $url );
    }

    What I'd rather focus on is a generic way to specify searches, which are the stock in trade of the website we're talking to. In this example the submit button announces itself as 'Search', but I would not want to be wed to the notion that it has to be capitalized, for example. Here is where I think the output is useful:

      <NONAME>=Search                (submit)

    So I would like to populate the search string, submit, and then follow the first link suggested.
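    A minimal sketch of the "populate, submit" step, using HTML::Form (the module behind dump_forms) and assuming the search form looks like the dump_forms output above. build_search_request is a hypothetical helper, and the HTML string is stand-in markup, not Yahoo's real page. It picks the form by matching the submit button against /search/i, so 'Search', 'search' and 'SEARCH' all qualify, then fills the first text field and builds the request without sending it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use HTML::Form;

# Hypothetical helper (not part of WWW::Mechanize): find the first form
# whose submit button's value or name matches /search/i, put $query into
# that form's first named text field, and return the HTTP::Request that
# clicking that button would send. Returns undef when no form matches.
sub build_search_request {
    my ( $html, $base, $query ) = @_;

    for my $form ( HTML::Form->parse( $html, $base ) ) {
        my @inputs = $form->inputs;

        # Case-insensitive match on the button's label or name
        my ($button) = grep {
            $_->type eq 'submit'
              && ( ( $_->value // '' ) =~ /search/i
                || ( $_->name // '' ) =~ /search/i )
        } @inputs;
        next unless $button;

        # First named text field (the 'p' field in the dump above)
        my ($field) =
          grep { $_->type =~ /^(?:text|search)$/ && defined $_->name } @inputs;
        next unless $field;

        $field->value($query);
        return $button->click($form);    # builds the request; sends nothing
    }
    return;
}

# Stand-in markup shaped like the dump_forms output (assumed, not Yahoo's HTML)
my $html = <<'HTML';
<form action="/search" method="get">
  <input type="text" name="p">
  <input type="submit" value="Search">
  <input type="hidden" name="fr" value="sfp">
</form>
HTML

my $req = build_search_request( $html, 'https://search.yahoo.com/', 'Yahoo News' );
say $req ? $req->method . ' ' . $req->uri : 'no search form found';
```

    With WWW::Mechanize, the returned request could be sent with $browser->request($req), after which $browser->follow_link( n => 1 ) follows the first link. Be aware that n => 1 counts every link on the results page, navigation included, so a url_regex or text_regex filter is probably needed to land on the first actual result.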

    Thank you for your comment,

      Re: your OP, this works in the UK; you may have to amend it for your location.

      #!perl
      use strict;
      use HTML::TreeBuilder 2.97;
      use HTTP::Request;
      use LWP::UserAgent;

      sub get_headlines {
          my $url = $_[0] || die "What URL?";

          my $response = LWP::UserAgent->new->request(
              HTTP::Request->new( GET => $url )
          );
          unless ( $response->is_success ) {
              warn "Couldn't get $url: ", $response->status_line, "\n";
              return;
          }

          my $tree = HTML::TreeBuilder->new();
          # decoded_content (rather than content) returns characters,
          # which is what the ':utf8' output layer below expects
          $tree->parse( $response->decoded_content );
          $tree->eof;

          my @out;
          foreach my $link (
              $tree->look_down(    # !
                  '_tag', 'a',
                  sub {
                      return 1 if $_[0]->attr('class') =~ /title/;
                      # my @c = $_[0]->content_list;
                      # @c == 1 and ref $c[0] and $c[0]->tag eq 'b';
                  }
              )
            )
          {
              push @out, [ $link->attr('href'), $link->as_text, ];
          }
          warn "Odd, fewer than 6 stories in $url!" if @out < 6;

          $tree->delete;
          return @out;
      }

      #science health world entertainment
      open OUT, '>:utf8', 'yahoo.txt' or die "$!";
      foreach my $section (qw[tech science health world entertainment]) {
          my @links = get_headlines("https://uk.news.yahoo.com/$section/");
          print OUT $section, ": ", scalar(@links), " stories\n",
            map( ( " ", $_->[1], "\n" ), @links ), "\n";
      }
      close OUT;
      poj
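      The look_down filter in poj's script can be tried without any network access. This is a minimal sketch: the markup is hypothetical stand-in HTML rather than Yahoo's page, title_links is a hypothetical helper, and the `// ''` guard is an addition so that links without a class attribute do not trigger "uninitialized" warnings under `use warnings`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use HTML::TreeBuilder;

# Stand-in markup (hypothetical): two links whose class contains 'title',
# one link that should be skipped.
my $html = <<'HTML';
<html><body>
  <a class="title" href="/a">First story</a>
  <a class="nav" href="/home">Home</a>
  <a class="js-title-link" href="/b">Second <b>story</b></a>
</body></html>
HTML

# Hypothetical helper mirroring the look_down call in the script above:
# return [href, text] pairs for <a> elements whose class matches /title/.
sub title_links {
    my ($markup) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($markup);
    my @out = map { [ $_->attr('href'), $_->as_text ] }
      $tree->look_down(
        _tag => 'a',
        sub { ( $_[0]->attr('class') // '' ) =~ /title/ },
      );
    $tree->delete;    # break the tree's circular references
    return @out;
}

say "$_->[0]: $_->[1]" for title_links($html);
```

      Because the class test is a substring regex, a class such as "js-title-link" also matches; anchor the pattern (for example /\btitle\b/) if that is too loose for the real page.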