Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a simple question for you. How do you use HTML::Parser to extract some information? I have run a search using WWW::Search and received a list of URLs. The following is an extract from my code:
my $search = new WWW::Search('AltaVista');
$search->maximum_to_retrieve(10);
$search->native_query(WWW::Search::escape_query($word));
while ( $results = $search->next_result() ) {
    $n++;
    print $q->a({href => $results->url}, $results->url);
}
How do I use HTML::Parser to visit each URL? I need to extract some lines from each page. Please shed some light on this, since I really can't figure it out. Thanks for your kind attention. P.S.: Please don't ask me to visit the perldoc for HTML::Parser or HTML::TreeBuilder. I have done that, but I can't understand how to link the results to HTML::Parser. Rgds,

Edit kudra, 2001-10-22 Changed title so it doesn't match module name

Replies are listed 'Best First'.
Re: HTML::Parser
by toma (Vicar) on Oct 22, 2001 at 11:20 UTC
    Rather than using HTML::Parser, you might have better luck with HTML::LinkExtor. The tomacorp LWP Quick Reference Guide includes an example program that prints out the links on a page, expanding relative links to make them absolute.
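    A minimal sketch of that suggestion, with the page content inlined for illustration (in practice you would fetch it with LWP first; the base URL here is hypothetical):

    ```perl
    # Extract every <a href> from a page with HTML::LinkExtor,
    # expanding relative links to absolute ones against a base URL.
    use strict;
    use HTML::LinkExtor;
    use URI;

    my $base = 'http://www.example.com/dir/';   # hypothetical base URL
    my $html = <<'HTML';
    <html><body>
    <a href="page.html">relative link</a>
    <a href="http://www.perlmonks.org/">absolute link</a>
    </body></html>
    HTML

    my @links;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            return unless $tag eq 'a' and $attr{href};
            # resolve the href relative to $base
            push @links, URI->new($attr{href})->abs($base)->as_string;
        }
    );
    $parser->parse($html);
    $parser->eof;

    print "$_\n" for @links;
    ```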

    It should work perfectly the first time! - toma

Re: HTML::Parser
by Anonymous Monk on Oct 22, 2001 at 11:53 UTC
    Based on the question above, URLs are received from the code above. I want to visit each URL to extract some content (such as descriptions and text) from each page. How do I go about it? HTML::LinkExtor extracts links, but not the contents.
      Have you read the pod for HTML::Parser (I know you say you did, but still, you seem to be missing something..)?

      HTML::Parser "parses" HTML; it does not "visit" webpages. For that, you need to look into LWP, as the monk above hinted (although it's true he kinda missed the question).

      You can use LWP::UserAgent, or LWP::Simple; there are plenty of examples in the pod, and on this site... look'2'see...
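      For instance, a minimal LWP::Simple sketch (the URL is just a stand-in for one of your search results): get() fetches a page and returns its content as a string, or undef on failure.

      ```perl
      # Fetch one page with LWP::Simple's get().
      use strict;
      use LWP::Simple;

      my $url  = 'http://www.example.com/';   # stand-in for $results->url
      my $html = get($url);
      if (defined $html) {
          print length($html), " bytes fetched\n";
      } else {
          warn "couldn't fetch $url\n";
      }
      ```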

      I'm curious though, how did you get the impression that HTML::Parser would "visit" webpages?

      update: oh, by the way, I don't consider the name of a module a good "title" for a post (it should say "subject" instead of title, but anyway)

      "You shook me baby, and then I tripped..."
      ___crazyinsomniac_______________________________________
      Disclaimer: Don't blame. It came from inside the void

      perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

      To get the content of the URLs that you find, use the LWP module. It provides a routine (get(), in LWP::Simple) that takes a URL as an argument and returns the contents of the page as a string.
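      Putting the two halves together, one way to "link the results to HTML::Parser" is to feed the fetched string into a text-event handler. The HTML below is inlined so the sketch stands alone; in real use you would replace it with get($results->url):

      ```perl
      # Collect the text between the tags of a page using an
      # HTML::Parser (api_version 3) text handler.
      use strict;
      use HTML::Parser;

      # In real use: my $html = get($results->url);
      my $html = '<html><title>Demo</title><body><p>Some description text.</p></body></html>';

      my @text;
      my $p = HTML::Parser->new(
          api_version => 3,
          text_h      => [ sub { push @text, shift }, 'dtext' ],
      );
      $p->parse($html);
      $p->eof;

      print join(' ', @text), "\n";   # the page's text content
      ```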

      For a complete LWP example read the pod on LWP or see the "Simple LWP Example" section of the LWP Quick Reference Guide.

      It should work perfectly the first time! - toma