Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a simple question for you. How do you use HTML::Parser to extract some information? I have run a search using WWW::Search and received a list of URLs. The following is an extract from my code:
my $search = new WWW::Search('AltaVista');
$search->maximum_to_retrieve(10);
$search->native_query(WWW::Search::escape_query($word));
while ( $results = $search->next_result() ) {
    $n++;
    print $q->a({href => $results->url}, $results->url);
}
How do I use HTML::Parser to visit each URL? I need to extract some lines from each page. Please shed some light on this, since I really can't figure it out. Thanks for your kind attention. P.S.: Please don't ask me to visit the perldoc for HTML::Parser or HTML::TreeBuilder. I have done that, but I can't understand how to link the results to HTML::Parser. Rgds,

Edit kudra, 2001-10-22 Changed title so it doesn't match module name

Replies are listed 'Best First'.
Re: HTML::Parser
by toma (Vicar) on Oct 22, 2001 at 11:20 UTC
    Rather than using HTML::Parser, you might have better luck with HTML::LinkExtor. The tomacorp LWP Quick Reference Guide includes an example program that prints out the links on a page, expanding relative links to make them absolute.
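    A minimal sketch of that suggestion, with the page content inlined for illustration (in practice you would fetch it with LWP first; the base URL here is hypothetical):

    ```perl
    # Extract every <a href> from a page with HTML::LinkExtor,
    # expanding relative links to absolute ones against a base URL.
    use strict;
    use HTML::LinkExtor;
    use URI;

    my $base = 'http://www.example.com/dir/';   # hypothetical base URL
    my $html = <<'HTML';
    <html><body>
    <a href="page.html">relative link</a>
    <a href="http://www.perlmonks.org/">absolute link</a>
    </body></html>
    HTML

    my @links;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            return unless $tag eq 'a' and $attr{href};
            # resolve the href relative to $base
            push @links, URI->new($attr{href})->abs($base)->as_string;
        }
    );
    $parser->parse($html);
    $parser->eof;

    print "$_\n" for @links;
    ```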

    It should work perfectly the first time! - toma

Re: HTML::Parser
by Anonymous Monk on Oct 22, 2001 at 11:53 UTC
    Based on the question above, URLs are received from the code above. I want to visit each URL to extract some content (such as descriptions and text) from each page. How do I go about it? HTML::LinkExtor extracts links, but not the contents.
      Have you read the pod for HTML::Parser (I know you say you did, but still, you seem to be missing something..)?

      HTML::Parser "parses" HTML; it does not "visit" webpages. For that, you need to look into LWP, as the monk above hinted (although it's true he kinda missed the question).

      You can use LWP::UserAgent, or LWP::Simple; there are plenty of examples in the pod, and on this site... look'2'see...
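      For instance, a minimal LWP::Simple sketch (the URL is just a stand-in for one of your search results): get() fetches a page and returns its content as a string, or undef on failure.

      ```perl
      # Fetch one page with LWP::Simple's get().
      use strict;
      use LWP::Simple;

      my $url  = 'http://www.example.com/';   # stand-in for $results->url
      my $html = get($url);
      if (defined $html) {
          print length($html), " bytes fetched\n";
      } else {
          warn "couldn't fetch $url\n";
      }
      ```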

      I'm curious though, how did you get the impression that HTML::Parser would "visit" webpages?

      update: oh, by the way, I don't consider the name of a module a good "title" for a post (it should say "subject" instead of title, but anyway)

      "You shook me baby, and then I tripped..."
      ___crazyinsomniac_______________________________________
      Disclaimer: Don't blame. It came from inside the void

      perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

      To get the content of the URLs that you find, use the LWP module. It provides a routine (get(), in LWP::Simple) that takes a URL as an argument and returns the contents of the page as a string.
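      Putting the two halves together, one way to "link the results to HTML::Parser" is to feed the fetched string into a text-event handler. The HTML below is inlined so the sketch stands alone; in real use you would replace it with get($results->url):

      ```perl
      # Collect the text between the tags of a page using an
      # HTML::Parser (api_version 3) text handler.
      use strict;
      use HTML::Parser;

      # In real use: my $html = get($results->url);
      my $html = '<html><title>Demo</title><body><p>Some description text.</p></body></html>';

      my @text;
      my $p = HTML::Parser->new(
          api_version => 3,
          text_h      => [ sub { push @text, shift }, 'dtext' ],
      );
      $p->parse($html);
      $p->eof;

      print join(' ', @text), "\n";   # the page's text content
      ```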

      For a complete LWP example read the pod on LWP or see the "Simple LWP Example" section of the LWP Quick Reference Guide.

      It should work perfectly the first time! - toma