in reply to using HTML::Parser to extract information

Based on the q above, URLs are extracted by the code above. I want to visit each URL and pull some content (description, text, and so on) from each page. How do I go about it? HTML::LinkExtor extracts the links, but not the contents.

(crazyinsomniac) Re^2: HTML::Parser
by crazyinsomniac (Prior) on Oct 22, 2001 at 13:28 UTC
    Have you read the pod for HTML::Parser (I know you say you did, but still, you seem to be missing something...)?

    HTML::Parser "parses" html, it does not "visit" webpages, for that, you need to look into LWP like the guy above hinted (although it's true he kinda missed the q).

    You can use LWP::UserAgent or LWP::Simple; there are plenty of examples in the pod, and on this site... look'2'see... A rough sketch follows below.
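
    For instance, here's a sketch of gluing the two together, assuming @urls holds the links you already extracted and that you just want the visible text of each page:

        use strict;
        use warnings;
        use LWP::Simple qw(get);
        use HTML::Parser;

        my @urls = @ARGV;   # or the list your link extractor produced

        for my $url (@urls) {
            my $html = get($url);    # fetch the page; undef on failure
            unless (defined $html) {
                warn "couldn't fetch $url\n";
                next;
            }
            # collect the decoded text content of the page
            my $text = '';
            my $p = HTML::Parser->new(
                text_h => [ sub { $text .= shift }, 'dtext' ],
            );
            $p->parse($html);
            $p->eof;
            print "--- $url ---\n$text\n";
        }

    If you also want something like the <meta name="description"> content, add a start_h handler with a 'tagname, attr' argspec and peek at the attr hash when the tag is "meta".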

    I'm curious though, how did you get the impression that HTML::Parser would "visit" webpages?

    update: oh, by the way, I don't consider the name of a module a good "title" for a post (it should say "subject" instead of title, but anyway)

    "You shook me baby, and then I tripped..."
    ___crazyinsomniac_______________________________________
    Disclaimer: Don't blame. It came from inside the void

    perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"

Re: Re: HTML::Parser
by toma (Vicar) on Oct 22, 2001 at 19:51 UTC
    To get the content of the URLs that you find, use the LWP module. LWP::Simple provides a get() routine that takes the URL as an argument and returns the contents of the page as a string.
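
    A minimal sketch of that call (the URL is just a placeholder):

        use LWP::Simple qw(get);

        my $content = get('http://www.example.com/');
        defined $content or die "couldn't fetch the page\n";
        print $content;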

    For a complete LWP example, read the pod on LWP or see the "Simple LWP Example" section of the LWP Quick Reference Guide.

    It should work perfectly the first time! - toma