PerlMonks  

Re: Getting the text of the html page

by ercparker (Hermit)
on Jun 20, 2005 at 06:57 UTC


in reply to Getting the text of the html page

Here is one way, using WWW::Mechanize:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new(
    autocheck  => 1,
    cookie_jar => {},
);

$mech->get("http://perlmonks.org/?node_id=468232");
print $mech->content( format => "text" );

That will strip all of the markup and print a text version of the page.
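
For what it's worth, the WWW::Mechanize documentation says the text format requires HTML::TreeBuilder to be installed. If you already have the HTML in a string, a minimal sketch of the same idea, without any fetching, might look like the following. The sample markup and variable names are made up for illustration, and this is not necessarily exactly what Mechanize does internally.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, standing in for a page you have already fetched.
my $html = '<html><body><h1>Title</h1><p>Fish &amp; chips</p></body></html>';

# Parse the string into an element tree.
my $tree = HTML::TreeBuilder->new_from_content($html);

# as_text returns only the text nodes: tags are dropped and
# entities such as &amp; come back as plain characters.
my $text = $tree->as_text;

# Free the tree explicitly (harmless on newer versions, needed on older ones).
$tree->delete;

print "$text\n";
```

Note that as_text simply joins the text pieces together, so block elements are not separated by newlines. If you need layout closer to what a browser shows, a formatter module such as HTML::FormatText may be a better fit.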

Hopefully I understood your question.

-Eric

Replies are listed 'Best First'.
Re^2: Getting the text of the html page
by agynr (Acolyte) on Jun 20, 2005 at 07:30 UTC
    Hello Eric, when I use WWW::Mechanize, the get statement fails with this error: Can't locate object method "host" via package "URI::Foreign".... Where can I get this package? It is not installed on my system.
        Hello All, I have installed the required modules. But what I want is to read the text that is displayed in the Explorer window. I don't want the source of the page, but the text that one actually reads on the page. What you are reading right now, for example, is the text, and it obviously should not contain HTML tags or entities. I hope you now have a clear picture of the problem. Thanks
