in reply to Getting the text of the html page

Here is one way, using WWW::Mechanize:

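    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new(
        autocheck  => 1,     # die on failed requests
        cookie_jar => {},    # keep cookies in memory
    );

    $mech->get("http://perlmonks.org/?node_id=468232");

    # format => "text" needs HTML::TreeBuilder to be installed
    print $mech->content( format => "text" );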

That will strip all of the markup and print a text version of the page.
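To see the stripping step on its own, without fetching anything, here is a minimal sketch that runs HTML::TreeBuilder directly on an HTML string. It assumes HTML::TreeBuilder is installed (the "text" format above needs it as well); html_to_text() is a made-up helper name for illustration, not part of any module.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# html_to_text() is a hypothetical helper name, not part of any module.
sub html_to_text {
    my ($html) = @_;

    # Parse the markup into a tree of elements and text nodes.
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # as_text() joins the text nodes; the tags are dropped and
    # entities such as &amp; were decoded during parsing.
    my $text = $tree->as_text;

    $tree->delete;    # release the parse tree
    return $text;
}

print html_to_text('<html><body><p>Fish &amp; chips</p></body></html>'), "\n";
```

Note that as_text() joins the text nodes with no separator, so text from adjacent block elements can run together in the output.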

Hopefully I understood your question.

-Eric

Re^2: Getting the text of the html page
by agynr (Acolyte) on Jun 20, 2005 at 07:30 UTC
    Hello Eric, when I try this with WWW::Mechanize, it gives an error on the get statement. The error reads: Can't locate object method "host" via package "URI::Foreign".... Where can I get this package? It is not installed on my system.
        Hello all, I have installed the required modules. What I want, though, is to read the text as it appears in the Explorer window. I don't want the raw text of the page, but the text that one actually reads on the page. For example, what you are reading now is the text, and it obviously should not contain HTML tags or entities. I hope you now have a clear picture of the problem. Thanks.