Re: Getting the text of the html page

Here is one way using WWW::Mechanize:

use strict;
use warnings;
use WWW::Mechanize;

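# autocheck makes failed requests die; cookie_jar => {} keeps cookies in memory.
my $mech = WWW::Mechanize->new(
    autocheck  => 1,
    cookie_jar => {},
);
$mech->get("http://perlmonks.org/?node_id=468232");

# format => "text" needs HTML::TreeBuilder to be installed.
print $mech->content( format => "text" );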

That will strip all of the markup and print a text version of the page.
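
For anyone curious what the text format does, here is a minimal offline sketch using HTML::TreeBuilder directly, which is the module WWW::Mechanize relies on for this feature. The sample markup is made up for illustration and stands in for a fetched page.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, standing in for a page fetched with $mech->get.
my $html = '<html><body><h1>Title</h1>'
         . '<p>Tom &amp; Jerry <b>rock</b></p></body></html>';

# Parse the markup into a tree of elements and text nodes.
my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

# as_text joins the text nodes; tags are dropped and entities are decoded.
my $text = $tree->as_text;

# Free the tree explicitly once the text has been extracted.
$tree->delete;

print "$text\n";
```

Note that as_text simply concatenates the text nodes, so block elements such as headings and paragraphs are not separated by newlines.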

Hopefully I understood your question.

-Eric

Re^2: Getting the text of the html page by agynr (Acolyte) on Jun 20, 2005 at 07:30 UTC
Hello Eric, when I run this with WWW::Mechanize, the get statement fails with this error: Can't locate object method "host" via package "URI::Foreign".... Where can I get this package? It was not previously installed on my system.
Re^3: Getting the text of the html page by ank (Scribe) on Jun 20, 2005 at 08:13 UTC
You'll find these references useful: A guide to installing modules and Writing, Installing, and Using Perl Modules. Also, take a look at CPAN. -- ank
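
Before installing anything, it can help to see which modules Perl can already load. The following is only a sketch: module_installed is a hypothetical helper name, and the module list is an assumption based on what the script above appears to need.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub module_installed {
    my ($module) = @_;
    # Turn Foo::Bar into Foo/Bar.pm, the form that require expects for a path.
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

# Assumed list of modules used by the WWW::Mechanize example.
for my $module (qw(WWW::Mechanize HTML::TreeBuilder URI)) {
    my $status = module_installed($module)
        ? "installed"
        : "missing, install it from CPAN";
    print "$module: $status\n";
}
```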
Re^4: Getting the text of the html page by agynr (Acolyte) on Jun 20, 2005 at 08:27 UTC
Hello all, I have installed the required modules. But what I want is to read the text that is displayed in the Explorer window. I don't want the source of the page, only the text that one reads on the page. For example, what you are reading now is the text, and that text obviously should not contain HTML tags or entities. I hope you now have a clear picture of the problem. Thanks.
Re^5: Getting the text of the html page by ank (Scribe) on Jun 20, 2005 at 08:50 UTC

