in reply to Html to text

Web::Scraper

Now, if you were a bit more forthcoming with your criteria for "best", you might get an answer more tailored to your needs.

Re^2: Html to text
by Anonymous Monk on Mar 22, 2009 at 12:00 UTC
    OK, thanks. I wish to extract all the words (in Unicode) as presented to the user, plus the frequency of each word. The words presented to the user on this page include:
    more useful options PerlMonks Html to text by Anonymous Monk Log in Create a new user The Monastery Gates Super Search Seekers of Perl Wisdom Meditations PerlMonks Discussion Snippets Obfuscation Reviews Cool Uses For Perl Perl News Tutorials Code Poetry Recent Threads Newest Nodes Donate What's New on Mar at perlquestion print replies xml Need Help Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question What is the best module for extracting the text that yiu see on a webpage Comment on Html to text Html to text Corion Archbishop on ...

      Since you seem to have retrieved the page already, something like HTML::TokeParser, or again Web::Scraper, may be the tool to use. The word-frequency counting and the stopword handling you will have to program yourself. Try these and come back once you encounter problems.
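
      To illustrate the part you have to program yourself, here is a minimal sketch of word counting with a stopword list, using core Perl only. It assumes the visible text has already been extracted (for example with one of the modules above) and decoded into a Perl character string. The name word_freq, the stopword list, and the rule for what counts as a "word" are my own assumptions for the example, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'unicode_strings';   # make lc() use Unicode rules on all strings

# Hypothetical stopword list; replace it with one that suits your text.
my %stop = map { $_ => 1 } qw(a an and by for of on the to);

# word_freq($text): return a hash ref mapping lowercased word => count.
# Assumption: $text is already-decoded character data, not raw bytes.
sub word_freq {
    my ($text) = @_;
    my %count;
    # A "word" here is a run of Unicode letters and combining marks,
    # optionally with internal apostrophes (so "What's" stays one word).
    while ( $text =~ /([\p{L}\p{M}]+(?:'[\p{L}\p{M}]+)*)/g ) {
        my $word = lc $1;
        next if $stop{$word};    # skip stopwords
        $count{$word}++;
    }
    return \%count;
}

my $freq = word_freq("Html to text by Anonymous Monk: Html to text");

# Print the most frequent words first, ties in alphabetical order.
binmode STDOUT, ':encoding(UTF-8)';
for my $w ( sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq ) {
    printf "%-12s %d\n", $w, $freq->{$w};
}
```

      Whether digits, hyphenated words or apostrophes should count as part of a word depends on your criteria, so treat the regular expression as a starting point rather than the answer.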

      Note, though, that PerlMonks is not a site that should be scraped. If you have a specific need for the content of this site, contact the gods. Other automated mass access to this site is discouraged, and we block badly written scripts that put an undue load on the site.