As you seem to have retrieved the page already, something like HTML::TokeParser or, still, Web::Scraper is probably the tool to use. For the word frequency and the stopwords, you will have to write the code yourself. Try these and come back once you run into problems.
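To get you started, here is a minimal sketch of both steps, assuming the page is already in a string. The sub names html_to_text and word_frequency, the tiny stopword list and the sample HTML are all made up for illustration; only HTML::TokeParser (new, get_token) and HTML::Entities (decode_entities) are real module calls. It is untested against your actual page, so treat it as a starting point rather than a finished solution.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(decode_entities);

# Hypothetical stopword list; replace it with one that suits your text.
my %stopword = map { $_ => 1 } qw(a an and are in is it of the to);

# Return the text content of an HTML string, skipping <script> and <style>.
sub html_to_text {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    my ($text, $skip) = ('', 0);
    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        if ($type eq 'S' and $token->[1] =~ /^(?:script|style)$/) {
            $skip++;                       # entering a block we do not want
        }
        elsif ($type eq 'E' and $token->[1] =~ /^(?:script|style)$/) {
            $skip-- if $skip;              # leaving that block again
        }
        elsif ($type eq 'T' and !$skip) {
            # Text tokens still contain entities such as &amp;
            $text .= decode_entities($token->[1]) . ' ';
        }
    }
    return $text;
}

# Count lowercased words, ignoring anything in %stopword.
sub word_frequency {
    my ($text) = @_;
    my %count;
    for my $word (map { lc } $text =~ /[[:alpha:]]+/g) {
        next if $stopword{$word};
        $count{$word}++;
    }
    return \%count;
}

# Sample input, standing in for the page you already retrieved.
my $html = '<html><head><style>p { color: red }</style></head>'
         . '<body><p>The cat &amp; the dog. The cat sleeps.</p></body></html>';

my $freq = word_frequency(html_to_text($html));
for my $word (sort { $freq->{$b} <=> $freq->{$a} or $a cmp $b } keys %$freq) {
    print "$word\t$freq->{$word}\n";
}
```

The word regex only matches runs of letters, so "don't" is split into "don" and "t"; adjust the pattern if that matters for your counts.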
Note though that PerlMonks is not a site that should be scraped. If you have a specific need for the content of this site, contact the gods. Any other automated mass access to this site is discouraged, and we block badly written scripts that put an undue load on the site.