I did check CPAN, but it only has modules to create or manipulate PDFs, not to simply grab the content off the web. To be precise, it's the content I'm bothered with: I need the text each time, as I am working on information retrieval and parallel texts.
cheers!