Oddly enough, I just put an entry into Craft called "Use LWP::Simple to download images from a website", which shows a (very) simple way to download images from a webpage. It could be adapted to fetch HTML pages as well.
I use LWP::Simple's getstore function to do this, which saves whatever it downloads straight to a file. If you used get instead, you could hold the contents of the webpage in a scalar for parsing, etc.
For example, you can easily get the contents of a webpage with LWP::Simple from the command line, like this:
perl -MLWP::Simple -e "print get('http://www.yahoo.com')"
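If you want the same thing as a script, here is a rough sketch that uses both functions. The sub names fetch_page and save_page, and the output file name page.html, are my own hypothetical choices, not part of LWP::Simple. get returns the page as a string (or undef if the request fails), while getstore writes the response to a file and returns the HTTP status code, which you can check with is_success (LWP::Simple exports it from HTTP::Status).

```perl
use strict;
use warnings;
use LWP::Simple qw(get getstore is_success);

# Fetch a page into a scalar with get(); get() returns undef on failure.
sub fetch_page {
    my ($url) = @_;
    my $html = get($url);
    die "Could not fetch $url\n" unless defined $html;
    return $html;
}

# Save a URL straight to a file with getstore(); it returns the HTTP status code.
sub save_page {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);
    die "Could not save $url (status $status)\n" unless is_success($status);
    return $status;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $url  = $ARGV[0];
    my $html = fetch_page($url);
    print "Fetched ", length($html), " characters from $url\n";
    save_page($url, 'page.html');
    print "Saved a copy to page.html\n";
}
```

Run it as perl fetch.pl http://www.yahoo.com/ (fetch.pl being whatever you named the file). With no URL argument it does nothing, so the two subs can be copied into other scripts as-is.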
With similar code you could then parse through the HTML with regular expressions, etc. Good luck! -timallen
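As a rough sketch of the regular-expression approach, the hypothetical extract_links sub below collects the href values from <a> tags in an HTML string, such as one returned by get. It is strictly quick-and-dirty: a regex like this misses unquoted attributes and can be fooled by comments or scripts, so for anything serious a parser module such as HTML::Parser or HTML::LinkExtor is the safer choice.

```perl
use strict;
use warnings;

# Collect href values from <a ...> tags using a regular expression.
# Handles single or double quotes and any case (the /i flag);
# it does NOT handle unquoted attributes, comments, or scripts.
sub extract_links {
    my ($html) = @_;
    my @links;
    while ($html =~ /<a\b[^>]*?\shref\s*=\s*["']([^"']+)["']/gi) {
        push @links, $1;
    }
    return @links;
}

# A small inline sample so the sketch runs without fetching anything.
my $html = <<'HTML';
<p>See <a href="/news.html">the news</a> and
<A class="x" HREF='http://www.example.com/'>Example</A>.</p>
HTML

print "$_\n" for extract_links($html);
```

In real use you would pass it the scalar you got from get instead of the inline sample.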