in reply to Re: Framework for News Articles
in thread Framework for News Articles
My problem is I want data. Not *pretty web pages*. Raw data in feed format that I can process. I'm pretty much getting the results you are looking for now, but without beating my head against parsing HTML with all its problems: namely, you are at the mercy of a web designer's whim to change the layout.
use rdf, rss or pda feeds

So I avoid HTML. I'm lazy. I look for the RSS, RDF or PDA HTML pages, point my spider at them and dump them in a directory for later parsing. Most news sites have RSS feeds (though my local newspaper, The Age, supplies RSS feeds for a fee, but produces a lite page for PDAs), so some parsing is still necessary.
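
For what it's worth, the "point my spider and dump them in a directory" step can be very small. This is only a rough sketch, not my actual spider: it assumes the core HTTP::Tiny module, and the `feeds` directory name and the `feed_filename` and `dump_feeds` subs are names made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;
use HTTP::Tiny;

# Turn a feed URL into a flat, filesystem-safe file name (hypothetical helper).
sub feed_filename {
    my ($url) = @_;
    (my $name = $url) =~ s{^\w+://}{};    # drop the scheme
    $name =~ s{[^A-Za-z0-9._-]+}{_}g;      # squash everything else to "_"
    return $name;
}

# Mirror each feed into $dir and return the files that are now up to date.
sub dump_feeds {
    my ($dir, @urls) = @_;
    make_path($dir);
    my $http = HTTP::Tiny->new(timeout => 20);
    my @saved;
    for my $url (@urls) {
        my $file = File::Spec->catfile($dir, feed_filename($url));
        # mirror() only re-downloads when the remote copy has changed.
        my $res = $http->mirror($url, $file);
        if ($res->{success}) {
            push @saved, $file;
        }
        else {
            warn "skipping $url: $res->{status} $res->{reason}\n";
        }
    }
    return @saved;
}

# Only spider when run as a script with feed URLs on the command line.
if (!caller && @ARGV) {
    print "$_\n" for dump_feeds('feeds', @ARGV);
}
```

Run it as something like `perl dumpfeeds.pl http://example.com/news/index.rss` (a placeholder URL and script name) and the raw feed lands in `feeds/` for later parsing. Nothing here touches the page layout, which is the whole point.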
Now suppose I want to parse a page (in Perl): why wouldn't I use Andy Lester's fine WWW::Mechanize? (WWW::Mechanize article).
questions, questions, devil's advocate
I'm not actually knocking the idea.
Now you may say, "goon, you're an idiot, be quiet." But ...