mr_market has asked for the wisdom of the Perl Monks concerning the following question:

Hi, newcomer here. I coded in Perl professionally for 10 years and went to 5+ YAPCs in that time, but I haven't coded for the last 5 years. Something got my goat: I got fed up with a state-sponsored website (the BBC) removing comments it deemed un-PC. I'm not into racism or offensive language, but people should be allowed to post a well-reasoned comment on BBC HYS (Have Your Say).

I would like to monitor BBC HYS (hopefully adaptable to any moderated comment site) for comments that are taken down after they have been posted. I comment on economics and get censored if a comment doesn't fit the political narrative of the day. I realise that they won't like being spammed/pinged every second, and also that I can't catch what they cut off at source.

Basically I want some hints pointing me towards how to find out which comments the BBC censors, publish them on a page in real time, and let the public decide whether the BBC is impartial or not. I've been out of the game since 2008, so I've probably got a lot to re-learn. I don't need the basics, just a sketch so as not to reinvent the wheel. Many thanks!

PS: I asked the same question on Dev Shed and they pointed me to Perl Monks, so I hope you guys can help. It's a freedom of speech thing; I'll write it myself and make it available for free, I just want to know where to start.

Re: Web Have Your Say diff log
by Your Mother (Archbishop) on Jan 23, 2015 at 03:42 UTC

    You should include a link to one of the BBC forum/comment pages for anyone to peruse, should they want to help you. You should first check the site’s Terms of Service. Automated queries are proscribed on most big sites, and most of the better hackers here won’t help anyone break such terms.

    Many comment boards on news sites do Ajax updating now. If this is one of those cases then WWW::Selenium or WWW::Mechanize::Firefox (as suggested already) would probably do it. Plain WWW::Mechanize might work too but would be much more intrusive on the server and more obvious. I know Selenium has watcher/event functions/JS access so you can wait for comments to appear and then deal with them.

    You will find bias. It’s not even a question, is it? And therein lies the real rub. You’ll find it and present it, and those who are biased won’t see it, while everyone else (the minority that can divorce preference/proclivity/upbringing from right/wrong/logic) will say… well, DER–P! The insanity of the whole thing boggles, really. But it is a marketplace, mr_market, in’it? I think this is a fun project, I wish you luck, and I’ll even help you if the site’s ToS allow it and you post follow-ups here. But starting your own political/news/whatever forum might be a better use of your frustration/time/potentially-righteous-anger.

    (Update: added possessive s.)

      Thanks for the info, much appreciated on the technical side. I am apolitical, and as such I simply want to publish the diffs of a supposedly independent (BBC) comment section, i.e. what was posted versus what is still up, so that others can judge for themselves whether there is or is not a bias. I have detected a bias, but the facts will speak for themselves regardless of my opinions. Transparency is what we are looking for. Thanks again, and all the best.
Re: Web Have Your Say diff log
by Anonymous Monk on Jan 23, 2015 at 02:17 UTC

    As you've already said, they probably won't like a flood of automated requests, so it's best to check whether the site has terms of service you should follow, so that they don't block you (or worse).
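
    In that spirit, a polling loop can enforce a fixed wait between requests rather than hammering the page. What follows is only a minimal sketch: `poll_politely` is a hypothetical helper, the fetch callback is supplied by the caller, the sleep function is a parameter purely so the loop can be exercised without actually waiting, and the five-minute default is an arbitrary assumption, not a figure taken from any site's terms.

```perl
use strict;
use warnings;

# Hypothetical helper: call $args{fetch} up to $args{max_polls} times,
# waiting $args{interval} seconds between consecutive calls.
# Returns an arrayref of whatever each fetch returned (the snapshots).
sub poll_politely {
    my (%args) = @_;
    my $fetch     = $args{fetch}     or die "fetch callback required\n";
    my $interval  = $args{interval}  // 300;    # assumed default: 5 minutes
    my $max_polls = $args{max_polls} // 1;
    my $sleep     = $args{sleep}     // sub { sleep $_[0] };

    my @snapshots;
    for my $n (1 .. $max_polls) {
        push @snapshots, $fetch->();
        # Wait between requests, but not after the final one.
        $sleep->($interval) if $n < $max_polls;
    }
    return \@snapshots;
}
```

    In real use, `fetch` might be something like `sub { LWP::Simple::get($url) }`, where `$url` is whichever comment page you are watching; `get` returns undef on failure, which a fuller version would need to handle.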

    Keeping that in mind, WWW::Mechanize might be able to help you; or, for a fixed URL, plain old LWP::Simple should work, since it sounds like what you basically want to do is fetch the page repeatedly and compare the versions. Once you've got the HTML, something like HTML::FormatText could give you a text-only version, which could be diffed using something like Text::Diff (disclaimer: I haven't used these latter two modules, just found them via a quick search). Alternatively, you could get the structure of the HTML via something like HTML::TreeBuilder and implement the extraction and comparison yourself.
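
    If you go the "extraction and comparison yourself" route, one option (a swapped-in technique, not a line diff like Text::Diff) is to compare snapshots as sets keyed by comment ID: anything present earlier but missing later is a candidate takedown. The sketch below uses core Perl only and assumes each snapshot has already been reduced to a hash of comment ID => comment text; `diff_snapshots` is a hypothetical name. A vanished comment may also simply have moved to another page of comments, so this only flags candidates.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two snapshots (hashrefs of comment_id => text).
# Returns comments that vanished, comments whose text changed,
# and the IDs of comments that are new in the later snapshot.
sub diff_snapshots {
    my ($before, $after) = @_;
    my (@removed, @edited);
    for my $id (sort keys %$before) {
        if (!exists $after->{$id}) {
            push @removed, { id => $id, text => $before->{$id} };
        }
        elsif ($after->{$id} ne $before->{$id}) {
            push @edited, { id => $id, old => $before->{$id}, new => $after->{$id} };
        }
    }
    my @added = grep { !exists $before->{$_} } sort keys %$after;
    return { removed => \@removed, edited => \@edited, added => \@added };
}
```

    Each entry in `removed` could then be appended to a log with a timestamp, which is essentially the "diff log" to publish.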

      If the website uses JavaScript / AJAX / etc., things can get more complicated, but maybe WWW::Mechanize::Firefox could help there.
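
      Before reaching for a browser, it may be worth checking whether an AJAX page loads its comments as JSON (the browser's developer tools usually show such requests). If it does, and the ToS allow it, that response can be decoded directly with JSON::PP, which ships with Perl 5.14 and later. This is a sketch under assumptions: `snapshot_from_json` is a hypothetical name, and the field names (`comments`, `id`, `text`) are invented placeholders; the real structure, if such an endpoint exists at all, has to be read off the actual responses.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: turn a JSON comment feed into a comment_id => text hash.
# Assumed (invented) shape: { "comments": [ { "id": ..., "text": ... }, ... ] }
sub snapshot_from_json {
    my ($json_text) = @_;
    my $data = JSON::PP->new->utf8->decode($json_text);
    my %snapshot;
    for my $c (@{ $data->{comments} || [] }) {
        $snapshot{ $c->{id} } = $c->{text};
    }
    return \%snapshot;
}
```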

      Many thanks, I remember Mechanize; I'll see what I can do. Much appreciated pointer in the right direction. I miss the open source community: altruism at its best. If I can make a script/lib/class for this, do you think others would be interested in reusing it? With limited time, what I produce might not work with all hosts and might involve some hard-coding, but I'm happy if anyone wants to generalise it.
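
      One way to keep the host-specific hard-coding in one place is a per-site table of settings, so supporting another site means adding an entry rather than editing the main logic. In this sketch the site key, URL and markup pattern are all hypothetical placeholders, `extract_comments` is a hypothetical name, and a regex is used only to keep the example self-contained; real markup is better handled with a parser such as HTML::TreeBuilder.

```perl
use strict;
use warnings;

# Hypothetical per-site settings: where to look and how to pull out
# (comment_id, comment_text) pairs. Every value here is a placeholder.
my %SITES = (
    example => {
        url        => 'https://comments.example.com/thread/123',
        comment_re => qr{<div class="comment" id="c(\d+)">(.*?)</div>}s,
    },
);

# Extract a comment_id => text hash from raw HTML using the site's pattern.
sub extract_comments {
    my ($site, $html) = @_;
    my $cfg = $SITES{$site} or die "No configuration for site '$site'\n";
    my $re  = $cfg->{comment_re};
    my %snapshot;
    while ($html =~ /$re/g) {
        $snapshot{$1} = $2;
    }
    return \%snapshot;
}
```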