in reply to Scrape a blog: a statistical approach
OK, I did some further research and found that this problem is too complicated to be solved "in a few lines of code".
For those who have to handle the same problem, I'm posting the following link, which contains up-to-date information and useful libraries. I'm now using justext from Python. There is also an NCleaner Perl module, but I have not been able to use it.
As always, thanks guys for your support.
https://sites.google.com/a/morganclaypool.com/wcc/home/software
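For anyone who wants to try the justext route, here is a minimal sketch of how it can be called from Python. It assumes the third-party `justext` package is installed (`pip install justext`) and that the blog is in English. The helper names `keep_content` and `extract_main_text` are my own for illustration, not part of the library.

```python
def keep_content(paragraphs):
    """Return the text of every paragraph not flagged as boilerplate.

    Works on any objects exposing `.text` and `.is_boilerplate`,
    which is what justext's paragraph objects provide.
    """
    return [p.text for p in paragraphs if not p.is_boilerplate]


def extract_main_text(html, language="English"):
    """Strip navigation, footers, etc. from raw HTML and return the main text.

    `language` selects one of the stoplists bundled with justext;
    "English" is an assumption, change it to match the blog.
    """
    # Imported lazily so keep_content() stays usable without the package.
    import justext  # third-party: pip install justext

    paragraphs = justext.justext(html, justext.get_stoplist(language))
    return "\n\n".join(keep_content(paragraphs))
```

To use it, download a page (for example with `urllib.request.urlopen(url).read()`) and pass the result to `extract_main_text`. How well it separates posts from sidebars depends on the blog's markup and on picking the right stoplist, so check the output on a few pages first.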