in reply to Scrape a blog: a statistical approach

OK, I did some further research and found that this problem is too complicated to be solved "in a few lines of code".

For anyone who has to handle the same problem, I'm posting the following link, which contains up-to-date information and useful libraries. I am now using jusText from Python. There is also an NCleaner Perl module, but I have not been able to get it to work.

As always, thanks guys for your support.

https://sites.google.com/a/morganclaypool.com/wcc/home/software
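
For reference, here is a minimal sketch of how jusText can be called from Python. It is not the exact script I run, just the basic shape. It assumes the package is installed (pip install justext) and that the page is in English; the helper names keep_content and extract_text are made up for this example.

```python
def keep_content(paragraphs):
    """Return the text of every paragraph not flagged as boilerplate.

    Works on any objects exposing `.text` and `.is_boilerplate`,
    which is what jusText's paragraph objects provide.
    """
    return [p.text for p in paragraphs if not p.is_boilerplate]


def extract_text(html, language="English"):
    """Run jusText on raw HTML and return the surviving paragraphs.

    `language` selects one of the stoplists bundled with jusText;
    "English" is an assumption, pick the one matching the blog.
    """
    import justext  # third-party, imported lazily: pip install justext

    stoplist = justext.get_stoplist(language)
    return keep_content(justext.justext(html, stoplist))
```

Calling extract_text() on the raw bytes of a downloaded page should give a list of strings, one per paragraph that jusText classified as content rather than navigation, ads, or footers. How well it works depends on the page and on the stoplist, so check the output by hand on a few posts.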
