http://qs1969.pair.com?node_id=11133798


in reply to How to count the vocabulary of an author?

CPAN has packages for stemming (basically chopping letters off the end of a word to arrive at a base form) and for tagging (finding out which part of speech a word is, e.g. a verb). They are specific to individual languages; see e.g. Lingua::*

Then ask uncle NSA and aunty CIA for the corpus; they keep meticulous records of all major European politicos' conversations.
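
Whatever the source of the corpus, the counting step could look roughly like the sketch below. It is written in Python only to keep it self-contained; the suffix list and the names `naive_stem` and `vocabulary` are made up for illustration and are not taken from any Lingua::* module. A real stemmer applies language-specific rules, so treat the suffix chopping here as a crude stand-in.

```python
import re
from collections import Counter

# Hypothetical, deliberately naive suffix list (an assumption for this sketch).
# A real stemmer uses language-specific rules instead.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Chop the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def vocabulary(text):
    """Return a Counter mapping each stem to its number of occurrences."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(naive_stem(w) for w in words)

sample = "He walks. She walked. They are walking and talking."
vocab = vocabulary(sample)
print(len(vocab), "distinct stems")
print(vocab.most_common(3))
```

The size of the resulting counter is the vocabulary estimate: "walks", "walked" and "walking" collapse into one entry instead of three. Swapping `naive_stem` for a proper stemmer changes the counts but not the overall shape of the approach.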