in reply to How to count the vocabulary of an author?
CPAN has packages for stemming (basically chopping letters off the end of a word to arrive at a base form) and for tagging (finding out which part of speech a word is, e.g. a verb). Many are specific to particular languages; see e.g. Lingua::*
Then ask uncle NSA and aunty CIA for the corpus. They keep meticulous records of all major European politicos' conversations.
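
To make the stemming idea concrete, here is a minimal sketch of counting distinct stems in a text. Note that crude_stem and vocabulary_size are hypothetical helpers written for this illustration: crude_stem is only a toy suffix stripper for English, not the algorithm of any CPAN module, and in practice you would swap in a real stemmer from the Lingua::* namespace.

```perl
use strict;
use warnings;

# Toy stand-in for a real stemmer (hypothetical helper, illustration only):
# strips a few common English suffixes from the end of a word.
sub crude_stem {
    my ($word) = @_;
    for my $suffix (qw(ing ed es s)) {
        # Only strip when at least three characters remain as the base
        return $1 if $word =~ /^(\w{3,})\Q$suffix\E$/;
    }
    return $word;
}

# Count the distinct stems in a text (hypothetical helper).
sub vocabulary_size {
    my ($text) = @_;
    my %seen;
    # Lowercase the text and pull out runs of letters and apostrophes
    for my $word (lc($text) =~ /[a-z']+/g) {
        $seen{ crude_stem($word) }++;
    }
    return scalar keys %seen;
}

my $sample = "He walks. She walked. They are walking and talking.";
printf "%d distinct stems\n", vocabulary_size($sample);
```

Without the stemming step, "walks", "walked" and "walking" would count as three separate words, which inflates the vocabulary. A toy stripper like this one will also mangle plenty of words, which is why the language-specific packages exist.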