There are stemming (basically chopping letters off the end of a word in order to arrive at a base form) and tagging (finding out which part of speech a word is, e.g. a verb) packages on CPAN, including ones specific to different languages, e.g. Lingua::*.
Then ask uncle NSA and aunty CIA for the corpus; they keep meticulous records of all major European politicos' conversations.
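To make the stemming idea concrete for vocabulary counting, here is a minimal core-Perl sketch. It is a toy suffix stripper, not the Porter algorithm and not what the Lingua::* stemmers actually do; the names `naive_stem` and `vocabulary_size` and the suffix list are made up for illustration. For real work one of the CPAN stemmers would be the better choice.

```perl
use strict;
use warnings;

# Hypothetical suffix list for illustration only; real stemmers use
# language-specific rule sets. Longest suffixes are tried first.
my @SUFFIXES = sort { length($b) <=> length($a) } qw(ing ed es ly s);

# Chop a known suffix off the end of a word, keeping a stem of at least 3 letters.
sub naive_stem {
    my ($word) = @_;
    $word = lc $word;
    for my $suffix (@SUFFIXES) {
        my $stem_len = length($word) - length($suffix);
        next if $stem_len < 3;
        return substr($word, 0, $stem_len)
            if substr($word, $stem_len) eq $suffix;
    }
    return $word;    # no suffix matched, keep the word as is
}

# Count the distinct stems found in a piece of text.
sub vocabulary_size {
    my ($text) = @_;
    my %seen;
    $seen{ naive_stem($_) }++ for $text =~ /([A-Za-z]+)/g;
    return scalar keys %seen;
}

print vocabulary_size("He walks, she walked, they are walking."), "\n";
```

The sample sentence has seven distinct words, but "walks", "walked" and "walking" all collapse to "walk", so the script prints 5. A crude stripper like this will mangle plenty of words, which is exactly why the language-specific modules exist.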
In reply to Re: How to count the vocabulary of an author?
by bliako
in thread How to count the vocabulary of an author?
by karlgoethebier