Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello

I am attempting what is, for me, a fairly difficult task: adapting the module Lingua::EN::Tagger for use with another language. To do so, I need to train the probability values on a corpus in my language. These values are stored in several YAML files. Unfortunately, as far as I can tell, there is no documentation describing how to do this. In fact, I have trouble understanding how the probabilities are stored in the first place. I have some background in linguistics and corpus linguistics, but without documentation this is hard going. I have seen that there is also a German version (Lingua::DE::Tagger) derived from the EN one, so the task should be doable, given a corpus and some manual tagging to train the model. I have written to the authors asking how to proceed, but have had no response. Has anybody already tried something like this? If so, did you find any documentation online on how to train the model? Any suggestions would be very much appreciated. Best.
