in reply to help for naming a module that aims latin utf8 coded corpus statistical analysis

I hope you are not really going to insist on using data file names that include spaces and parentheses ("1 (1).txt" and so on) as either input files or output files. That makes lots of things a lot more difficult for command-line usage involving file names (everything would need to be quoted and/or escaped). Please stick to alphanumerics, underscore, hyphen and period for file names.
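
To make the quoting problem concrete, here is a minimal sketch. It uses Python purely for illustration (it is not taken from the module under discussion), and the helper name `is_shell_friendly` is made up for this example. It shows how a POSIX-style command line splits an unquoted name such as "1 (1).txt" into two arguments, and how to check a name against the suggested character set.

```python
import re
import shlex

# Hypothetical helper: accept only alphanumerics, underscore, hyphen and period.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")

def is_shell_friendly(name: str) -> bool:
    """Return True if the name uses only the suggested characters."""
    return SAFE_NAME.fullmatch(name) is not None

# Unquoted, the name is split at the space into two separate arguments.
unquoted_args = shlex.split("wc -l 1 (1).txt")

# Quoting keeps it together, but every caller has to remember to do it.
quoted_name = shlex.quote("1 (1).txt")
quoted_args = shlex.split("wc -l " + quoted_name)

print(unquoted_args)   # ['wc', '-l', '1', '(1).txt']
print(quoted_args)     # ['wc', '-l', '1 (1).txt']
print(is_shell_friendly("1 (1).txt"), is_shell_friendly("corpus_01.txt"))
```

A name like "corpus_01.txt" passes the check and can be typed on the command line as-is.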

I understand the difficulty of trying to write documentation in a foreign language. I hope you will have a chance to go over it with someone who knows both your native language and English well enough so that you can discuss the module comfortably with them, and they can clarify the English description. As it is, I would have to read the program code to understand how to use the module. (You might want to consider posting (a pointer to) the code for preliminary review by other monks.)

It may be worthwhile to create a CPAN namespace called "Text::Corpus::", which at first would contain just a "Stats" module (Text::Corpus::Stats), and later could contain other support modules for building, maintaining and using text corpora.

Re^2: help for naming a module that aims latin utf8 coded corpus statistical analysis
by fernandes (Monk) on Jul 18, 2007 at 06:02 UTC
    Thank you for your suggestions and comments. Text::Statistics::Latin is published and registered. Other languages (or Unicode intervals) are coming soon. Text::Statistics::Devanagari, Text::Statistics::GreekAndCoptic, Text::Statistics::Cyrillic and Text::Statistics::Arabic are indexed on CPAN. Enjoy. If you know someone who would like to provide better English documentation, please get in touch.