in reply to Re: Language translator and dictionaries
in thread Language translator and dictionaries

Thanks a lot, Elef, for your detailed response. I will explain my purpose so that you have a better idea of what might work best for me.
I know a little about machine translation, but that's not what I need for my project. I am working on text classification using a bag-of-words approach, so my translation task is very easy: just word by word, since any text will be split into words anyway. I also only need to go from the EU languages into English, which simplifies the task further.
I think using third-party translation servers would be overhead for me, and they are not very reliable, especially when it comes to massive numbers of queries. So my guess is that I should run some simple word-to-word translation procedure on my own server.
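
A minimal sketch of such a word-by-word procedure, in Python, might look like the following. The `LEXICON` table, its entries, and the function name `translate_bag` are all hypothetical and only there for illustration; a real table would be far larger and built from actual dictionary data.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: (language code, source word) -> English word.
# These entries are illustrative only.
LEXICON = {
    ("de", "haus"): "house",
    ("de", "hund"): "dog",
    ("fr", "maison"): "house",
    ("fr", "chien"): "dog",
}

def translate_bag(text, lang, lexicon=LEXICON):
    """Split text into lowercase words, translate each word on its own,
    and return the result as an English bag of words (a Counter).
    Words that are not in the lexicon are kept unchanged."""
    words = re.findall(r"\w+", text.lower())
    return Counter(lexicon.get((lang, word), word) for word in words)
```

Calling `translate_bag("Haus Hund Haus", "de")` would count "house" twice and "dog" once. Keeping unknown words as-is is just one possible choice; dropping them would work as well if the classifier should only see English tokens.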

Re^3: Language translator and dictionaries
by elef (Friar) on Mar 19, 2011 at 17:36 UTC
    word-by-word since any text will be split into words anyway. And all EU languages into English only

    Well, then you should probably start building your multilingual dictionary. I don't think Google Translate would be too happy about you making thousands of automated single-word queries.

    Eurovoc: http://eurovoc.europa.eu/drupal/?q=download/list_pt&cl=en
    CPV: http://simap.europa.eu/codes-and-nomenclatures/codes-cpv/codes-cpv_en.htm
    Other EU term lists: http://ec.europa.eu/eurostat/ramon/nomenclatures/index.cfm?TargetUrl=LST_NOM&StrGroupCode=CLASSIFIC&StrLanguageCode=EN

    Add whatever you can extract from Wikipedia and Wiktionary dumps and you should be set.
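
As a rough sketch of how extracted term lists could be merged into one lookup table, assuming each source has already been converted to a hypothetical tab-separated layout of `language<TAB>source term<TAB>English term` (the actual downloads come in their own formats and would need a conversion step first; `load_terms` is an illustrative name):

```python
import csv
import io

def load_terms(handle, lexicon=None):
    """Read tab-separated rows (lang, source term, English term) from an
    open file-like object and add them to a lookup table keyed by
    (lang, source term). The first translation seen for a key wins."""
    lexicon = {} if lexicon is None else lexicon
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip malformed lines
        lang, source, english = (field.strip().lower() for field in row)
        if not source or not english:
            continue  # skip rows with an empty term
        if " " in source or " " in english:
            continue  # multi-word terms cannot be used for word-by-word lookup
        lexicon.setdefault((lang, source), english)
    return lexicon

# Usage with in-memory sample data (illustrative rows only).
sample = io.StringIO(
    "de\tHaus\thouse\n"
    "fr\tmaison\thouse\n"
    "de\tHaus\thome\n"
    "de\tEuropäische Union\tEuropean Union\n"
)
merged = load_terms(sample)
```

Calling `load_terms` repeatedly with the same `lexicon` argument accumulates entries from several sources, so the order in which files are loaded decides which translation is kept for a word that appears more than once.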