In this case I still call an external server, but I'd like to do it on my server.
So install Apertium (http://www.apertium.org/) on your own server.
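If you want to drive a local Apertium install from Perl, the apertium command-line tool reads STDIN and writes STDOUT, so a pipe is enough. Untested sketch; the en-es pair is only an example, substitute whichever pair you actually have installed:

#!/usr/bin/perl
# Sketch: pipe text through a locally installed Apertium.
# Assumes the `apertium` command-line tool is on PATH and that the
# language pair named below is installed -- the pair is an example.
use strict;
use warnings;
use IPC::Open2;

my $pair = 'en-es';    # example pair; substitute one you have
my $pid  = open2(my $out, my $in, 'apertium', $pair);
print {$in} "perlmonks is the best site for Perl!\n";
close $in;                           # signal end of input
my $translated = do { local $/; <$out> };
waitpid $pid, 0;
print $translated;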
#!/usr/bin/perl
use strict;
use warnings;
use Lingua::Translate;

# new() returns undef if no translation backend is available
my $xl8r = Lingua::Translate->new(src => "en", dest => "de")
    or die "No translation server available for en -> de\n";
my $english = "perlmonks is the best site for Perl!";
my $german  = $xl8r->translate($english);
print $german, "\n";
The problem is that my local server runs ActiveState Perl, and "Lingua::Translate" is not available as a PPM package.
If you happen to know a way to install it other than
ppm install Lingua::Translate
please let me know.
Another question: how reliable is it to use a third-party server for a very large number of translation queries?
It's pretty easy to get dictionaries for these languages online, so if you want to build a Perl app that "translates" word by word, that would be reasonably easy to do.
If you attempt to do this, the quality of the output will be very poor. It should work passably well if the input data is word lists (especially if they cover one given area and you can find good glossaries for that area) but if your input data is running text of any complexity, the "translation" will be barely intelligible at best.
Some sort of MT (machine translation) engine would do a much better job, and I'm sure there are some rudimentary open source solutions out there, but I don't know about the specifics. If you have money to spend, there are more than a few companies happy to sell you a complete system.
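To give an idea of how little code the word-by-word route takes, here is a minimal sketch. It assumes a two-column, tab-separated dictionary file (source word, English equivalent, one pair per line); the filename and format are placeholders for whatever you end up building:

#!/usr/bin/perl
# Minimal word-by-word "translation": load a tab-separated dictionary
# into a hash, then map each input token to its English equivalent.
# Unknown words are dropped, which is usually fine for bag-of-words use.
use strict;
use warnings;

my %dict;
open my $fh, '<:encoding(UTF-8)', 'de-en.tsv' or die "de-en.tsv: $!";
while (<$fh>) {
    chomp;
    my ($src, $en) = split /\t/;
    $dict{lc $src} = $en if defined $en;
}
close $fh;

while (my $line = <STDIN>) {
    my @words = $line =~ /(\w+)/g;                      # crude tokenizer
    my @translated = grep { defined } map { $dict{lc $_} } @words;
    print "@translated\n";
}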
How to get dictionaries:
I'm basing this on the assumption that you'll need to handle EU languages only.
- Get multilingual dictionaries/word lists. These will cover the most basic words.
- Extract term pairs from Wikipedia dumps. The xml dumps contain links to the same article in other languages, so you can extract well north of 100,000 term pairs for major combinations like French-English; a rough extraction sketch follows this list.
- Same with Wiktionary.
- Grab EU term lists such as Eurovoc and the CPV. They contain thousands of terms in all the EU languages.
- If you need multilingual text corpora for training a MT system, grab the DGT-TM and the europarl corpus. These contain about a million sentences each of sentence-aligned text in EU languages. You can also add UN texts, but the UN only has 6 official languages.
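As for the Wikipedia extraction sketch promised above: in the classic pages-articles XML dumps, the interlanguage links sit right in the wikitext as e.g. [[en:Some title]], so a crude streaming pass over the dump is enough to collect title pairs. Rough sketch only; dump formats change over time and titles may contain XML entities, so treat this as a starting point rather than a parser:

#!/usr/bin/perl
# Rough sketch: extract (local title, English title) pairs from a
# Wikipedia pages-articles XML dump by scanning the wikitext for
# [[en:...]] interlanguage links.
use strict;
use warnings;

my $title;
while (<>) {
    $title = $1 if m{<title>([^<]+)</title>};
    next unless defined $title;
    print "$title\t$1\n" while /\[\[en:([^\]|]+)\]\]/g;
}

Run it as e.g. perl extract_pairs.pl frwiki-pages-articles.xml > fr-en.tsv (filenames hypothetical).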
If you go the dictionary route, you should probably run the text through a stemmer. Open source toolchain for tasks like this: http://mokk.bme.hu/resources/
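For the stemming step, Lingua::Stem::Snowball from CPAN is one option (there are others). A minimal sketch with an English example; remember to stem the dictionary keys the same way so lookups match:

#!/usr/bin/perl
# Sketch: reduce inflected forms to a common stem so "translations"
# and "translated" hit the same dictionary entry. Uses the Snowball
# stemmers via Lingua::Stem::Snowball; any decent stemmer would do.
use strict;
use warnings;
use Lingua::Stem::Snowball;

my $stemmer = Lingua::Stem::Snowball->new(lang => 'en');
my @words   = qw(translations translated translating);   # example input
my @stems   = $stemmer->stem(\@words);
print "$_\n" for @stems;    # all three reduce to "translat"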
Hunalign (see link) builds crude dictionaries as a by-product of text alignment, so in principle you could let it loose on the europarl and DGT-TM texts and have it produce dictionaries to complement whatever you find online. It may not be good enough quality to be worth your time, though.
Thanks a lot, Elef, for your detailed response. I will explain my purpose so that you might have an idea of what would work best for me.
I know a little about machine translation, but it's not what I need for my project. I am working on text classification using a bag-of-words approach, so my translation task is very simple: just word-by-word, since any text will be split into words anyway, and from all EU languages into English only, which simplifies the task further.
I think using a third-party translation server is overkill for me, and not very reliable when it comes to massive numbers of queries. So my guess is that I should use some simple word-to-word translation procedure on my server.
word-by-word since any text will be split into words anyway. And all EU languages into English only
Well, then you should probably start building your multilingual dictionary. I don't think Google Translate would be too happy about you making thousands of automated single-word queries.
Eurovoc: http://eurovoc.europa.eu/drupal/?q=download/list_pt&cl=en
CPV: http://simap.europa.eu/codes-and-nomenclatures/codes-cpv/codes-cpv_en.htm
Other EU term lists: http://ec.europa.eu/eurostat/ramon/nomenclatures/index.cfm?TargetUrl=LST_NOM&StrGroupCode=CLASSIFIC&StrLanguageCode=EN
Add whatever you can extract from Wikipedia and Wiktionary dumps and you should be set.