in reply to Idiom guessing script

You may be able to get word lists from OpenOffice or ispell, though I won't vouch for the completeness or accuracy of either.
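
A minimal sketch of the word-list idea, in Python for brevity. The tiny sets below are made-up stand-ins for real ispell or OpenOffice lists, and the names `WORDLISTS`, `tokenize` and `count_hits` are hypothetical; it only counts how many words of a title appear in each list.

```python
import re

# Illustrative stand-ins only: real lists would be loaded from ispell or
# OpenOffice dictionary files, which are far larger than this.
WORDLISTS = {
    "en": {"big", "cat", "open", "bridge", "the", "of"},
    "fr": {"deux", "chansons", "le", "la", "de", "bonjour"},
    "it": {"due", "il", "la", "di", "gatto"},
}

def tokenize(title):
    """Lowercase the title and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", title.lower())

def count_hits(title, wordlists=WORDLISTS):
    """Return, per language, how many words of the title are in its list."""
    words = tokenize(title)
    return {lang: sum(w in vocab for w in words)
            for lang, vocab in wordlists.items()}
```

A word such as "la" sits in more than one list, so ties are to be expected on short titles; the counts are evidence, not a verdict.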

As noted by albannach & pileofrogs, this is highly non-trivial. I've also been told -- by native speakers of Brazilian Portuguese -- that they could "get along" in Spanish, and by native speakers of (iirc, Puerto Rican) Spanish that they could "be understood" by native Italian speakers (a confusing concept, since Italy's regional dialects are alive, well, and not necessarily mutually comprehensible). All of this would seem to make unambiguous identification of a title as Italian, Spanish, or Portuguese impossible: the languages may well be too similar. And of course, identifying the language, by title alone, of books like Cervantes's Don Quixote, Orwell's 1984 or Burgess's M/F is impossible. And is that copy of Sagan's Bonjour Tristesse in French, or has the translator kept the title in French?

What would you consider adequate reliability?

emc

Re^2: Idiom guessing script
by Your Mother (Archbishop) on Nov 21, 2005 at 19:28 UTC

    Not to trivialize it, because it is (difficult|impossible)--I like cog's answer and I'm looking forward to having a reason to try that module--but written language is dramatically more predictable than spoken, and those languages have many frequent and distinctive words. Consider:

    due dois dos deux
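
    A rough sketch of how such marker words might be used, in Python for brevity. The table holds only the four words above, the name `guess_language` is hypothetical, and a real list would need vetting ("dos" is also a common Portuguese contraction, for one). It abstains unless exactly one language is indicated.

```python
import re

# Marker words taken from the post above; the language codes are assumptions.
MARKERS = {"due": "it", "dois": "pt", "dos": "es", "deux": "fr"}

def guess_language(title):
    """Return a language code if the title's marker words agree, else None."""
    words = re.findall(r"[^\W\d_]+", title.lower())
    found = {MARKERS[w] for w in words if w in MARKERS}
    # Abstain when there is no marker word or the markers conflict.
    return found.pop() if len(found) == 1 else None
```

    Returning None for titles without markers is deliberate: for short strings, no answer is probably better than a confident wrong one.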
      Hello folks,

      Thanks a lot for all the inputs.

      I've just tried Lingua::Identify, but it suffers from the same problem as the other module: it is simply not reliable for short strings. For example, it says "Big Cat" is Italian, "Deux chansons" is Italian, and "Open bridge" is German. So you can see how problematic it would be to use it.

      I've checked ispell, and it seems there are some word lists there that I may be able to use. I'll have to look closer, but at first glance they do not look as extensive as necessary.

      As for the ISBN idea, the problem is that I don't have the ISBNs for these books. As for querying other online databases, I don't think they can be trusted to have books in all languages; Amazon, at least, failed this test a few moments ago.

      I think I'll have to set the beast free to crawl the world. (Whoa, chill out, I'm not the messenger of the apocalypse! It's just a metaphor! hahahah)

      If you guys think of something, please let me know.

      Take care, fellow monks

      André