This does not sound like a simple problem, because a title gives you very little data on which to base a decision. It strikes me that you could choose just a few hundred words from each candidate language, words that are both commonly used and largely unique to that language. However, even this may not work for something like book titles, which do not necessarily follow common usage (in English at least). If you could get large word lists for the different languages (perhaps by sampling some major newspapers?), you could build your own list of 'indicator words'. I would not keep the languages separate; instead, tag each word in one combined list with the language(s) it suggests. You could then take a poll of your title's words to get a guess at the language used.
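To make the polling idea concrete, here is a minimal sketch in Python. The `INDICATORS` table and the function names are hypothetical and hand-picked for illustration; a real list would have to be built from large word lists or newspaper samples as described above, and would be far bigger.

```python
from collections import Counter

# Hypothetical indicator words, each tagged with the language(s) it suggests.
# A word shared by several languages simply votes for all of them.
INDICATORS = {
    "the": {"en"},
    "of": {"en"},
    "and": {"en"},
    "der": {"de"},
    "und": {"de"},
    "das": {"de"},
    "le": {"fr"},
    "les": {"fr"},
    "et": {"fr"},
    "el": {"es"},
    "y": {"es"},
    "la": {"fr", "es", "it"},
    "de": {"fr", "es", "nl"},
}


def poll_title(title):
    """Count one vote per indicator word for each language it suggests."""
    votes = Counter()
    for word in title.lower().split():
        word = word.strip(".,;:!?\"'()")  # drop surrounding punctuation
        for lang in INDICATORS.get(word, ()):
            votes[lang] += 1
    return votes


def guess_language(title):
    """Return the language with the most votes, or None if nothing matched."""
    votes = poll_title(title)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Calling `guess_language("Der Herr der Ringe")` polls two votes for German from the two occurrences of "der". A title with no indicator words at all yields `None`, which is exactly the weakness noted above for short titles; ties are broken arbitrarily in this sketch.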
On the chance that you are actually talking about book titles, it may help to know that the ISBN issued to a published book starts with a code called the Group Identifier (in a 13-digit ISBN it follows the 978 or 979 prefix). While this is not necessarily a reliable indicator of the language, it may be of some use, perhaps to verify a language-based determination, or to help you select which language(s) to test against.
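As a rough illustration of using the Group Identifier as a hint, the sketch below looks only at single-digit groups. The `GROUP_HINTS` table is a partial one assumed for this example (the authoritative ranges are maintained by the International ISBN Agency), the function name is hypothetical, and no check-digit validation is done.

```python
# Partial, assumed table of single-digit ISBN groups and the language area
# each one most often suggests. Multi-digit groups are deliberately left out.
GROUP_HINTS = {
    "0": "English",
    "1": "English",
    "2": "French",
    "3": "German",
    "4": "Japanese",
    "5": "Russian",
    "7": "Chinese",
}


def isbn_language_hint(isbn):
    """Return a language hint from the ISBN's group, or None if unknown."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13 and digits.startswith("978"):
        digits = digits[3:]  # the group follows the 978 prefix
    elif len(digits) != 10:
        return None  # malformed, or a 979 ISBN (different group allocations)
    return GROUP_HINTS.get(digits[0])
```

A `None` result just means the ISBN gives no hint here, so the word poll would have to decide on its own. Even when a hint comes back, it describes where the book was registered rather than the language it was written in, so treat it as a cross-check only.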
--
I'd like to be able to assign to an luser