in reply to Re: Perl & Unicode: state of the art?
in thread Perl & Unicode: state of the art?
Thai and Lao text ... these languages, sentences are generally delimited by whitespace, and individual words are not delimited at all in the text, but instead are delimited by syntactic rules.
So, it is fair to say that the first requirement for processing Unicode 'text' is to determine the language.
So then the question becomes: given a file of Unicode text, can the language be determined?
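A partial answer, for what it's worth: Thai and Lao happen to live in separate Unicode blocks (U+0E00..U+0E7F and U+0E80..U+0EFF), so counting code points per block gives a first guess for those two. Below is a minimal sketch of that idea in Python; the `BLOCKS` table and the `guess_script` name are my own illustrative choices, not an existing API. Note that this identifies the script, not the language, so it says nothing useful for scripts shared by many languages (Latin, Cyrillic, Arabic).

```python
from collections import Counter

# Unicode block ranges for the two scripts discussed above.
# The table and its labels are illustrative, not from any library.
BLOCKS = {
    "Thai": (0x0E00, 0x0E7F),
    "Lao": (0x0E80, 0x0EFF),
}

def guess_script(text):
    """Return the label of the most frequent known block in text, or None."""
    counts = Counter()
    for ch in text:
        cp = ord(ch)
        for name, (lo, hi) in BLOCKS.items():
            if lo <= cp <= hi:
                counts[name] += 1
                break
    if not counts:
        return None  # no character fell inside a known block
    return counts.most_common(1)[0][0]
```

In Perl the same test could be written with the script properties `\p{Thai}` and `\p{Lao}` instead of hard-coded ranges. Either way, the result only narrows the choice of word-breaking rules; it does not settle the language question in general.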
Re^3: Perl & Unicode: state of the art?
by LanX (Saint) on Oct 08, 2013 at 00:45 UTC
by BrowserUk (Patriarch) on Oct 08, 2013 at 02:16 UTC
by Discipulus (Canon) on Oct 08, 2013 at 07:32 UTC

Re^3: Perl & Unicode: state of the art?
by DrHyde (Prior) on Oct 08, 2013 at 10:35 UTC