You can only know the answer with statistical certainty, and that certainty depends on the length of the text and on how distant the languages are.
Hand and finger (en) <=> Hand und Finger (de)
Whether the same script implies the same word delimiters can only be answered by someone who knows all 6000 languages of the world.
But Arabic words are probably already a problem, perhaps less so if transliterated. Chinese even more so.
see also Word_divider and Word#Word_boundaries
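
To make the delimiter point concrete, here is a minimal Perl sketch. The helper name count_ws_tokens is hypothetical, and the Chinese sample is my own rough rendering of "hand and finger", not something from the thread. It only counts whitespace-separated tokens, which is an assumption about what "delimiter" means here, not a real word segmenter.

```perl
use strict;
use warnings;
use utf8;    # the source file contains literal non-ASCII text

# Hypothetical helper: count tokens separated by whitespace only.
# split ' ' ignores leading whitespace and splits on runs of whitespace.
sub count_ws_tokens {
    my ($text) = @_;
    my @tokens = split ' ', $text;
    return scalar @tokens;
}

my %sample = (
    en => 'Hand and finger',
    de => 'Hand und Finger',
    zh => '手和手指',    # assumed sample, written without spaces
);

for my $lang (sort keys %sample) {
    printf "%s: %d token(s)\n", $lang, count_ws_tokens($sample{$lang});
}
```

English and German both yield three tokens, while the Chinese sample comes back as a single token because it contains no whitespace at all. Anything that relies on whitespace as the word divider would need a different approach for such text.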
Cheers Rolf
(addicted to the Perl Programming Language)
In reply to Re^3: Perl & Unicode: state of the art?
by LanX
in thread Perl & Unicode: state of the art?
by BrowserUk