in reply to UTF-8 Decoding, Wide Characters, and XML::Twig
I know Unicode defines several different kinds of spaces (breaking and non-breaking, zero-width, half-width, em and en spaces, just to name some off the top of my head). It's entirely possible that the UTF-8 to ASCII translation misses one of these.
Depending on the input, you might try a simple substitution like s/\s/ /g before translating to ASCII, although I don't know exactly which Unicode whitespace characters \s matches.
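One way to find out rather than guess is to ask perl directly. The following is only a rough sketch: the list of candidate characters is my own pick, and matches_whitespace is a made-up helper name, not something from XML::Twig or Text::Unidecode. It assumes the text has already been decoded into Perl characters.

```perl
use strict;
use warnings;

# Code points rather than literal characters, so this file stays plain ASCII.
# The selection of candidates is an assumption; add whatever your input contains.
my @candidates = (
    [ 'NO-BREAK SPACE',    0x00A0 ],
    [ 'EN SPACE',          0x2002 ],
    [ 'EM SPACE',          0x2003 ],
    [ 'ZERO WIDTH SPACE',  0x200B ],
    [ 'IDEOGRAPHIC SPACE', 0x3000 ],
);

# Hypothetical helper: true if the character at $code_point matches \s.
sub matches_whitespace {
    my ($code_point) = @_;
    my $char = chr $code_point;
    utf8::upgrade($char);    # make sure Unicode rules apply, even below U+0100
    return $char =~ /\s/ ? 1 : 0;
}

for my $entry (@candidates) {
    my ($name, $code_point) = @$entry;
    printf "U+%04X %-18s %s\n", $code_point, $name,
        matches_whitespace($code_point) ? 'matched by \s' : 'NOT matched by \s';
}
```

On any reasonably recent perl I would expect ZERO WIDTH SPACE to be the odd one out, since current Unicode does not classify it as whitespace, so s/\s/ /g would leave it in place and it would need to be handled separately.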
_______________________
1 I'm not at all familiar with Japanese; I just mentioned it for illustrative purposes. The Text::Unidecode POD has more detailed (and more accurate!) examples.