damian45 has asked for the wisdom of the Perl Monks concerning the following question:
I'm validating some mixed English and Japanese UTF-8 input. It sometimes contains a-z, A-Z and 0-9 entered not only from the common ASCII-compatible Unicode range, but also from the Unicode range U+FF10 - U+FF5E:
http://en.wikibooks.org/wiki/Unicode/Character_reference/F000-FFFF
for example
A (Unicode U+0041)
Ａ (Unicode U+FF21 http://www.decodeunicode.org/u+FF21)
I understand that, to be safe, I need to interpret the Unicode characters I accept only in their smallest Unicode representation,
e.g. interpret U+FF21 as U+0041 (as above).
So the question is: can I use some function or module of Perl to do this, or do I have to convert them manually with a mapping? From all the experimenting I've done so far, it seems like I'll have to do it manually. This surprises me if I am supposed to interpret them in their smallest representation.
Cheers for any feedback, and sorry for my English.
damian
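
For reference, a minimal sketch of two ways the conversion asked about above could be done without a hand-written mapping table. The first uses `NFKC` from the core `Unicode::Normalize` module, which applies Unicode compatibility normalization and, among other things, maps the fullwidth forms to their ASCII counterparts. The second relies on the fact that U+FF01 - U+FF5E mirror U+0021 - U+007E in the same order, so a single `tr` range is enough. The helper names `fold_compat` and `fold_fullwidth_ascii` are hypothetical, and both assume the input has already been decoded from UTF-8 bytes into a Perl character string (for example with `Encode::decode`).

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKC);

# Hypothetical helper: apply compatibility normalization (NFKC) to a
# decoded character string. This also changes other compatibility
# characters, e.g. halfwidth katakana and ligatures.
sub fold_compat {
    my ($text) = @_;
    return NFKC($text);
}

# Hypothetical helper: narrower alternative that only shifts the
# fullwidth ASCII variants U+FF01..U+FF5E down to U+0021..U+007E
# and leaves every other character untouched.
sub fold_fullwidth_ascii {
    my ($text) = @_;
    (my $copy = $text) =~ tr/\x{FF01}-\x{FF5E}/\x{21}-\x{7E}/;
    return $copy;
}

my $input = "\x{FF21}\x{FF42}\x{FF13}";    # fullwidth A, b, 3

binmode STDOUT, ':encoding(UTF-8)';
print fold_compat($input), "\n";           # prints "Ab3"
print fold_fullwidth_ascii($input), "\n";  # prints "Ab3"
```

Which one fits depends on the validation rules: `NFKC` rewrites more than just the fullwidth Latin letters and digits, so the `tr` variant may be the safer choice if the Japanese text should otherwise pass through unchanged.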

Replies are listed 'Best First'.

Re: validating unicode chars in their smallest form
by ikegami (Patriarch) on Jun 13, 2010 at 01:43 UTC

Re: validating unicode chars in their smallest form
by ikegami (Patriarch) on Jun 13, 2010 at 18:23 UTC

Re: validating unicode chars in their smallest form
by Xilman (Hermit) on Jun 13, 2010 at 10:23 UTC
by damian45 (Novice) on Jun 13, 2010 at 18:14 UTC