Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
A character can consist of one or more bytes. A character can be changed in one of several ways:

1. it can simply be deleted
2. a multi-byte character can be mapped to a single-byte character
3. a single-byte character can be mapped to a multi-byte character

An example:

STRING   NORMALIZED_STRING
------   -----------------
ABCÅD    ABCD
ABCÄD    ABCëëD
ABCááD   ABCèD
What I want is to be able to recreate the same kind of normalization myself. Of course, I could go through the thousands of rows manually and try to work out the rules. But I am hoping Perl can do the analysis and, by comparing each string with its normalized form, come up with a mapping scheme such as:
'Å' => '', 'Ä' => 'ëë', 'áá' => 'è'
I would be very happy if this was possible!
Thanks in advance for your insights,
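One way to let Perl do the comparing could be sketched as follows. This is only a minimal sketch, assuming each row contains a single contiguous change: it strips the longest common prefix and suffix of the two strings and treats whatever is left in the middle as a mapping rule. The helper name `infer_rule` and the hard-coded `@pairs` are hypothetical; real rows would come from wherever the data is stored.

```perl
use strict;
use warnings;
use utf8;                              # this source file contains UTF-8 literals
binmode(STDOUT, ':encoding(UTF-8)');   # print characters, not raw bytes

# Hypothetical helper: remove the longest common prefix and suffix of the
# two strings and return the differing middles as (from, to).
sub infer_rule {
    my ($orig, $norm) = @_;
    my @o = split //, $orig;           # work on characters, not bytes
    my @n = split //, $norm;

    # Drop the shared leading characters.
    while (@o && @n && $o[0] eq $n[0]) { shift @o; shift @n }
    # Drop the shared trailing characters.
    while (@o && @n && $o[-1] eq $n[-1]) { pop @o; pop @n }

    return (join('', @o), join('', @n));
}

# Sample rows taken from the question (assumed to be decoded text).
my @pairs = (
    [ 'ABCÅD',  'ABCD'   ],
    [ 'ABCÄD',  'ABCëëD' ],
    [ 'ABCááD', 'ABCèD'  ],
);

# Collect the rules and count how often each one is seen.
my %map;
for my $pair (@pairs) {
    my ($from, $to) = infer_rule(@$pair);
    next if $from eq '' && $to eq '';  # identical strings: nothing to learn
    $map{$from}{$to}++;
}

for my $from (sort keys %map) {
    for my $to (sort keys %{ $map{$from} }) {
        print "'$from' => '$to' (seen $map{$from}{$to} time(s))\n";
    }
}
```

The counts help separate real rules from noise: a rule that shows up in hundreds of rows is probably genuine, while one seen only once deserves a manual look. A row with several separate changes will produce one large "middle" that swallows the unchanged characters in between, so for such rows a proper diff (for example the Algorithm::Diff module from CPAN) would likely be needed to split the change into individual rules.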
Replies are listed 'Best First'.
Re: Characters in disguise
by GrandFather (Saint) on Jun 01, 2006 at 22:57 UTC
by Anonymous Monk on Jun 01, 2006 at 23:24 UTC
by GrandFather (Saint) on Jun 01, 2006 at 23:28 UTC

Re: Characters in disguise
by ruzam (Curate) on Jun 01, 2006 at 22:18 UTC
by Anonymous Monk on Jun 01, 2006 at 22:35 UTC

Re: Characters in disguise (diff)
by tye (Sage) on Jun 01, 2006 at 22:11 UTC

Re: Characters in disguise
by samtregar (Abbot) on Jun 01, 2006 at 21:43 UTC