
Thanks for your reply.
Actually, I want to make Unicode string comparison robust, so I need to check both canonical equivalence and compatibility equivalence. Here is one example where the two kinds of equivalence make a difference.
Half-width and full-width katakana characters are compatibility equivalent, but they are not canonically equivalent.
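The katakana case can be checked with Python's standard `unicodedata` module (Python is used here only for illustration; the thread does not name a language). This minimal sketch normalizes a half-width letter with NFC, which applies only canonical mappings, and with NFKC, which also applies compatibility mappings:

```python
import unicodedata

half_ka = "\uff76"  # HALFWIDTH KATAKANA LETTER KA
full_ka = "\u30ab"  # KATAKANA LETTER KA

# The decomposition is tagged <narrow>, which marks it as a compatibility
# (not canonical) decomposition.
print(unicodedata.decomposition(half_ka))  # '<narrow> 30AB'

# NFC applies only canonical mappings, so the half-width letter is unchanged.
print(unicodedata.normalize("NFC", half_ka) == full_ka)   # False

# NFKC also applies compatibility mappings and yields the full-width letter.
print(unicodedata.normalize("NFKC", half_ka) == full_ka)  # True

# Half-width KA + HALFWIDTH KATAKANA VOICED SOUND MARK becomes the single
# precomposed KATAKANA LETTER GA (U+30AC) under NFKC.
print(unicodedata.normalize("NFKC", "\uff76\uff9e") == "\u30ac")  # True
```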
So, is it fine to compare canonical equivalence first and then compatibility equivalence?
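One property bears on this question: canonical equivalence implies compatibility equivalence, so any two canonically equivalent strings also have the same NFKC form. A "canonical first, then compatibility" check therefore always gives the same answer as the compatibility check alone; the canonical step can only act as a shortcut. A small Python sketch (the function names are hypothetical) that checks this on a few sample pairs:

```python
import unicodedata

def nfc_equal(a: str, b: str) -> bool:
    # Canonical equivalence: identical NFC forms.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def nfkc_equal(a: str, b: str) -> bool:
    # Compatibility equivalence: identical NFKC forms.
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

def two_step_equal(a: str, b: str) -> bool:
    # Canonical check first, compatibility check as the fallback.
    return nfc_equal(a, b) or nfkc_equal(a, b)

samples = [
    ("e\u0301", "\u00e9"),  # e + COMBINING ACUTE vs precomposed e-acute (canonical)
    ("\uff76", "\u30ab"),   # half-width vs full-width KA (compatibility only)
    ("\ufb01", "fi"),       # LATIN SMALL LIGATURE FI vs "fi" (compatibility only)
    ("\u30ab", "\u30ac"),   # KA vs GA (not equivalent at all)
]

for a, b in samples:
    # The two-step check never disagrees with the plain NFKC check.
    assert two_step_equal(a, b) == nfkc_equal(a, b)
    print(f"{ascii(a)} vs {ascii(b)}: canonical={nfc_equal(a, b)}, "
          f"compatibility={nfkc_equal(a, b)}")
```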
TIA
-Pijush

Re^3: Help needed to compare two unicode strings!!!
by pg (Canon) on Aug 31, 2004 at 01:41 UTC
    "So, is it fine to compare canonical equivalence first and then compatibility equivalence?"

    This is all about purpose. What is your purpose? You said that you want to make the comparison robust, but what is a "robust comparison"? That is not a concept defined in the Unicode standard; it is a term you coined for your own idea, and you have not said clearly what it means.

    In general, canonical equivalence is the basic equivalence, and it is most likely good enough for you.

    Comparing under both equivalences does not really make the comparison robust. To me, robust means not prone to errors, or at least less prone to them, and that does not make much sense here: each equivalence has its own purpose, and neither of them produces an ERROR. To say it once more: it is about your purpose, that is, about which kind of equivalence you want.
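To illustrate why the choice depends on purpose, here is a hedged Python sketch (the `equivalent` helper is hypothetical) that takes the normalization form as an explicit parameter. Both forms give well-defined answers; compatibility normalization simply folds away more distinctions, such as superscripts and circled digits, which some applications may want to keep:

```python
import unicodedata

def equivalent(a: str, b: str, form: str = "NFC") -> bool:
    """Return True if a and b are equal after normalization to `form`.

    "NFC" tests canonical equivalence; "NFKC" tests compatibility equivalence.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

pairs = {
    "OHM SIGN vs GREEK CAPITAL LETTER OMEGA": ("\u2126", "\u03a9"),
    "x + SUPERSCRIPT TWO vs x2": ("x\u00b2", "x2"),
    "CIRCLED DIGIT ONE vs 1": ("\u2460", "1"),
}

for label, (a, b) in pairs.items():
    print(f"{label}: canonical={equivalent(a, b)}, "
          f"compatibility={equivalent(a, b, 'NFKC')}")
```

Only the Ohm sign pair is canonically equivalent; all three pairs are compatibility equivalent. Whether "x²" should match "x2" is an application decision, not something the standard settles.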

      Thanks for your opinion.
      I agree with you that canonical equivalence is the basic equivalence and is most likely suitable for most applications. But suppose I build an application that verifies user credentials against credentials stored in a directory server (say, LDAP). In that case I need to compare two strings: one supplied by the user and one stored in the directory. The string in the directory was supplied by an administrator of the application and stored UTF-8 encoded, and it stays the same every time I fetch it. The credentials supplied by the user, however, may vary: the user may type them in one form one time and in another form the next. Take the katakana example from my previous post.
      The administrator entered the credentials in half-width katakana, and the user supplies the same text in full-width katakana.
      If I compare these two strings for canonical equivalence, they are different, and the application will fail to identify the user. I think this is an error. Can you please tell me what I should do in this case: stick to canonical equivalence, or check compatibility equivalence as well?
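For this login scenario, one option is to normalize both the stored value and the supplied value to NFKC before comparing, so that half-width and full-width katakana match. The Python sketch below assumes the directory returns the raw UTF-8 bytes of the stored value; `verify_credential`, `normalize_credential` and the sample values are hypothetical. It uses `hmac.compare_digest` for a constant-time comparison. In a real deployment passwords are usually stored hashed, in which case the same normalization would have to be applied before hashing, both when the administrator sets the value and when the user logs in.

```python
import hmac
import unicodedata

def normalize_credential(text: str) -> bytes:
    # NFKC folds half-width katakana (and other compatibility variants)
    # into their standard forms; the result is re-encoded as UTF-8.
    return unicodedata.normalize("NFKC", text).encode("utf-8")

def verify_credential(supplied: str, stored_utf8: bytes) -> bool:
    """Hypothetical check: `stored_utf8` is the raw UTF-8 value read from
    the directory, `supplied` is what the user typed."""
    stored = stored_utf8.decode("utf-8")
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(normalize_credential(supplied),
                               normalize_credential(stored))

# Hypothetical value: the administrator entered half-width "ﾃｽﾄ".
stored_value = "\uff83\uff7d\uff84".encode("utf-8")

# The user types full-width "テスト".
print(verify_credential("\u30c6\u30b9\u30c8", stored_value))  # True
```

Note that NFKC also folds other compatibility variants (superscript digits, ligatures and so on), so both sides must always pass through the same normalization.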
      TIA.
      -Pijush