in reply to Re: Convert HTML symbols to equivalent Unicode
in thread Convert HTML symbols to equivalent Unicode

Hey
I used HTML::Entities; it converts the &reg; symbol to
®, which is not parsed in XML.
I need Exactly Unicode equivalent as U00AE.
Is there a way to get it?

Re^3: Convert HTML symbols to equivalent Unicode
by ikegami (Patriarch) on Apr 14, 2009 at 13:55 UTC

    I need Exactly Unicode equivalent as U00AE.

    That doesn't make any sense. Please speak in terms of HTML entities, Unicode characters, U+xxxx notation and perhaps UTF-8 encoding.

    • Are you asking how to get character U+00AE from "&reg;"?

      decode_entities will get the character from &reg;.

    • Are you asking how to get the string "U+00AE" from character U+00AE?

      ord will get 0xAE from the character.

      sprintf can be used to format 0xAE as hex.

    For example,

    >perl -MHTML::Entities=decode_entities -e"printf qq{U+%04X\n}, ord(decode_entities('&reg;'))"
    U+00AE
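
The same idea can be written as a small script that handles a whole string rather than a single entity. This is only a sketch: the helper name `codepoints` and the sample input are made up for illustration, and it assumes HTML::Entities is installed.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical helper: return the U+XXXX notation for every
# non-ASCII character found in an HTML fragment.
sub codepoints {
    my ($html) = @_;
    my $text = decode_entities($html);    # "&reg;" becomes the character U+00AE
    return map  { sprintf('U+%04X', ord($_)) }    # format the code point as hex
           grep { ord($_) > 0x7F }                # keep non-ASCII characters only
           split //, $text;
}

print join(' ', codepoints('Acme&reg; &copy; 2009')), "\n";
```
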

    But then again, you also mentioned JSON. Whatever JSON module you use will handle serializing the character U+00AE as "\u00AE" or similar, so all you need is decode_entities.
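
As a rough illustration of the JSON route, here is a sketch using JSON::PP. The choice of module is an assumption (the thread does not say which JSON module is in use), and the hash key `name` and the sample text are invented for the example.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);
use JSON::PP;    # assumption: any JSON module with an ASCII-escaping option would do

# Decode the entity first, so the string holds the real character U+00AE.
my $text = decode_entities('Acme&reg;');

# ascii() asks the encoder to emit non-ASCII characters as \uXXXX escapes,
# so the output stays pure ASCII.
my $json = JSON::PP->new->ascii->encode({ name => $text });

print "$json\n";
```

With the `ascii` option left off, the encoder would emit the character itself instead of an escape, and the output would then need to be written with an explicit encoding such as UTF-8.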

Re^3: Convert HTML symbols to equivalent Unicode
by Anonymous Monk on Apr 14, 2009 at 10:37 UTC