in reply to Convert HTML symbols to equivalent Unicode

The HTML::Entities module might be what you're looking for.

Re^2: Convert HTML symbols to equivalent Unicode
by jai_dgl (Beadle) on Apr 14, 2009 at 10:27 UTC
    Hey
    I used HTML::Entities, but it converts the &reg; symbol to
    ®, which is not parsed in XML.
    I need Exactly Unicode equivalent as U00AE.
    Is there a way to get that?

      I need Exactly Unicode equivalent as U00AE.

      That doesn't make any sense. Please speak in terms of HTML entities, Unicode characters, U+xxxx notation and perhaps UTF-8 encoding.

      • Are you asking how to get character U+00AE from "&reg;"?

        decode_entities will get the character from &reg;.

      • Are you asking how to get the string "U+00AE" from character U+00AE?

        ord will get 0xAE from the character.

        sprintf can be used to format 0xAE as hex.

      For example,

      >perl -MHTML::Entities=decode_entities -e"printf qq{U+%04X\n}, ord(decode_entities('&reg;'))"
      U+00AE

      But then again, you also mentioned JSON. Whatever JSON module you use will handle serializing the character U+00AE as "\u00AE" or similar, so all you need is decode_entities.
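
As a rough sketch of both points, the script below decodes the entity, prints the code point in U+xxxx notation, and then serializes the character as JSON. The choice of JSON::PP and its ascii option is an assumption made for illustration; the thread does not say which JSON module is in use, and other modules may need a different setting to escape non-ASCII characters.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);
use JSON::PP;    # assumed here for illustration; any JSON module should do

# Turn the named entity into the actual character U+00AE.
my $char = decode_entities('&reg;');

# ord gives the code point; sprintf formats it as four hex digits.
my $notation = sprintf 'U+%04X', ord $char;

# With ascii enabled, JSON::PP escapes every non-ASCII character
# in \uXXXX form instead of emitting it raw.
my $json = JSON::PP->new->ascii->encode([$char]);

print "$notation\n";
print "$json\n";
```

The only step specific to HTML is decode_entities; once the string holds the real character, the formatting and serialization steps no longer care where it came from.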