Your question is very confusing.

I need to parse some HTML files and have to write a XML output file. In some cases I get a XML parser error.

The symbol ® ( REGISTERED SIGN ) need to be convert to its equivalent unicode U00AE

Is "®" the result you get after parsing the HTML, i.e. the literal character and not an ampersand followed by an identifier and then a semicolon? If so, then the parsing is working fine, and it is the output you need to worry about. (I mention this because PerlMonks takes HTML input, so you might have meant to write &reg; rather than ®.)

Any XML library should output something appropriate when given ® as input. Either it will output a numeric entity (which should be absolutely fine, since you should be parsing XML only with an XML parser, and any XML parser can handle such things), or it will output the character itself in whatever character encoding is being used. In the second case you just need to make sure you are outputting UTF-8 (or whichever Unicode encoding you want). How you do that depends on which XML library you are using.
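To illustrate the two possible outputs, here is a minimal sketch in Python's standard library (chosen only for illustration; your Perl XML module will have its own equivalent, and the element name "name" and the sample text are made up). It decodes the HTML entity to the real character, then serializes the same document once as ASCII and once as UTF-8.

```python
import html
import xml.etree.ElementTree as ET

# Decode the HTML named entity into the real character U+00AE.
text = html.unescape("Acme&reg;")

# Build a tiny XML document (hypothetical element name).
elem = ET.Element("name")
elem.text = text

# ASCII cannot hold U+00AE, so the serializer falls back to a numeric
# character reference; UTF-8 writes the character itself as bytes.
ascii_out = ET.tostring(elem, encoding="us-ascii")
utf8_out = ET.tostring(elem, encoding="utf-8")

# Either form parses back to exactly the same text.
same = ET.fromstring(ascii_out).text == ET.fromstring(utf8_out).text
```

Both serializations are valid XML, and a consumer using a real XML parser should not be able to tell them apart.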

I don't want Decimal Equivalent or HTML entities

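Named HTML entities could screw things up in XML: apart from the five predefined ones (&amp;, &lt;, &gt;, &apos; and &quot;), a named entity such as &reg; is undefined unless the document's DTD declares it, and a parser will reject it. The decimal entity (&#174;) should not cause any trouble, because numeric character references are always valid in XML.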
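If you want to check this for yourself, the sketch below (again Python's standard library, used only as an illustration; the helper name `parses` and the sample documents are hypothetical) feeds three variants of the same document to an XML parser and records which ones it accepts.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if xml_text is accepted by the XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Named HTML entity: not declared anywhere, so XML rejects it.
named_ok = parses("<name>Acme&reg;</name>")

# Decimal and hexadecimal character references: always allowed.
decimal_ok = parses("<name>Acme&#174;</name>")
hex_ok = parses("<name>Acme&#xAE;</name>")
```

Only the named form should fail. If that matches the XML parser error you are seeing, the fix is on the output side, not in the HTML parsing.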

as this XML file should be parsed in JSON.

This doesn't make sense. XML is a data format. JSON is a completely different data format.

You can't parse anything in JSON.

You could store an XML document as a string inside a JSON object, but that shouldn't prevent you from using entities — you would parse the JSON to extract the string of XML, then put that string in an XML parser to extract the data from it.
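As a rough sketch of that round trip, assuming the XML really is carried as a string inside a JSON object (the key "doc" and the sample document are invented for the example, and Python is used only for illustration):

```python
import json
import xml.etree.ElementTree as ET

# Producer side: the XML document is just a string value in JSON.
payload = json.dumps({"doc": "<name>Acme&#174;</name>"})

# Consumer side: parse the JSON first to recover the XML string...
xml_string = json.loads(payload)["doc"]

# ...then hand that string to an XML parser, which resolves the entity.
name = ET.fromstring(xml_string).text
```

The entity survives the JSON layer untouched, because JSON only sees a string; it is the XML parser at the end that turns it back into ®.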

What is the problem you are really trying to solve? You don't seem to have provided enough detail here.


In reply to Re: Convert HTML symbols to equivalent Unicode by dorward
in thread Convert HTML symbols to equivalent Unicode by jai_dgl
