in reply to XML::LibXML document serialized with diacritics as unicode entities
Does this method have any danger or drawback? ... I would find it very useful since the serialized text with entities is immune to any encoding changes when stored to database (Oracle).
Except iso-8859-1 is NOT immune to encoding changes: characters 128 - 255 have the same code points as in Unicode, but they are not encoded as the same bytes in any Unicode encoding (in UTF-8, for example, each of them takes two bytes). They also can't be represented in 7-bit ASCII.
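To make the point concrete, here is a small sketch of how one character from that range comes out under different encodings. It is written in Python only because the byte values are easy to print there; the bytes are the same in any language. The helper name `fits_ascii` is made up for this illustration.

```python
# One character from the upper half of iso-8859-1: é (U+00E9, decimal 233).
ch = "\u00e9"

# Same code point, three different byte sequences.
latin1_bytes = ch.encode("iso-8859-1")   # a single byte, 0xE9
utf8_bytes = ch.encode("utf-8")          # two bytes, 0xC3 0xA9
utf16_bytes = ch.encode("utf-16-be")     # two bytes, 0x00 0xE9


def fits_ascii(text):
    """Return True if every character of text exists in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(ord(ch), latin1_bytes, utf8_bytes, utf16_bytes, fits_ascii(ch))
```

So a byte 0xE9 stored as iso-8859-1 and later read back as UTF-8 is not a valid é any more, which is exactly the kind of encoding change the serialized text was supposed to survive.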
7-bit ascii might be a bit safer, but as the documentation for setEncoding notes:
Note that this function has to be used very carefully, since you can’t simply convert one encoding in any other, since some (or even all) characters may not exist in the new encoding. XML::LibXML will not test if the operation is allowed or possible for the given document. The only switching assured to work is to UTF8.
Also note that storing full Unicode text as numeric entities is pretty inefficient: a character such as ř becomes the six ASCII bytes of &#345;, where UTF-8 needs only two. If your database and driver support it, using one of the native Unicode encodings is probably better.
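A rough way to see the overhead is to compare the two forms of the same string. Again this is only an illustrative sketch in Python, not what XML::LibXML does internally; the sample word and the variable names are assumptions picked for the example.

```python
# A Czech word with four non-ASCII letters (ž, ť, č, ý) and five ASCII ones.
text = "\u017elu\u0165ou\u010dk\u00fd"   # "žluťoučký"

# Pure 7-bit ASCII output: every non-ASCII character is replaced
# by a decimal numeric character reference such as &#382;
entity_bytes = text.encode("ascii", errors="xmlcharrefreplace")

# Native UTF-8 output: each of these accented letters takes two bytes.
utf8_bytes = text.encode("utf-8")

print(entity_bytes)
print(len(entity_bytes), "bytes as entities vs", len(utf8_bytes), "bytes as UTF-8")
```

For text that is mostly ASCII the difference is small, but for text made up mainly of non-ASCII characters the entity form is several times larger than UTF-8.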