in reply to Convert strings with unknown encodings to html

the strings are in various formats
Do you know all possible formats that the strings might be in, and if yes, what are they?
The strings are mostly ASCII, but some of them have special characters
Please don't call them 'special characters': the characters are completely normal; it's your database that is 'special'.

Re^2: Convert strings with unknown encodings to html
by Pascal666 (Scribe) on Jul 01, 2015 at 01:32 UTC

    I included examples of each in the test program above. Note that some of the examples span multiple bytes: #1 below, for example, is two characters, one encoded in three bytes and one in two. As best I can tell, the formats are:

    1. UTF-8: chr(226).chr(152).chr(134), chr(195).chr(161)
    2. CP1252: chr(150), chr(153)
    3. HTML: '®', 'Æ'
    4. ASCII: '&'
    5. Unicode code points: chr(63743), chr(991), chr(9760)
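    A minimal sketch of one way to normalize those five formats into ASCII-only HTML, assuming each string is in exactly one format (no UTF-8 and CP1252 bytes mixed in the same value). The detection order is a heuristic, not something the data guarantees: a string containing any character above 0xFF is treated as already-decoded Unicode; otherwise strict UTF-8 decoding is tried and CP1252 is the fallback. `to_html` is a hypothetical helper name; only the core `Encode` module is used.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: turn one string of unknown encoding into ASCII-only HTML.
# Assumed heuristic order:
#   1. any char above 0xFF  => already Unicode characters, leave as is
#   2. otherwise try strict UTF-8 decoding of the bytes
#   3. if UTF-8 fails, fall back to CP1252
sub to_html {
    my ($s) = @_;

    if ($s !~ /[^\x00-\xFF]/) {
        # decode() with FB_CROAK may modify its input, so work on a copy
        my $bytes = $s;
        my $chars = eval { decode('UTF-8', $bytes, FB_CROAK) };
        $s = defined $chars ? $chars : decode('cp1252', $s);
    }

    # Escape a bare '&' but leave anything that already looks like an
    # entity (&reg;, &#174;, &#xAE;) alone, so it is not double-escaped.
    $s =~ s/&(?!#?\w+;)/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;

    # Every remaining non-ASCII character becomes a numeric reference.
    $s =~ s/([^\x00-\x7F])/sprintf('&#x%X;', ord $1)/ge;
    return $s;
}

# Usage with the sample values from the list above.
# prints: &#x2606; &#xE1; &#x2013; &#x2122; &amp; &#xF8FF; &#x3DF; &#x2620; (one per line)
for my $sample (chr(226).chr(152).chr(134), chr(195).chr(161),
                chr(150), chr(153), '&',
                chr(63743), chr(991), chr(9760)) {
    print to_html($sample), "\n";
}
```

    Trying UTF-8 first is the usual choice because CP1252 text is rarely valid UTF-8 by accident, while UTF-8 bytes almost always "decode" as CP1252 into mojibake such as 'Ã¡'. A single value that mixes both encodings would need per-segment handling, which this sketch does not attempt.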

    Obviously the database is a bit 'special'. Unfortunately it is provided by a 3rd party, a very large company, and I have no control over their input sanitization.

      Obviously the database is a bit 'special'. Unfortunately it is provided by a 3rd party, a very large company, and I have no control over their input sanitization.

      :) complain