Hi there,
I've been puzzling over a strange problem for quite a while now.
Here is the situation. I have an MS SQL Server table with an nvarchar column containing strings in all kinds of languages. As an example, I query two rows: one containing an i with a roof on top, and one with the same character followed by something unmistakably beyond 8 bits, an a with a reversed roof. When I query these two rows and show them in an HTML page, the roofed i is shown correctly in the first row but not in the second.
That is, if the encoding in the browser (both IE and Firefox) is set to UTF-8. If it is set to Western European, then it is the other way around: the roofed i shows, and the Unicode character is printed "wide" as two ASCII characters.
It seems that Perl either reads the roofed i differently from the database (via the ODBC driver) or writes it out differently, depending on whether a two-byte character follows it. It's not the HTML, because the same query in a cmd box shows the byte-count difference as well. The first roofed i is represented by C4 83, the second by EE.
It gets even weirder: both rows from the database return true for the regex m/ROOFED-I/, and ord() of the first character is 238 in both cases, the roofed i in its 8-bit (Latin-1) form, as it were.
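To take the database out of the picture, here is a minimal sketch that tries to reproduce the symptom with two literal strings. The code points are my guess at what the rows hold (U+00EE for the roofed i, U+0103 for the other character), `raw_bytes_hex` is a made-up helper, and an in-memory filehandle stands in for STDOUT with no encoding layer. It assumes the ODBC driver hands Perl decoded character strings, which I have not verified.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two rows (not the real DB values):
# row 1 is only U+00EE, row 2 is U+00EE followed by U+0103.
my $row1 = "\x{EE}";
my $row2 = "\x{EE}\x{103}";

# Return, as a hex string, the bytes that print() emits on a handle
# that has no encoding layer. An in-memory handle stands in for STDOUT.
sub raw_bytes_hex {
    my ($str) = @_;
    my $buf = '';
    open my $fh, '>', \$buf or die "cannot open in-memory handle: $!";
    {
        no warnings 'utf8';    # silence "Wide character in print"
        print {$fh} $str;
    }
    close $fh;
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

# For each row: first character's ord, regex match, and raw output bytes.
for my $row ($row1, $row2) {
    printf "ord=%d match=%s bytes=%s\n",
        ord($row),
        ($row =~ /\x{EE}/ ? 'yes' : 'no'),
        raw_bytes_hex($row);
}
```

If this behaves the way I expect, both rows report the same ord() and both match the regex, yet the roofed i goes out as one byte in the first row and as two bytes in the second, which would explain why no single browser encoding displays both rows correctly.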
In practice this means that strings containing both 8-bit diacritical characters and Unicode characters beyond 8 bits cannot be displayed in HTML. I'm sure I am doing something wrong... who can help?
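One thing I could try is encoding every string explicitly before it is written, so that both kinds of character leave Perl as UTF-8. This is only a sketch under the same assumption (the driver returns decoded character strings); `utf8_hex` is a made-up helper and the code points are again my guess at the row contents.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string as UTF-8 and return
# the resulting bytes as a hex string for inspection.
sub utf8_hex {
    my ($str) = @_;
    my $bytes = encode('UTF-8', $str);
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# The roofed i should come out as the same two bytes in both strings.
print utf8_hex("\x{EE}"), "\n";
print utf8_hex("\x{EE}\x{103}"), "\n";
```

In the real script the equivalent would presumably be `binmode STDOUT, ':encoding(UTF-8)';` before printing, together with a UTF-8 charset in the Content-Type header, but I have not tested that against the ODBC data.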
Grtz=JP