You would need to be completely confident about the quality and content of your data to use the "_utf8_on" function the way you do, and I would almost never be that confident about any data. Stuff coming from a database gives me very little confidence at all.
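In any case, I tend to heed the warning in the Encode manual about the _utf8_off/on "internal" functions -- they are not intended to be part of the normal Encode API, and you shouldn't be using them at all. All that _utf8_on does is flip Perl's internal flag on the scalar; it does not check that the bytes are actually valid utf8.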
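As a rough illustration of the alternative, here is a minimal sketch that decodes raw bytes with Encode's documented decode() function and a strict check, so that bad input dies instead of being silently flagged as utf8. The helper name bytes_to_chars and the sample strings are made up for this example; they are not from your code.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: turn raw bytes (e.g. a column fetched from the
# database) into a Perl character string, dying if they are not valid UTF-8.
sub bytes_to_chars {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC);
}

my $good  = "caf\xC3\xA9";    # UTF-8 bytes for "café"
my $chars = bytes_to_chars($good);
printf "%d bytes became %d characters\n", length($good), length($chars);

my $bad = "caf\xE9";          # a lone Latin-1 byte, not valid UTF-8
my $ok  = eval { bytes_to_chars($bad); 1 };
print "bad input rejected\n" unless $ok;
```

With _utf8_on, the second string would have been marked as utf8 without complaint, and the damage would only show up later in the output.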
It would help a lot if you could provide a data sample, and/or describe the problem in the data in more detail:
- Do any wide (non-ASCII) characters come out correctly at all, or is it rather the case that the "1-2 broken letters per page" just happen to be all of the wide characters in the data?
- When you say you find "\x.." instead of a character, does that really mean exactly two hex digits after the "\x", and do those hex numbers make sense as (Latin-1 or other non-Unicode) single-byte code points for characters that you would expect to see (like é)?
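One way to answer both questions is to dump the suspect strings as hex code points together with the state of the utf8 flag. The following is only a sketch; the describe() helper is a hypothetical name, and the sample strings stand in for whatever your database and CGI parameters actually return.

```perl
use strict;
use warnings;

# Hypothetical helper: describe a string as its length, its code points in
# hex, and whether Perl's internal utf8 flag is set.
sub describe {
    my ($str) = @_;
    my $flag = utf8::is_utf8($str) ? 'on' : 'off';
    my $hex  = join ' ', map { sprintf '%02X', ord($_) } split //, $str;
    return "flag=$flag len=" . length($str) . " [$hex]";
}

my $latin1  = "\xE9";        # é as a single Latin-1 byte
my $raw     = "\xC3\xA9";    # é as two undecoded UTF-8 bytes
my $decoded = $raw;
utf8::decode($decoded);      # now one character, U+00E9

print "latin1:  ", describe($latin1),  "\n";
print "raw:     ", describe($raw),     "\n";
print "decoded: ", describe($decoded), "\n";
```

If the broken letters show up as a single value like E9, the data is probably Latin-1; if they show up as pairs like C3 A9 with the flag off, it is utf8 that was never decoded.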
You say you "set utf-8 flag for CDBI data and decode all CGI parameters", but you didn't show the code where you actually try to do this. Based on the code that you have shown so far, I'd say there's some chance that you've got a misunderstanding somewhere. It may be that the database you are fetching from does not really hold its data as utf8, or that your output file handle does not have a utf8 layer (discipline) set, so that, for one reason or another, a needed conversion is not really happening.
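To show what the output-side conversion looks like, here is a small sketch that prints a character string through an ":encoding(UTF-8)" layer. It writes to an in-memory file handle so the resulting bytes can be counted; encode_via_layer is a hypothetical name used only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: print a character string through a UTF-8 output
# layer into an in-memory file and return the bytes that were written.
sub encode_via_layer {
    my ($text) = @_;
    my $buffer = '';
    open(my $out, '>:encoding(UTF-8)', \$buffer) or die "cannot open: $!";
    print {$out} $text;
    close($out);
    return $buffer;
}

my $text  = "caf\x{E9}";    # a decoded character string
my $bytes = encode_via_layer($text);
printf "%d characters written as %d bytes\n", length($text), length($bytes);

# In a real script the same layer would normally go on STDOUT:
# binmode(STDOUT, ':encoding(UTF-8)');
```

Without such a layer, Perl writes characters below 0x100 as single bytes, which is one way to end up with Latin-1 output on a page that claims to be utf8.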
BTW, when the CGI script sends stuff to the client browser, is the character encoding specified in the HTTP header or in the HTML, and is the browser actually using that encoding when interpreting the data?