Just for grins, download the script I posted here a while ago, tlu -- TransLiterate Unicode, then run your "funky" data through it and see what comes out.
If you see stuff that looks like \x{02BC} or \x{2019} then what you have is utf8 text data with some "wide" characters in it, and your initial problem, as explained by ikegami, is that you aren't looking at it the right way or using the right tools to view it. The "tlu" script converts wide characters into their "literal" hex-numeric code-point form, using perl syntax by default.
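If you just want to see the idea without downloading anything, here is a minimal sketch of that kind of conversion. It is not the actual tlu code: the sub name `show_codepoints` and the sample bytes are made up for illustration, and it assumes the input really is valid UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (NOT the real tlu script): decode raw UTF-8 bytes into
# characters, then rewrite every non-ASCII character as \x{HHHH}.
sub show_codepoints {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);    # bytes -> Perl characters
    $text =~ s/([^\x00-\x7F])/sprintf('\\x{%04X}', ord $1)/ge;
    return $text;
}

# Sample input (assumed): "it", U+2019 as three UTF-8 bytes, then "s"
print show_codepoints("it\xE2\x80\x99s"), "\n";
```

Anything that comes back as plain ASCII was plain ASCII to begin with; the \x{...} pieces are the characters your viewer was mangling.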
Some of your wide characters will have ASCII or (single-byte) Latin-1 equivalents (e.g. the apostrophe for the right-single-quote mark, or the copyright symbol), but some might not. Once you read the data as utf8 (the way it's supposed to be read), there are lots of ways in perl to easily fix or remove them as you see fit.
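One possible way to do the "fix or remove" part, as a rough sketch only. The `%replacement` table and the `asciify` name are my own inventions here; you would fill in the table with whatever code points actually turn up in your data, and you may prefer to keep some characters rather than delete them.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical mapping: extend it with the code points found in your data.
my %replacement = (
    "\x{2019}" => "'",    # right single quotation mark
    "\x{02BC}" => "'",    # modifier letter apostrophe
);

# Hypothetical helper: decode UTF-8 bytes, swap mapped wide characters for
# their ASCII stand-ins, and delete any other character above U+00FF.
sub asciify {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);
    $text =~ s/([^\x00-\xFF])/exists $replacement{$1} ? $replacement{$1} : ''/ge;
    return $text;
}

# Sample input (assumed): U+2019 is replaced, U+2603 has no mapping and is dropped
print asciify("it\xE2\x80\x99s \xE2\x98\x83 here"), "\n";
```

When the data comes from a file, opening it with `open my $fh, '<:encoding(UTF-8)', $path` does the decoding for you, so the explicit `decode` call would not be needed.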