It certainly sounds like Unicode is involved, but it would be more accurate to call this a "mismatched encodings" problem. Whatever page LWP is pulling in apparently uses "smart" or "wide-character" variants of the quotes and dashes, and to do that, the page should be labeled with the particular (non-ASCII) character encoding it uses to represent these special characters.
Meanwhile, your own "current" web page is probably specifying a different character encoding, and/or you are viewing the page with a browser that is forcing its display to use some particular encoding. Either way, the result is a mismatch with the original data received via LWP, and what you are seeing is what happens when the characters are misinterpreted.
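To make the mismatch concrete, here is a minimal sketch using only the core Encode module. It assumes, purely for illustration, that the fetched page is UTF-8 and that it is being misread as Windows-1252; the actual encodings in your case are unknown, and the variable names are made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# A right single quotation mark (U+2019) and an em dash (U+2014),
# as they would arrive on the wire from a UTF-8 page (assumed encoding).
my $utf8_bytes = encode('UTF-8', "\x{2019}\x{2014}");

# Correct interpretation: decode the bytes as UTF-8.
my $right = decode('UTF-8', $utf8_bytes);

# Mismatch: treat the very same bytes as Windows-1252 (assumed wrong guess).
my $wrong = decode('cp1252', $utf8_bytes);

# Print code points rather than the characters themselves, so the output
# does not depend on the terminal's own encoding.
printf "bytes:     %s\n", join ' ', map { sprintf '%02X',   ord } split //, $utf8_bytes;
printf "as UTF-8:  %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $right;
printf "as cp1252: %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $wrong;
```

Two characters come out as six when the wrong encoding is assumed, which is the usual source of the "a-circumflex, euro sign, something" sequences that show up in place of curly quotes.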
It's also possible that your script is doing certain "standard" operations on the data, via CPAN modules or your own code, and in the process Perl is applying some "default, assumed-to-be-reasonable" conversion of the character encoding, again with the result that the special characters are misinterpreted as something they were not meant to be.
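One such default conversion is easy to reproduce with core modules alone. The sketch below assumes the fetched data is UTF-8 bytes that were never decoded; when those bytes are joined to a string that already holds wide characters, Perl treats each byte as a Latin-1 character, and encoding the result for output double-encodes it. The variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes for U+2019, as fetched and never decoded (assumption).
my $raw = "\xE2\x80\x99";

# A character string that already contains wide characters.
my $label = "\x{201C}quote\x{201D}";

# Implicit conversion: each byte of $raw is treated as a Latin-1 character.
my $mixed   = $label . $raw;
my $out_bad = encode('UTF-8', $mixed);

# Explicit conversion: decode on input, encode once on output.
my $fixed    = $label . decode('UTF-8', $raw);
my $out_good = encode('UTF-8', $fixed);

# Hex dump of a byte string, for display only.
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, shift }

print "implicit: ", hex_bytes(substr($out_bad,  -6)), "\n";
print "explicit: ", hex_bytes(substr($out_good, -3)), "\n";
```

The general habit is to decode at the point where data enters the script and encode at the point where it leaves. With LWP, calling `decoded_content` on the response object instead of `content` is the usual way to get text that has already been decoded according to the charset the page declares, though whether that applies here depends on code we have not seen.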
If you can show the original URL, or some of the relevant unmodified strings from that page, and/or a minimal snippet of your own code that produces this behavior, we would be much more likely to pinpoint the issue(s) for you.