You may have fallen victim to smart quotes, a feature of some programs (word processors in particular) that automatically replaces regular ASCII single/double quotes with their curly counterparts. Typographers and designers are fond of these because the opening (left-side) and closing (right-side) quotes look slightly different, as in professional typesetting. This means that if you type 'word', you'll get ‘word’ (or some other form, depending on the locale), and "word" becomes “word” (zoom in if you can't see a difference...). They are entirely different (non-ASCII) characters.

How to get rid of them depends on how they're encoded. In Unicode, they are the codepoints U+2018 - U+201F (U+2018 - U+201B for the single quotes, U+201C - U+201F for the double quotes), while for example in CP1252, they are 0x91, 0x92, 0x82 (curly single quotes) and 0x93, 0x94, 0x84 (curly double quotes).
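To see why the encoding matters, here is a minimal sketch that prints the bytes of the same left single quote in both encodings. The helper name hex_bytes is made up for this illustration; encode() comes from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string and return its bytes as hex.
sub hex_bytes {
    my ($encoding, $chars) = @_;
    my $bytes = encode($encoding, $chars);
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

my $left_single = "\x{2018}";    # LEFT SINGLE QUOTATION MARK

print hex_bytes('UTF-8',  $left_single), "\n";    # e2 80 98
print hex_bytes('cp1252', $left_single), "\n";    # 91
```

One character, two quite different byte representations, so a replacement written for one encoding won't match data in the other.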

You can replace them using Perl's tr/// or s///, e.g.

    # unicode
    tr/\x{2018}-\x{201B}/'/;
    tr/\x{201C}-\x{201F}/"/;

    # CP1252
    tr/\x91\x92\x82/'/;
    tr/\x93\x94\x84/"/;

    # or, if you have UTF-8 data which isn't properly flagged as such,
    # you can try to directly replace the multi-byte sequence, as
    # they're encoded in UTF-8, e.g.
    s/\xe2\x80\x98/'/g;   # one of the single quotes,
    # ...
    # and similarly for the 7 others...
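Put together as a small self-contained script, the Unicode variant might look like the sketch below. It assumes the input arrives as raw UTF-8 bytes and decodes them first; the function name straighten_quotes and the sample data are invented for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: replace curly quotes in an already-decoded
# character string with their plain ASCII counterparts.
sub straighten_quotes {
    my ($text) = @_;
    $text =~ tr/\x{2018}-\x{201B}/'/;    # single quotes
    $text =~ tr/\x{201C}-\x{201F}/"/;    # double quotes
    return $text;
}

# Made-up sample input: UTF-8 bytes, as you might read them from a file or socket.
my $bytes = "\xe2\x80\x9cword\xe2\x80\x9d and \xe2\x80\x98word\xe2\x80\x99";
my $text  = decode('UTF-8', $bytes);

print straighten_quotes($text), "\n";    # "word" and 'word'
```

Decoding first means one tr/// per quote family is enough, rather than eight separate s/// on byte sequences.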

(This is just a guess. I can't tell for sure whether that really is your problem, since you haven't said in detail in what way the quotes appear wrong.)


In reply to Re: character differences and SOAP encoding by almut
in thread character differences and SOAP encoding by foftoogs
