Some relevant sample data (or the web site URL, if that's appropriate) would really help here, along with an actual code snippet that shows us what you are doing with the data.

It matters what sort of character encoding the web site is using (some sort of Latin-1? UTF-8? something else?), and it also matters what your script is doing when opening file handles for input or output, making database connections, and using LWP methods. Oh, and it also matters what character encoding is being used in the database. (Is it the same as what the web site uses, or different?)
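
To show why the encoding at each step matters, here is a minimal sketch using only core modules. The sample word and the variable names are made up for illustration, and an in-memory file handle stands in for whatever file, socket, or database handle your script really uses. The same text arrives as different bytes depending on the source encoding, and it only comes out right if you decode it with the matching encoding and write it through an explicit output layer.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample data: the word "café" as raw bytes in two encodings.
my $latin1_bytes = "caf\xE9";       # ISO-8859-1: e-acute is one byte
my $utf8_bytes   = "caf\xC3\xA9";   # UTF-8: e-acute is two bytes

# Decode each byte string with the encoding it was actually written in.
my $from_latin1 = decode('ISO-8859-1', $latin1_bytes);
my $from_utf8   = decode('UTF-8',      $utf8_bytes);

# Both are now the same four-character string ending in U+00E9.
print "same text after decoding: ",
    ($from_latin1 eq $from_utf8 ? "yes" : "no"), "\n";

# Write through an explicit encoding layer so the output bytes are well
# defined. An in-memory handle is used here; a real script would pass a
# file name instead of \$buffer.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer)
    or die "Cannot open in-memory handle: $!";
print {$out} $from_latin1, "\n";
close($out) or die "Cannot close handle: $!";

# Show the bytes that were actually written.
print "bytes written: ",
    join(' ', map { sprintf '%02X', ord } split //, $buffer), "\n";
```

If your script decodes with the wrong encoding, or prints without any layer, this is typically where the accented characters get mangled.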

Lacking all those details, I don't think there's much we can say about your problem -- except that it sounds a bit implausible: if the web site content includes accented characters, I wouldn't expect a quiet conversion to "basic ASCII", unless your script is explicitly applying this sort of behavior somehow. I might expect warnings or errors or some sort of character-entity-reference stuff, if the data is ending up different from its original form.
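
For comparison, this is roughly what a script has to do on purpose to turn accented characters into plain ASCII letters. It is only a sketch with a made-up sample string, and I am not suggesting your script does this; the point is that the conversion takes explicit code (here the core Unicode::Normalize module plus a substitution) and does not happen by accident.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical sample: "café" as a character string (not raw bytes).
my $text = "caf\N{U+00E9}";

# NFD splits e-acute into "e" plus a combining acute accent (U+0301);
# the substitution then deletes every combining mark.
(my $stripped = NFD($text)) =~ s/\p{Mn}//g;

print "$stripped\n";   # prints "cafe"
```

If nothing like this appears in your code, the diacritics are probably being lost somewhere else, and that is why we need to see the script.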


In reply to Re: keeping diacritical marks in a string by graff
in thread keeping diacritical marks in a string by Foxpond Hollow
