in reply to keeping diacritical marks in a string
It matters which character encoding the web site uses (some flavor of Latin-1? UTF-8? something else?), and it also matters what your script does when it opens file handles for input or output, connects to the database, and calls LWP methods. Oh, and it also matters which character encoding the database uses. (Is it the same as the web site's, or different?)
Lacking all those details, I don't think there's much we can say about your problem, except that it sounds a bit implausible: if the web site content includes accented characters, I wouldn't expect a quiet conversion to "basic ASCII" unless your script is explicitly doing that somehow. If the data does end up different from its original form, I'd expect warnings, errors, or character entity references (such as &#233;) instead.
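Here is a minimal sketch of what typical encoding mismatches do to a single accented character. It is written in Python for brevity, not in your script's language, though the same principles apply to Perl's Encode module. The sample string "café" and all variable names are hypothetical. Note that none of the mismatches quietly produces plain ASCII. Getting "cafe" takes a deliberate decompose-and-drop step, which is the kind of explicit behavior your script would have to be applying somewhere.

```python
import unicodedata

# Hypothetical sample: "café", where é is U+00E9.
text = "caf\u00e9"

# The bytes a UTF-8 site would send, and the bytes a Latin-1 site would send.
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'

# Mismatch 1: UTF-8 bytes decoded as Latin-1 give two junk characters (mojibake).
mojibake = utf8_bytes.decode("latin-1")

# Mismatch 2: Latin-1 bytes decoded as UTF-8 raise an error by default...
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
# ...or yield U+FFFD (the replacement character) if errors are replaced.
replaced = latin1_bytes.decode("utf-8", errors="replace")

# Forcing output to ASCII fails unless you ask for character entity references.
entity = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# Plain "cafe" only appears after an explicit step: decompose é into
# e + combining accent (NFKD), then throw away everything non-ASCII.
stripped = (
    unicodedata.normalize("NFKD", text)
    .encode("ascii", errors="ignore")
    .decode("ascii")
)

if __name__ == "__main__":
    for label, value in [
        ("UTF-8 read as Latin-1", mojibake),
        ("Latin-1 read as UTF-8", replaced),
        ("ASCII with entities", entity),
        ("explicitly stripped", stripped),
    ]:
        # repr() keeps the output readable regardless of terminal encoding.
        print(f"{label:>22}: {value!r}")
    print(f"{'strict UTF-8 decode':>22}: {'error' if strict_failed else 'ok'}")
```

If your output looks like the stripped case, look for something in the script (or a module it calls) doing that on purpose; if it looks like one of the other cases, check the encoding settings at each layer mentioned above.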