in reply to keeping diacritical marks in a string

Hi Foxpound,
The characters are almost certainly encoded as HTML entities on the web site, i.e. é will be represented in the page as &eacute; or &#233;.

To decode these you can use

use HTML::Entities;
HTML::Entities::decode_entities($string);
Called like this (in void context), it replaces the entities in $string in place with the corresponding Unicode characters, which is most likely what the database is storing. Check out the module's documentation at HTML::Entities.
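Here is a minimal sketch of the two ways to call it, assuming HTML::Entities (part of the HTML-Parser distribution on CPAN) is installed. The sample string is made up, and the final UTF-8 encoding step assumes that is what your output or database expects.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);
use Encode qw(encode);

# Made-up input: named, decimal and hex entities for the same character (é).
my $html = 'caf&eacute; and r&#233;sum&#xE9;';

# In non-void context decode_entities returns a decoded copy
# and leaves $html untouched.
my $text = decode_entities($html);

# In void context it decodes its argument in place.
my $copy = $html;
decode_entities($copy);

# $text is now a Perl character string; encode it to UTF-8 bytes
# before printing (assumption: the terminal/database wants UTF-8).
print encode('UTF-8', $text), "\n";
```

Whether you keep the original string or overwrite it is the only real difference between the two forms; the decoded text is the same either way.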

That said, more information about your setup would be useful, as this solution only deals with the most common case.