This suggestion is tangential to your Unicode question, but it has other advantages: why not skip the XML step and read the Excel file directly with DBD::Excel or DBD::ODBC? If you need XML as well as the database and the HTML, you may be able to generate it from the database (or directly from the spreadsheet) using DBD::AnyData.
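As a rough illustration of the direct-read idea, here is a minimal sketch using DBI with DBD::Excel. It assumes an old-style .xls workbook whose first row holds the column names and whose sheet name is usable as a table name. The sub name `read_sheet`, the file `data.xls`, and the sheet `Sheet1` are placeholders I made up, not anything from your setup.

```perl
use strict;
use warnings;
use DBI;

# Read every row of one worksheet into an array of hashrefs.
# $file and $sheet are supplied by the caller; DBD::Excel exposes
# each worksheet as a table named after the sheet.
sub read_sheet {
    my ($file, $sheet) = @_;

    my $dbh = DBI->connect("dbi:Excel:file=$file", undef, undef,
                           { RaiseError => 1 });

    my $sth = $dbh->prepare("SELECT * FROM $sheet");
    $sth->execute;

    my @rows;
    while (my $row = $sth->fetchrow_hashref) {
        push @rows, { %$row };    # copy, since the hashref may be reused
    }

    $dbh->disconnect;
    return \@rows;
}

# Example invocation (hypothetical): perl read_sheet.pl data.xls Sheet1
if (@ARGV == 2) {
    my $rows = read_sheet(@ARGV);
    printf "%d rows read\n", scalar @$rows;
}
```

From there the rows can be inserted into the real database or fed to whatever produces the HTML, with no intermediate XML file. With DBD::ODBC only the connect string would change, assuming an Excel ODBC data source is configured on your machine.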