in reply to Unicode Woes

You decide on a single encoding for all the data you store in the database. If most of your data is English, and especially if you're in a Unix environment, UTF-8 is the natural choice.

Then you have to make sure that everything you send out is marked as UTF-8: The "Content-type" HTTP header should be set to "text/html; charset=utf-8".
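
A minimal sketch of that for a plain CGI script, assuming you build the response yourself. The helper name utf8_response is made up for illustration, and the body is assumed to be a decoded character string:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Build a complete CGI response as bytes: the header first, then the
# body encoded as UTF-8 so that it matches the declared charset.
sub utf8_response {
    my ($html) = @_;    # $html is a character string, not bytes
    my $header = "Content-Type: text/html; charset=utf-8\r\n\r\n";
    return $header . encode('UTF-8', $html);
}

binmode STDOUT;    # we print raw bytes, so no extra encoding layer
print utf8_response("<p>caf\x{e9}</p>\n");
```

Encoding in one place like this keeps the declared charset and the actual bytes from drifting apart.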

And, of course, you have to make sure everything you put in your database is in UTF-8, too. You can use Perl's Encode module to do this. If you know what encoding the input is in, it is easy. If you don't, it's less easy :)
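
Something along these lines, as a sketch only. The sub names are invented for the example, and the fallback for unknown input (try strict UTF-8, otherwise assume ISO-8859-1) is just one common heuristic, not a reliable detector:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Turn raw input bytes into a Perl character string.
# If $encoding is given, trust it; otherwise guess (heuristic only).
sub bytes_to_chars {
    my ($bytes, $encoding) = @_;
    if (defined $encoding) {
        my $copy = $bytes;    # decode() with a CHECK value may modify its input
        return decode($encoding, $copy, Encode::FB_CROAK());
    }
    # Unknown encoding: accept the input as UTF-8 only if it is well formed,
    # otherwise fall back to ISO-8859-1, where every byte is valid.
    my $copy  = $bytes;
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return defined $chars ? $chars : decode('ISO-8859-1', $bytes);
}

# Encode characters as UTF-8 bytes, ready to be stored.
sub chars_to_utf8 {
    my ($chars) = @_;
    return encode('UTF-8', $chars);
}

my $latin1_bytes = "caf\xe9";    # "cafe" with e-acute, as ISO-8859-1 bytes
my $stored = chars_to_utf8(bytes_to_chars($latin1_bytes, 'ISO-8859-1'));
print join(' ', map { sprintf '%02X', ord } split //, $stored), "\n";
# prints: 63 61 66 C3 A9
```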

Re^2: Unicode Woes
by BigLug (Chaplain) on Oct 01, 2004 at 09:10 UTC
    I've tried that ... I get the data from LWP, then send it through DBI to Postgres. However, it ends up as a string of � characters. More importantly, when I send the received string out to the browser (with the header as you say), I similarly get nothing appearing.

    Cheers!
    Rick
    If this is a root node: Before responding, please ensure your clue bit is set.
    If this is a reply: This is a discussion group, not a helpdesk ... If the discussion happens to answer a question you've asked, that's incidental.
      There are many links in this chain, and if things don't work as a whole you have to go over them link by link to see where the problem(s) happen.

      View Unicode in hex offers a nice way of seeing what your actual data is. Adapt the code there to print what you get from LWP. Then make sure what gets fetched from the database is still UTF-8. Finally, don't trust your browser: download the page that the web server handed you and see what information is actually there. (It might be that your server is sending *two* Content-type headers, in which case only one of them (the wrong one, by Murphy's law) is honored by your browser.)
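
      If you'd rather roll your own, here is a rough sketch of both checks. This is not the code from that node, and both sub names are made up. The second sub assumes you have saved the raw response, status line and headers included, into a string:

      ```perl
      use strict;
      use warnings;

      # Show each element of a string as hex. Raw UTF-8 bytes show up as
      # sequences such as C3 A9; a decoded string shows the code point itself.
      sub hex_view {
          my ($str) = @_;
          return join ' ', map { sprintf '%02X', ord } split //, $str;
      }

      # Return every Content-Type header line found in a raw HTTP response.
      sub content_type_headers {
          my ($raw_response) = @_;
          # Everything before the first blank line is the header section.
          my ($head) = split /\r?\n\r?\n/, $raw_response, 2;
          return grep { /^Content-Type:/i } split /\r?\n/, $head;
      }

      print hex_view("caf\xc3\xa9"), "\n";    # UTF-8 bytes, not yet decoded
      print hex_view("\x{263a}"), "\n";       # a single decoded character
      ```

      Run hex_view on the data at each link (after LWP, after the database, in the saved page) and the first place where the bytes change is where to dig. More than one line back from content_type_headers means the server is sending conflicting headers.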

      Oh, you also need to tell DBD::Pg that your data needs to be treated as UTF-8. Check out the pg_enable_utf8 attribute. (If you move to MySQL one day, contact me for a patch giving similar functionality.)
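
      Roughly like this, as a sketch. connect_utf8 is a made-up helper, and the DSN, user and password in the commented-out call are placeholders. What the attribute does exactly depends on your DBD::Pg version, so check the docs for the one you have installed:

      ```perl
      use strict;
      use warnings;

      # Connection attributes; pg_enable_utf8 is the DBD::Pg attribute
      # mentioned above, the other two are ordinary DBI attributes.
      my %attr = (
          RaiseError     => 1,
          AutoCommit     => 1,
          pg_enable_utf8 => 1,
      );

      # Hypothetical helper that connects with those attributes.
      sub connect_utf8 {
          my ($dsn, $user, $password) = @_;
          require DBI;    # loaded only when a connection is actually made
          return DBI->connect($dsn, $user, $password, { %attr });
      }

      # Placeholder values; substitute your own before uncommenting.
      # my $dbh = connect_utf8('dbi:Pg:dbname=mydb', 'someuser', 'secret');
      ```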

      I suggested many things above, but I recommend you tackle them one at a time, not all at once. That way, if the first link in the chain was the only bad one, you don't waste your time on the others.