Hi All,
For a while now it's been my job to deal with user uploads. The users are globally dispersed, often have very limited computer skills, and upload text files that were created in various locales.
It's very challenging to "Do The Right Thing". Encode::Detect helps a lot, and works great a lot of the time. I convert everything to UTF-8, so from then on there aren't any issues... Well, until the user exports and doesn't know how to open the file in UTF-8 mode (which depends on what program they are using). But I'm not worried much about exports right now.
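A minimal sketch of that detect-then-convert step, under some assumptions: upload_to_utf8 and detect_charset are hypothetical helper names, the detector is used only if Encode::Detect::Detector is installed, and the fallback charset is supplied by the caller. Note that single-byte encodings will decode almost any byte string, so "decoded without error" is not proof that the guess was right.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Use Encode::Detect::Detector when it is installed; otherwise skip detection.
my $HAVE_DETECTOR = eval { require Encode::Detect::Detector; 1 };

# Hypothetical helper: return the detected charset name, or undef.
sub detect_charset {
    my ($octets) = @_;
    return undef unless $HAVE_DETECTOR;
    return Encode::Detect::Detector::detect($octets);
}

# Hypothetical helper: returns (UTF-8 octets, charset used), or an empty
# list when nothing decoded cleanly and the user has to be asked.
sub upload_to_utf8 {
    my ($octets, $fallback, $detector) = @_;
    $detector ||= \&detect_charset;

    # Valid UTF-8 (which includes plain ASCII) passes straight through.
    my $text = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
    return (encode('UTF-8', $text), 'UTF-8') if defined $text;

    # Try the detector's answer first, then the caller's fallback guess.
    for my $charset (grep { defined $_ } $detector->($octets), $fallback) {
        next unless Encode::find_encoding($charset);   # name unknown to Encode
        $text = eval { decode($charset, $octets, FB_CROAK | LEAVE_SRC) };
        return (encode('UTF-8', $text), $charset) if defined $text;
    }
    return;
}
```

Called as my ($utf8, $charset) = upload_to_utf8($raw, 'cp1251'); the second return value records which charset was actually used, which is handy for logging how often the detector and the fallback disagree.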
I'm well aware that, strictly speaking, it doesn't matter where the user is from, as the file they are uploading could potentially be in any encoding. But I've found that for our users at least, there's a pretty consistent link between where they are from and the encoding their uploads tend to be in. For example, Norwegian users on Macs tend to upload files in MacIcelandic, Russian Windows users tend to upload Windows-1251, etc.
So what I'm going to do is use HTTP_USER_AGENT, GeoIP and HTTP_ACCEPT_LANGUAGE to give me a best guess at the encoding for when Encode::Detect gets it wrong. This'll likely be displayed to the user with translation examples so that they can choose the charset that works.
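A rough sketch of what such a mapping could look like, using only the core Encode module. Everything here is an assumption for illustration: the %GUESS table is a made-up sample rather than real data, the function names are hypothetical, and the country code is assumed to come from whatever GeoIP lookup is already in place.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Made-up sample table: "platform:language-or-country" => likely encodings.
my %GUESS = (
    'mac:no'     => ['MacIcelandic', 'MacRoman'],
    'windows:no' => ['cp1252'],
    'mac:ru'     => ['MacCyrillic'],
    'windows:ru' => ['cp1251', 'koi8-r'],
);
# Norwegian language tags that should share the 'no' rows.
my %ALIAS = (nb => 'no', nn => 'no');

# Very crude platform sniffing from the User-Agent string.
sub platform_from_ua {
    my ($ua) = @_;
    return 'mac'     if $ua =~ /Macintosh|Mac OS X/;
    return 'windows' if $ua =~ /Windows/;
    return 'other';
}

# Primary language subtags in header order (q-values ignored for brevity).
sub languages_from_header {
    my ($header) = @_;
    my @langs;
    for my $item (split /,/, $header // '') {
        my ($tag) = $item =~ /^\s*([A-Za-z]{1,8})/ or next;
        push @langs, lc $tag;
    }
    return @langs;
}

# Ranked, de-duplicated candidates: Accept-Language hints first, then country.
sub candidate_encodings {
    my ($ua, $accept_language, $country) = @_;
    my $platform = platform_from_ua($ua // '');
    my (@out, %seen);
    for my $hint (languages_from_header($accept_language), lc($country // '')) {
        my $key = $ALIAS{$hint} // $hint;
        for my $enc (@{ $GUESS{"$platform:$key"} || [] }) {
            push @out, $enc unless $seen{$enc}++;
        }
    }
    return @out;
}

# Decode the same sample bytes with each candidate, for showing to the user.
sub previews {
    my ($octets, @encodings) = @_;
    return map { my $copy = $octets; [ $_, decode($_, $copy) ] } @encodings;
}
```

In a CGI setting this would be called with $ENV{HTTP_USER_AGENT}, $ENV{HTTP_ACCEPT_LANGUAGE} and the GeoIP country code, and the output of previews() would feed the list the user picks from. Putting Accept-Language ahead of GeoIP is just one possible ordering; which signal deserves more trust would have to come from real upload data.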
For the life of me I cannot find any examples on Google of people doing this, or any modules for this kind of mapping on CPAN. Am I missing something? Otherwise I may as well create a new CPAN module for this, so that others in my situation may benefit.
Lyle