Hmm... I thought Outlook was for email, so I wonder how it comes to "export" a csv file. If someone emailed you the csv file as an attachment, your best hope is that the sender can tell you which character encoding they used. If you can't get that from them, you would have to use Encode::Guess with more suspects than just cp1252 and "latin1". (Alas, guessing is relatively unreliable when it comes to picking the "right" encoding among the various single-byte Latin alternatives, because almost any byte sequence decodes without error in all of them.)
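Here is a minimal sketch of the Encode::Guess approach. The wrapper name guess_csv_encoding and the hash it returns are my own invention for illustration; only guess_encoding itself comes from the module. It returns an encoding object when exactly one suspect fits and a plain error string otherwise, so the wrapper keeps those two cases apart.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: try to guess the encoding of a string of raw bytes.
# @suspects are extra encoding names to try besides the defaults
# (ascii, utf8, and BOM-marked UTF-16/32).
sub guess_csv_encoding {
    my ($bytes, @suspects) = @_;
    my $guess = guess_encoding($bytes, @suspects);

    # A reference means exactly one candidate survived;
    # anything else is an error message (e.g. an ambiguous result).
    return ref($guess)
        ? +{ encoding => $guess->name, error => undef }
        : +{ encoding => undef,        error => $guess };
}
```

With a single legacy suspect the guess tends to succeed, but as soon as you list two single-byte encodings you should expect the error branch, with a message naming every suspect that still fits. That is the unreliability I mentioned.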

Or you'll have to inspect the data file yourself to see if you can deduce what the encoding is. Any decent hex-dump tool would suffice (to see what the byte values are for the non-ascii characters), along with knowledge of the language being used in the text, and some reference info from http://www.unicode.org/Public/MAPPINGS/ (it's an ftp-able directory of mapping tables that relate all the various non-unicode character sets to unicode).
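If you don't have a hex-dump tool handy, a few lines of Perl will list the non-ascii byte values for you. This is only a sketch; non_ascii_bytes is a name I made up, and it assumes you hand it the raw, undecoded file contents.

```perl
use strict;
use warnings;

# Hypothetical helper: count each non-ASCII byte value in a string of raw bytes.
# Returns a hash ref keyed by hex value, e.g. { '0xE9' => 2 }.
sub non_ascii_bytes {
    my ($bytes) = @_;
    my %count;
    while ($bytes =~ /([\x80-\xFF])/g) {
        $count{ sprintf '0x%02X', ord $1 }++;
    }
    return \%count;
}
```

To use it, open the csv file with the :raw layer (so that no decoding happens), slurp it into a scalar, and pass that scalar in. The keys of the returned hash are the byte values you need to look up.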

My inclination would be: download those unicode mapping tables into a single directory, look at a hex-dump of your csv file to see which non-ascii byte values to look up, figure out what letter each byte value represents, and grep over the mapping tables to find the line that relates that byte value to that letter.

The name of the mapping table containing that line represents the character encoding you need to use when opening the csv file.
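The grep step can also be scripted. The sketch below assumes the simple single-byte table layout, where each data line starts with two hex columns such as "0x80 0x20AC" (byte value, then Unicode code point) and comment lines start with "#". Some tables in that directory use other layouts and would need different parsing. The name tables_mapping and the directory argument are my own.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: list the mapping-table files under $dir that
# map byte value $byte to Unicode code point $codepoint.
sub tables_mapping {
    my ($dir, $byte, $codepoint) = @_;
    my @hits;
    find({
        no_chdir => 1,    # $_ then holds the full path of each entry
        wanted   => sub {
            return unless -f $_ && /\.txt\z/i;
            open my $fh, '<', $_ or return;
            while (my $line = <$fh>) {
                next if $line =~ /^\s*#/;    # skip comment lines
                my ($byte_hex, $cp_hex) =
                    $line =~ /^\s*0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/
                    or next;
                if (hex($byte_hex) == $byte && hex($cp_hex) == $codepoint) {
                    push @hits, $File::Find::name;
                    last;
                }
            }
            close $fh;
        },
    }, $dir);
    my @sorted = sort @hits;
    return @sorted;
}
```

For example, if the hex dump shows byte 0x80 where the text clearly wants a euro sign (U+20AC), call tables_mapping($dir, 0x80, 0x20AC) and see which table names come back. Repeat for a few more byte/letter pairs until only one candidate is left.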


In reply to Re^3: Reading CSV Files Containing UTF8 Characters by graff
in thread Reading CSV Files Containing UTF8 Characters by shoness
