Hi Monks, I am building an application that loads data from an Excel file into an Oracle database. The idea is to save the file as tab-delimited text and simply parse it with Perl. All the data I work with is in UTF-8. This works fine unless the data in the Excel file uses special characters (in my case Korean characters). For example:

Korean Studies Information Service System/한국학술정보학술지원문데이터베이스

You can open an Excel file and paste this example if you want to recreate the problem (not placed in code tags because the chars are encoded). If I save this as a text file, the special characters all turn into '?'. Another option is to save the file as Unicode. The problem now is that the text file is encoded as UTF-16 rather than UTF-8, and I can't load it into the DB.
I tried to convert to UTF-8 using Encode, but with no success for Korean characters (although with partial success for Czech characters, so I think I might be heading in the right direction).
This is the code I used (test is the Unicode file, test_utf is the UTF-8 encoded file):
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Parentheses make "or die" apply to open() rather than to the file name string
open(IN, "<", "test") or die "Cannot open test: $!";
open(OUT, ">", "test_utf") or die "Cannot open test_utf: $!";
while (my $line = <IN>) {
    ## unicode = utf-16 I think
    $line = decode('unicode', $line);
    $line = encode('utf-8', $line);
    print OUT $line;
}
close IN;
close OUT;
Any idea what might work?
Also, this isn't strictly Perl, but if anybody has an idea how to save an Excel file as UTF-8 without losing special chars I will be extremely grateful.
Thanks,
Guy

Man is the only animal that can remain on friendly terms with the victims he intends to eat until he eats them.
- Samuel Butler

In reply to Getting Data from an Excel File by mrguy123
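
A minimal sketch of one way to do the conversion, offered as a suggestion to try rather than a definitive fix. It assumes that the file Excel writes when saving as Unicode is UTF-16 with a byte order mark (BOM), which is what lets the generic UTF-16 encoding detect the byte order; without a BOM the layer name would have to be UTF-16LE or UTF-16BE instead. The helper name convert_utf16_to_utf8 is made up for illustration. The main change from the code in the question is that decoding happens in the I/O layer: reading raw UTF-16 bytes with <IN> splits each line at the 0x0A byte, which is only half of a two-byte UTF-16 newline, so the leftover byte can push the following lines out of alignment before decode ever sees them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the input is UTF-16 with a BOM, so the generic 'UTF-16'
# encoding can work out the byte order by itself.
# convert_utf16_to_utf8 is a hypothetical helper name, not from the post.
sub convert_utf16_to_utf8 {
    my ($in_path, $out_path) = @_;

    # :raw strips any platform CRLF layer that would corrupt UTF-16 bytes;
    # :encoding(...) then decodes, so <$in> sees characters, not raw bytes.
    open my $in, '<:raw:encoding(UTF-16)', $in_path
        or die "Cannot open $in_path: $!";
    open my $out, '>:raw:encoding(UTF-8)', $out_path
        or die "Cannot open $out_path: $!";

    while (my $line = <$in>) {
        $line =~ s/\r?\n\z/\n/;    # normalize CRLF line endings to LF
        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Runs only when both file names are given on the command line.
convert_utf16_to_utf8(@ARGV) if @ARGV == 2;
```

Saved under any script name, it would be run as: perl convert.pl test test_utf. The output is UTF-8 without a BOM and with LF line endings, so adjust the line-ending handling if the database loader expects CRLF.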
