I tested this on my machine with the exact code you posted, and it worked great!

    $ perl test.pl
    Sheet: Sheet1
    ( 0 , 0 ) => this
    ( 1 , 1 ) => is
    ( 2 , 2 ) => a
    ( 3 , 3 ) => test
    ( 4 , 1 ) => 什麼
    Sheet: Sheet2
    Sheet: Sheet3
    $

(PerlMonks may turn the Chinese text into an HTML entity here; it is traditional Chinese "shenme", meaning "what". It definitely displayed correctly in my xterm.)

It may be that whatever you're using to view the output isn't expecting UTF-8; or perhaps the XLSX itself isn't encoded as UTF-8 (though I'm not sure that's even an option in XLSX files!).
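An .xlsx file is a ZIP archive of XML parts, and the cell text usually sits in `xl/sharedStrings.xml`; each part can declare its encoding in its XML declaration. A minimal sketch for checking that, assuming you have already extracted the part (for example with `unzip -p yourfile.xlsx xl/sharedStrings.xml > shared.xml`, where both file names are placeholders). The helper `xml_declared_encoding` is hypothetical, written only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: report the encoding an XML document declares.
# Per XML 1.0, a document with no encoding declaration and no byte
# order mark is UTF-8; UTF-16 documents start with a byte order mark.
sub xml_declared_encoding {
    my ($xml) = @_;

    # UTF-16 byte order marks (big- or little-endian).
    return 'UTF-16' if $xml =~ /\A(?:\xFE\xFF|\xFF\xFE)/;

    # Ignore a UTF-8 byte order mark before the declaration.
    $xml =~ s/\A\xEF\xBB\xBF//;

    # Look for encoding="..." (or '...') inside <?xml ... ?>.
    return $1
        if $xml =~ /\A<\?xml\s[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/;

    return 'UTF-8';
}

# Usage: perl check_encoding.pl shared.xml  (placeholder names)
my $file = shift @ARGV;
if ( defined $file ) {
    # Read the extracted part as raw bytes, without any decoding.
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $xml = do { local $/; <$fh> };
    close $fh;
    print xml_declared_encoding($xml), "\n";
}
```

Excel normally writes `encoding="UTF-8"` in these parts, so if this reports UTF-8 the file itself is probably not the problem.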

Re^4: Parsing a .xlsx file with chinese characters
by Sithiris (Novice) on Oct 03, 2011 at 21:32 UTC

    Thanks for trying it. From a quick Google search of xterm, I'm guessing you ran the script in a non-Windows environment? Could that affect whether it works? I doubt it, considering Excel is a Windows-based programme.

      You're right; I ran it on a Linux VM.

      If you're running this in the Windows console (cmd.exe or what have you), I'm inclined to think the problem isn't with the output from Spreadsheet::XLSX, but that cmd doesn't display UTF-8 properly.
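On the Perl side, it is worth making the output encoding explicit so the terminal receives UTF-8 whatever the platform. A minimal sketch, assuming the cell values are already decoded Perl character strings (the sample string is a placeholder standing in for a cell value). On cmd.exe you would probably also need to switch the console to the UTF-8 code page with `chcp 65001` and pick a console font that has Chinese glyphs, and even then older consoles can be unreliable:

```perl
use strict;
use warnings;
use utf8;    # the string literal below is written in UTF-8

# Encode every character printed to STDOUT as UTF-8. Without this layer
# Perl still emits UTF-8 for characters above U+00FF, but it also warns
# "Wide character in print".
binmode STDOUT, ':encoding(UTF-8)';

# Placeholder cell value: two characters once decoded via "use utf8".
my $word = "什麼";

printf "( %s , %s ) => %s\n", 4, 1, $word;
```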

      What if you redirect the output of the script to a .html file, then try loading it in a browser? Make sure the encoding gets detected as UTF-8. If it displays correctly, it's just the terminal, and your data is fine. :)
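A minimal sketch of that browser test, writing the results straight to an HTML file instead of redirecting. The file name `output.html` and the sample rows are placeholders, and `html_escape` is a small hypothetical helper; the `<meta charset="utf-8">` tag is there so the browser doesn't have to guess the encoding:

```perl
use strict;
use warnings;
use utf8;    # the sample data below is written in UTF-8

# Escape the characters that are special in HTML text.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

my $file = 'output.html';                             # placeholder name
my @rows = ( [ 0, 0, 'this' ], [ 4, 1, '什麼' ] );    # placeholder cells

# The :encoding(UTF-8) layer encodes characters as they are written.
open my $fh, '>:encoding(UTF-8)', $file or die "Cannot open $file: $!";
print {$fh} qq{<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head>\n<body><pre>\n};
for my $row (@rows) {
    my ( $r, $c, $val ) = @$row;
    print {$fh} html_escape( sprintf( '( %s , %s ) => %s', $r, $c, $val ) ), "\n";
}
print {$fh} "</pre></body></html>\n";
close $fh or die "Cannot close $file: $!";
```

If the Chinese text looks right in the browser, the data and the file are fine and only the viewer was at fault.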

        In my script I print to a UTF-8-encoded text file, which I opened in Word. It displayed without any errors, just with the wrong characters.
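One common way to get a file that opens cleanly but shows the wrong characters is double encoding: text that is already UTF-8 bytes gets written through a UTF-8 output layer (or passed to `encode`) a second time. A minimal sketch using the core Encode module, with the en dash U+2013 as a stand-in for the spreadsheet data; it assumes, without having seen the full script, that this is what is happening:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The en dash U+2013 as three raw UTF-8 bytes: text that was never
# decoded, so Perl sees three separate characters (0xE2, 0x80, 0x93).
my $raw_bytes = "\xE2\x80\x93";

# Encoding it again, which is what printing it through an
# :encoding(UTF-8) handle does, turns each of those three characters
# into its own two-byte UTF-8 sequence.
my $double_encoded = encode( 'UTF-8', $raw_bytes );

# A viewer that decodes the file once as UTF-8 gets back three
# characters rather than one en dash.
my $as_viewed = decode( 'UTF-8', $double_encoded );

printf "%d bytes in, %d bytes written, %d characters displayed\n",
    length $raw_bytes, length $double_encoded, length $as_viewed;
# prints "3 bytes in, 6 bytes written, 3 characters displayed"
```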

        What I'm thinking is that it may be 'deconstructing' the character: for example, instead of "\x{2013}" it is displaying "\xE2", "\x80", "\x93". If that's the case, is there a way to force it to keep the character whole?
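Those three bytes are exactly the UTF-8 encoding of U+2013, so if the values really are arriving as raw bytes, decoding them once turns them back into the single character "\x{2013}". A minimal sketch with the core Encode module, assuming the bytes are valid UTF-8 (the variable names are illustrative):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Three raw bytes: the UTF-8 encoding of U+2013 (EN DASH).
my $bytes = "\xE2\x80\x93";

# decode() interprets the bytes as UTF-8 and returns a character string.
my $chars = decode( 'UTF-8', $bytes );

printf "bytes: %d, characters: %d, code point: U+%04X\n",
    length $bytes, length $chars, ord $chars;
# prints "bytes: 3, characters: 1, code point: U+2013"
```

After decoding, write the string through an `:encoding(UTF-8)` handle so it is encoded exactly once. The core `utf8::decode($bytes)` function does a similar conversion in place. Only decode data that is still bytes, though: decoding a string that is already characters can itself fail or garble the text.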