ppremkumar has asked for the wisdom of the Perl Monks concerning the following question:
Hi, Team
I am at a loss to understand the encoding steps Perl requires when writing results to the command prompt or to a file.
In the code below, the first portion outputs just fine. In portion 2, however, when I add an em dash or the set of characters "ĀǎỠĨǒAder," the output is junk. (Yes, I want to print out ĀǎỠĨǒAder as is.)
    use warnings;
    use strict;
    use Encode qw(encode decode);

    # portion 1
    my $str = 'Çirçös';
    $str = decode('utf-8', $str);
    print "$str\n";

    # portion 2
    my $str1 = 'Çirçös—'; # HTML entity (decimal) for em dash: &#8212;
    $str1 = decode('utf-8', $str1);
    print "$str1\n";

    # output from my Eclipse editor
    # Çirçös
    # Çirçös—
    # Wide character in print at D:/EPIC_workspace/PERL/Bibliography/test.pl line 10.
Please help me understand what I am doing wrong.
What I am really trying to do is read a Microsoft Word file that has special characters and store that data into a text file.
Thanks,
Prem
UPDATE: I have found a solution to my problem: http://www.lemoda.net/perl/win32-ole-utf8/cp-utf8-ole.html
I had to import the CP_UTF8 constant from Win32::OLE and set the code page of Win32::OLE to CP_UTF8.
Now, even if my Microsoft Word files have special characters such as "Aderñŋšžľŀīửừứ," I can read each line of the Word file and save it in a text file without loss of characters. I thank each of you for your help and time. Greatly appreciated.
    # Get the constant.
    use Win32::OLE 'CP_UTF8';

    # Set the code page of Win32::OLE.
    $Win32::OLE::CP = CP_UTF8;
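For anyone hitting the same "Wide character in print" warning, here is a minimal sketch of the output side. The warning appears when a string containing characters above 255 is printed to a handle that has no encoding layer, so the sketch adds a UTF-8 layer to both the file handle and STDOUT. It assumes the text is already a decoded character string. The sample string and the file name out.txt are made up for illustration and are not taken from the Word-reading script.

```perl
use strict;
use warnings;
use utf8;    # the string literal below is UTF-8 text, not raw bytes

# Hypothetical sample data and file name, for illustration only.
my $text = "Çirçös\x{2014}";    # \x{2014} is the em dash
my $file = 'out.txt';

# Encode on the way out: the :encoding layer turns characters into UTF-8 bytes.
open my $out, '>:encoding(UTF-8)', $file or die "Cannot open $file: $!";
print {$out} "$text\n";
close $out or die "Cannot close $file: $!";

# The same layer on STDOUT avoids the warning when printing to the console.
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";

# Read the line back through the same layer to check that nothing was lost.
open my $in, '<:encoding(UTF-8)', $file or die "Cannot open $file: $!";
my $back = <$in>;
close $in;
chomp $back;

print $back eq $text ? "round trip ok\n" : "round trip FAILED\n";
```

Whether the console then shows the characters correctly still depends on the terminal's own code page and font; the file itself holds plain UTF-8 either way.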
Replies are listed 'Best First'.

Re: Perl Encoding/Decoding Doubt: From a Novice
by hippo (Archbishop) on Jul 04, 2013 at 11:13 UTC
by ppremkumar (Novice) on Jul 04, 2013 at 12:58 UTC

Re: Perl Encoding/Decoding Doubt: From a Novice
by Loops (Curate) on Jul 04, 2013 at 11:43 UTC
by ppremkumar (Novice) on Jul 05, 2013 at 05:19 UTC

Re: Perl Encoding/Decoding Doubt: From a Novice
by Khen1950fx (Canon) on Jul 04, 2013 at 14:01 UTC
by ppremkumar (Novice) on Jul 04, 2013 at 15:34 UTC
by Khen1950fx (Canon) on Jul 04, 2013 at 16:11 UTC
by Anonymous Monk on Jul 05, 2013 at 04:23 UTC