I have been using Perl's Encode module to convert text from Unicode into various legacy encodings. (I realize that might seem backward but, nonetheless...) For the most part this has worked fine (for instance, creating Arabic cp1256 documents from text composed in UTF-8). I am having major problems, however, when I try to convert into the single-byte Vietnamese encodings known as VISCII and CP1258.
The first problem is that characters which should convert smoothly do not. For instance, a message comes back that "\x{1ead}" ("ậ", that is, "LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW") does not map to cp1258. It should, however, with the bytes 0xe2 0xf2 (that is, "â" followed by the combining dot below). It seems that the Encode::Byte module, which claims to handle cp1258 conversion, can't handle these complex Vietnamese characters (which are quite common).
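To make the expected mapping concrete, here is a minimal sketch of the decomposition I have in mind. It assumes that cp1258 stores the five Vietnamese tone marks (U+0300, U+0301, U+0303, U+0309, U+0323) as separate combining characters after the base letter; the helper name to_cp1258_form is my own invention, not part of Encode. It also always splits off the tone mark, even for letters that may exist precomposed in cp1258, so it is an illustration rather than a finished converter.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFD NFC);

# The five tone marks assumed to be combining characters in cp1258.
my $TONES = qr/[\x{0300}\x{0301}\x{0303}\x{0309}\x{0323}]/;

# Hypothetical helper: rewrite each letter as "base letter with its
# vowel-quality mark (circumflex, breve, horn) precomposed" followed by
# the tone mark(s) as separate combining characters.
sub to_cp1258_form {
    my ($text) = @_;
    my $nfd = NFD($text);    # fully decompose first
    $nfd =~ s{(\P{M})(\p{M}+)}{
        my ($base, $marks) = ($1, $2);
        my $tones = join '', $marks =~ /($TONES)/g;   # tone marks, in order
        (my $rest = $marks) =~ s/$TONES//g;           # everything else
        NFC($base . $rest) . $tones;                  # e.g. a + U+0302 becomes U+00E2
    }ge;
    return $nfd;
}

my $out = to_cp1258_form("\x{1ead}");

# Show the resulting code points, then the bytes Encode produces for them.
print join(' ', map { sprintf 'U+%04X', ord } split //, $out), "\n";
print join(' ', map { sprintf '0x%02x', ord } split //, encode('cp1258', $out)), "\n";
```

If that assumption about cp1258 holds, passing text through a step like this before the encoding layer should leave only characters the table can map.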
The second problem is that, after having done a piece of the conversion, the process totally crashes with this message:
panic: sv_setpvn called with negative strlen at c:\Perl\lib\convert.pl line 52, <IN> line 838.
Line 52 is the line where I print through the filehandle OUT (through the layer appropriate to the encoding in question - cp1258 here):
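use Encode; use Encode::Byte;
open(IN, "<:encoding($enc)", $infile) or die "Cannot open $infile: $!";        # assume $enc = utf8
open(OUT, ">:encoding($dest_enc)", $target) or die "Cannot open $target: $!";  # assume $dest_enc = cp1258
while (my $conv = <IN>) {
    print OUT $conv;    # line 52 of convert.pl, where the panic is reported
}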
I really have no idea what the "panic" message means. But beyond simply not encoding the characters, the effect of the error is to stop the process of reading lines in and printing them out.
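One way I could narrow this down is to take the encoding layer out of the picture and call encode() on each line myself, with a CHECK value that marks unmappable characters instead of dying. The sketch below is only that, a diagnostic sketch: encode_line is a hypothetical helper name, and I am assuming that the output handle would then be opened without an :encoding layer (with binmode) so that the bytes are written as-is.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode one line explicitly. With FB_PERLQQ, any
# character the target encoding cannot represent is replaced by a
# \x{HHHH} escape, so the offending code points show up in the output.
sub encode_line {
    my ($dest_enc, $line) = @_;
    return encode($dest_enc, $line, Encode::FB_PERLQQ);
}

# Small demonstration on a sample string rather than on a file.
my $sample = "b\x{1ead}c\n";
print encode_line('cp1258', $sample);
```

In the real loop, this would mean replacing "print OUT $conv;" with "print OUT encode_line($dest_enc, $conv);" on a raw OUT handle, which should at least show which lines and characters are involved when the failure occurs.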
Does anyone have expertise in the Encode module who can help me here? Alternatively, does anyone know of another means of converting text into Vietnamese legacy encodings? I have already worked with (an implementation based on) iconv and found it unsatisfactory.