in reply to Reading UTF-16LE file into an array

As far as reading in utf-16le files goes, if you're using a 5.8 version of perl, use the :encoding(utf-16le) layer, with either the open pragma, or the open or binmode functions.

If you aren't using 5.8, I suggest you upgrade.
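
Roughly what that looks like with the open function, as a minimal sketch for perl 5.8 or later. The sub name read_utf16le_lines is made up for illustration, and stripping a leading byte order mark is my own assumption about what the file may contain (the UTF-16LE encoding does not remove one for you).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read a UTF-16LE
# file into an array of decoded lines.
sub read_utf16le_lines {
    my ($path) = @_;

    # The :encoding layer decodes UTF-16LE bytes into Perl characters.
    open(my $in, '<:encoding(UTF-16LE)', $path)
        or die "Cannot open $path: $!";
    my @lines = <$in>;
    close($in);

    chomp @lines;

    # Assumption: the file may start with a byte order mark (U+FEFF),
    # which UTF-16LE leaves in place, so drop it if present.
    $lines[0] =~ s/\A\x{FEFF}// if @lines;

    return @lines;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my @lines = read_utf16le_lines($ARGV[0]);
    print scalar(@lines), " lines read\n";
}
```

The same layer string can be handed to binmode on a handle that is already open, or set as a default with the open pragma.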


Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

Re: Re: Reading UTF-16LE file into an array
by Anonymous Monk on Jun 11, 2003 at 18:56 UTC
    Thanks, yeah, I was messing with :encoding(utf-16le) like so:
    open(FH, '<:encoding(UTF-16LE)', $file) or die "Cannot open $file: $!";
    while ( <FH> )
    { }

    after running the program I am getting the following:
    Wide character in print at my_program.pl line 326.

    Line 326 of my code is basically printing the output to an HTML type file:
    print FILE "some html table code and info from $file";
    It appears to be producing the right output. Do I need to worry about the "Wide character" issue?

      It means that you're outputting a character with a code above 255 to a filehandle that hasn't been marked as utf8. Perl will assume that you meant for it to be utf8 and output it that way, but warn you. This is dangerous, because if you're also outputting strings that don't have the utf8 bit set, and they contain chars 128-255, you'll get something that is invalid utf8, and just plain wrong in whatever legacy encoding you were using.

      (In other words, mark the output handle as utf8 when you open it: open FILE, '>:utf8', $whatever. Also, consider that naming your handles FILE and FH gives little information at best, and is confusing at worst.)
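
A minimal sketch of the output-side fix described above, assuming you want the HTML file written as UTF-8. The sub name write_utf8 and its arguments are hypothetical, not from the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: write strings to a file through a :utf8 layer,
# so characters above 255 no longer trigger "Wide character in print".
sub write_utf8 {
    my ($path, @strings) = @_;

    # Marking the handle as utf8 tells Perl how to encode every string
    # printed to it, including those holding only chars 128-255.
    open(my $out, '>:utf8', $path)
        or die "Cannot open $path for writing: $!";
    print {$out} @strings;
    close($out)
        or die "Cannot close $path: $!";
    return;
}

# Only runs when an output file name is given on the command line.
if (@ARGV) {
    write_utf8($ARGV[0], "<td>caf\x{E9} \x{263A}</td>\n");
}
```

Because the layer encodes everything that goes through the handle, a string with chars 128-255 and a string with chars above 255 both come out as valid utf8, which avoids the mixed output the warning is hinting at.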



        Thanks for the input. My apologies for the confusing explanations of my work. Basically the utf16 file (FH) is being read in and (FILE) was a separate file I was outputting. I added ">:encoding(UTF-16)" to the open statement on the output file and I am not getting the warning anymore. I will be building a bigger test set to test it, though. Now I need to figure out how to make this code work in a W2K/XP environment. I keep getting an Unknown open() mode ":encoding(UTF-16)" error when running the code in a Windows environment.