kangaroobin has asked for the wisdom of the Perl Monks concerning the following question:

Yes, I've searched for that error message, but the solutions online haven't worked for me.

I have a text file that is a screen scrape by Automator on OS X. When I run file data.txt in Terminal, I get this: Big-endian UTF-16 Unicode English character data, with CR line terminators.

So I open it as UTF-16, but whenever I try to print something, I get the error in the title. I've boiled down my program to something so simple, it's trivial:

open (DATA, "<:encoding(utf16be)", "$filename") or die "Couldn't open DATA: $!";
while (<DATA>) {
    print;
}
close DATA;

Why am I getting this error? The file doesn't appear to have any weird characters when I open it with Text Editor. Could it be the carriage returns or something?

Update: I tried binmode STDOUT, ':utf8', and all I get is weird-looking output. For example, here's my program and the data:

#!/usr/local/bin/perl
use strict;
use warnings;
use diagnostics;

open (DATA, "<:encoding(utf16be)", "data.txt") or die "Couldn't open DATA: $!";
while (<DATA>) {
    binmode STDOUT, ':utf8';
    print;
}
close DATA;

__DATA__
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Phasellus eu est quis nisi congue ornare. Suspendisse quis felis. Suspendisse placerat orci ut turpis. Suspendisse metus dolor, aliquet et, lobortis facilisis, fermentum ac, massa. Cras nonummy lobortis urna. Aenean facilisis vulputate felis. Mauris pharetra malesuada quam.

The data is in a UTF-16 big-endian file with CR line terminators. Here's what I get when I run it:

Mauris pharetra malesuada quam.usernames-ibook-g4:~/desktop username$ ssa.

What the heck is going on? The text prints over itself; it's weird.

Replies are listed 'Best First'.
Re: Wide character in print
by Anonymous Monk on Mar 07, 2008 at 04:50 UTC
    use diagnostics;
    Wide character in %s
    (W utf8) Perl met a wide character (>255) when it wasn't expecting one. This warning is by default on for I/O (like print). The easiest way to quiet this warning is simply to add the :utf8 layer to the output, e.g. binmode STDOUT, ':utf8'. Another way to turn off the warning is to add no warnings 'utf8'; but that is often closer to cheating. In general, you are supposed to explicitly mark the filehandle with an encoding, see the open manpage and binmode in the perlfunc manpage.
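A minimal sketch of what the diagnostic recommends: decode on the way in, and declare an encoding on the output handle too. The sample string and the in-memory output handle are assumptions made for illustration only; in the original program the output handle would be STDOUT.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample standing in for the contents of data.txt:
# the text "cafe" with an accented e, a space, and U+263A, as UTF-16BE bytes.
my $bytes = encode('UTF-16BE', "caf\x{e9} \x{263A}\n");

# Decoding turns the bytes into Perl characters. U+263A is above 255, so
# printing it to a handle with no encoding layer is what triggers
# "Wide character in print".
my $text = decode('UTF-16BE', $bytes);

# Declare the output encoding explicitly. An in-memory handle is used here so
# the result can be inspected; for the terminal the equivalent would be
#   binmode STDOUT, ':encoding(UTF-8)';
my $out = '';
open(my $fh, '>:encoding(UTF-8)', \$out)
    or die "Couldn't open in-memory handle: $!";
print {$fh} $text;
close $fh;

# Show the encoded bytes in hex.
print join(' ', map { sprintf '%02x', ord } split //, $out), "\n";
# prints: 63 61 66 c3 a9 20 e2 98 ba 0a
```

With the layer in place the warning goes away, because Perl knows how to turn each character into bytes for that handle.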
Re: Wide character in print
by Anonymous Monk on Mar 07, 2008 at 07:38 UTC
    Try :encoding(utf16be)
Re: Wide character in print
by roboticus (Chancellor) on Mar 07, 2008 at 11:06 UTC
    kangaroobin:

    If you're setting STDOUT to binmode, and your input file uses CR line terminators, then the output text is supposed to print over itself. A CR doesn't give you a line feed, it just returns the carriage to the left margin. You'll have to add a LF if you want to advance the carriage.

    Of course, if your output is consumed by a program expecting CR delimited lines, then just ignore the fact that your lines overprint....

    ...roboticus
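One way to act on the CR point above, as a sketch rather than a definitive fix: read with CR as the input record separator, strip it, and write each line with an LF. The three sample lines and the in-memory input handle are assumptions standing in for data.txt.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for data.txt: three CR-terminated lines in UTF-16BE.
my $bytes = encode('UTF-16BE', "first line\rsecond line\rthird line\r");

open(my $in, '<:encoding(UTF-16BE)', \$bytes)
    or die "Couldn't open input: $!";

my @lines;
{
    local $/ = "\r";              # treat CR as the input record separator
    while (my $line = <$in>) {
        chomp $line;              # removes the trailing CR, since $/ is "\r"
        push @lines, $line;
    }
}
close $in;

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @lines;          # write each line with an LF terminator
```

Without the change to $/, the whole file arrives as a single record, and every embedded CR sends the cursor back to the left margin when it is printed.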