Hank has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that uses Encode's from_to to convert a legacy encoding to utf-8, followed by some regexp (m// and s///) and substr operations. It works just fine in 5.8.0 and 5.8.6 but, strangely, not in 5.8.1, which fails with "Wide character in subroutine entry at /path/5.8.1/i686-linux/Encode.pm line 184." Since I don't maintain the perl install, upgrading isn't an option. Besides, I hate not knowing why. Thanks. (I'm posting this as a last resort....)

Re: Encode module, wide character
by PodMaster (Abbot) on Feb 06, 2005 at 14:27 UTC
    `perldoc perldiag'
    Wide character in %s

    (W utf8) Perl met a wide character (>255) when it wasn't expecting one. This warning is by default on for I/O (like print). The easiest way to quiet this warning is simply to add the :utf8 layer to the output, e.g. binmode STDOUT, ':utf8'. Another way to turn off the warning is to add no warnings 'utf8'; but that is often closer to cheating. In general, you are supposed to explicitly mark the filehandle with an encoding, see the open manpage and binmode in the perlfunc manpage.
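The layer advice quoted above can be tried without touching STDOUT. The following is a minimal sketch, assuming an in-memory filehandle as a stand-in for a real output handle; the variable names are illustrative only and do not come from the original script.

```perl
use strict;
use warnings;

# Two characters above 255, i.e. "wide" characters.
my $text = "\x{4e2d}\x{6587}";

# Open an in-memory handle with the :utf8 layer already applied,
# so print has an explicit encoding and does not warn.
my $buffer = '';
open(my $out, '>:utf8', \$buffer) or die "Cannot open in-memory handle: $!";
print {$out} $text;
close($out);

# For a real handle the equivalent call is: binmode(STDOUT, ':utf8');
printf "%d characters written as %d bytes\n", length($text), length($buffer);
```

This only addresses the I/O form of the warning. It does not by itself explain a "Wide character in subroutine entry" raised inside Encode.pm.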


Re: Encode module, wide character
by graff (Chancellor) on Feb 06, 2005 at 22:07 UTC
    I'd be grateful if you could include a minimal example of code that shows this behavior. I'm in an environment where different OS's on the local network are using 5.8.0, 5.8.1 and 5.8.5; if we haven't seen this "wide character in subroutine entry" warning yet, we probably will soon... There's a chance that it might not have anything to do with enabling wide characters on file handles (since the message doesn't mention file handles or i/o).

    But after all, it is just a warning, and Perl is most likely doing "The Right Thing", despite complaining about it. This sort of warning about wide characters is Perl's way of saying "if you weren't already aware of this condition, you probably should be, just in case it might alter some assumptions you may have been making about your data..."

      Interestingly, the script works fine with ActiveState's 5.8.1 build (no warnings whatsoever), so AFAIK the problem is limited to the Linux build on the production server. Note that the latter does not merely throw the typical "wide character" warnings -- the script terminates with an error (see below) even if the "no warnings" pragma is in effect.

      Since I am hard-pressed to come up with a working minimal example, below is some pseudocode.

      use LWP::UserAgent;
      use HTML::Parser;
      use Encode;

      # Grab a HTML document from the Net.
      # Extract links from HTML document (legacy encoding, in this case Big5).

      # Decode linked strings
      # In my original post I used from_to($linktext, big5, utf-8)'s
      # in-place conversion. Same error, though.
      $linktext = decode( 'big5-eten', $linktext, Encode::FB_HTMLCREF );

      # Test linked texts against a list of strings
      # Loop of linked texts
      foreach ( @keywords ) {
          local $_ = decode('utf-8', $_);   # Force utf-8 flag
          if ( $linktext =~ /$_/ ) {
              # Do something
          }
      }
      # Typically the initial iterations appear OK but the loop never completes
      # before the Wide Character error shows up.
      # End of loop

      I'm at a loss as to why the two 5.8.1 builds would behave differently even as 5.8.0 and 5.8.6 work. I also have no idea whether a workaround is possible.
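One thing worth ruling out, offered as a guess rather than a diagnosis of the 5.8.1 build: Encode's decode() expects bytes, and handing it a string that already holds wide characters is a fatal error, not a warning. If a keyword ends up decoded twice across iterations, that would fit the pattern of early iterations passing and a later one dying. The sketch below decodes the keyword list once, outside the loop; decode_once, @decoded and the sample data are hypothetical names and values, not part of the original script.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode only when the string still looks like raw bytes.
# utf8::is_utf8 inspects Perl's internal flag, so this is a heuristic guard.
sub decode_once {
    my ($encoding, $string) = @_;
    return $string if utf8::is_utf8($string);
    return decode($encoding, $string);
}

# Stand-in data: UTF-8 encoded keywords and an already decoded link text.
my @keywords = map { encode('UTF-8', $_) } ("\x{4e2d}\x{6587}", "\x{53f0}\x{5317}");
my $linktext = "\x{53f0}\x{5317}\x{5e02}";

# Decode the keywords once, into a separate array, instead of inside the loop.
my @decoded = map { decode_once('UTF-8', $_) } @keywords;

# Feeding an already decoded string to plain decode() dies; eval captures that.
my $copy = $decoded[0];
my $second_pass_ok = eval { decode('UTF-8', $copy); 1 };

# \Q...\E stops regex metacharacters in a keyword from being interpreted.
my @hits = grep { $linktext =~ /\Q$_\E/ } @decoded;

print "second decode ", ($second_pass_ok ? "succeeded" : "died"), "\n";
print scalar(@hits), " keyword(s) matched\n";
```

Keeping the decoded keywords in their own array also avoids assigning to a localized $_ inside a foreach that aliases @keywords, which removes one more variable when comparing builds.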