Wide character problem using Socket.pm

chacon has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am having a wide character problem when I send a packet using Socket.pm.
When I send a short packet there is no problem, but when I try to send a larger packet the program reports a wide character error.

```perl
# Excerpt from IO::Socket's send method (IO/Socket.pm); croak comes from Carp.
sub send
{
    @_ >= 2 && @_ <= 4 or croak 'usage: $sock->send(BUF,[FLAGS, [TO]])';

    my $sock  = $_[0];
    my $flags = $_[2] || 0;
    my $peer  = $_[3] || $sock->peername;

    croak 'send: Cannot determine peer address' unless($peer);

    # THE ERROR IS GENERATED WHEN THE NEXT LINE EXECUTES:
    # the built-in send() is handed $_[1], the packet string.
    my $r = defined(getpeername($sock))
          ? send($sock, $_[1], $flags)
          : send($sock, $_[1], $flags, $peer);

    # remember who we send to, if it was successful
    ${*$sock}{'io_socket_peername'} = $peer    if(@_ == 4 && defined $r);

    $r;
}
```

Edited by Chady -- Added code tags.


Re: Wide character problem using Socket.pm by ni-s (Initiate) on Sep 09, 2004 at 19:25 UTC
Well, the problem isn't in Socket.pm either. The problem is the string that becomes send's `$_[1]`. You created that string in a way that put Unicode characters larger than chr(0xff) into it, but your socket is not marked as expecting UTF-8. So show us how you build the "packet".
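To check for the condition described above, a minimal sketch follows. The helper `has_wide_chars` and the sample strings are hypothetical, made up for illustration. It looks for any character above chr(0xff), which is one that cannot be written as a single byte. The built-in `utf8::downgrade` with its FAIL_OK argument set reports the same kind of failure by returning false instead of dying. Running the real packet through a check like this before calling `send` would show whether it contains such characters.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string holds any character above
# chr(0xff), i.e. one that cannot be represented as a single byte.
sub has_wide_chars {
    my ($str) = @_;
    return $str =~ /[^\x00-\xff]/ ? 1 : 0;
}

my $latin1 = "caf\x{e9}";      # every character fits in one byte
my $wide   = "smile \x{263A}";  # U+263A is above 0xff

printf "%s: wide=%d\n", 'latin1', has_wide_chars($latin1);
printf "%s: wide=%d\n", 'wide',   has_wide_chars($wide);

# utf8::downgrade($str, 1) returns false (instead of dying) when the
# string cannot be turned into plain bytes.
my $copy = $wide;
print utf8::downgrade($copy, 1) ? "downgrade ok\n" : "downgrade failed\n";
```

The first string reports `wide=0` and the second `wide=1`, and the downgrade of the second one fails.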
Re: Wide character problem using Socket.pm by bart (Canon) on Sep 09, 2004 at 19:43 UTC
That sounds like a problem where your string is implicitly converted from UTF-8 to Latin-1 when it is sent over the socket (that can only happen in perl 5.8.x and later), and the conversion fails because some character's code is 256 or above. That is when you get that message.

On standard filehandles, the solution is typically to use binmode with `":utf8"` as the second parameter. That makes perl skip the conversion away from UTF-8 and send the UTF-8 encoded characters, sometimes two or three bytes per character. Perhaps that works on sockets too; it's worth a try.

The other option is to make the string not UTF-8 yourself. You can use the Encode::Encoder module to convert into any single-byte character set you like. There are also some tricks one can pull using pack, so you keep the UTF-8 bytes but make perl think they are raw bytes:

`$raw = pack "C0a*", $utf8;`

You can achieve the same effect using the `_utf8_off()` function from Encode, telling perl that "this is not a UTF-8 string".
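A minimal sketch of the "convert it yourself" route follows. It assumes a Unix-like system, and both the `socketpair` stand-in for a real connection and the sample packet are illustrative assumptions. The sketch uses `Encode::encode('UTF-8', ...)` to turn characters into octets before the built-in `send`, and `Encode::decode` on the receiving side, in place of the `pack`/`_utf8_off()` tricks. Those tricks lean on perl 5.8 internals, and perl 5.10 changed how `pack` treats UTF-8 strings. According to the perldelta notes, `send` on `:utf8` handles was deprecated in perl 5.24 and made fatal in perl 5.30, so the binmode approach only applies to older perls.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use Encode qw(encode decode);

# Sample "packet" containing U+20AC (EURO SIGN), which is above chr(0xff).
my $packet = "total: 10\x{20AC}";

# A connected pair of Unix-domain sockets stands in for the real peer.
socketpair(my $tx, my $rx, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# Encode characters to octets first; send() then only ever sees bytes,
# so it has no wide characters to complain about.
my $bytes = encode('UTF-8', $packet);
defined(send($tx, $bytes, 0)) or die "send: $!";
close $tx;    # signals EOF to the reading end

# Read the raw octets back, then decode them into characters again.
my $received = '';
my $chunk;
while (sysread($rx, $chunk, 4096)) {
    $received .= $chunk;
}
close $rx;
my $text = decode('UTF-8', $received);

printf "sent %d characters as %d bytes\n", length($packet), length($bytes);
print $text eq $packet ? "round trip ok\n" : "mismatch\n";
```

Both ends have to agree on the encoding. If the peer expects a single-byte character set instead, pass that charset's name to `encode`. Characters it cannot represent are then replaced with a substitution character rather than sent as UTF-8.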