deadpickle has asked for the wisdom of the Perl Monks concerning the following question:

Been a long time since I have been on here. I have to revisit a program for some research. I am trying to send a waypoint to a server, where the point is written to a file. When I try to do this by calling $sftp->write( $waytemp, $coords); I get the error:
Thread 2 terminated abnormally: write method can not handle UTf8 data at GRRUVI-v1.43.pl line 1422 thread 2
I don't know what it means and could use some help.

Re: UTF8 error when using Net::SFTP::Foreign
by almut (Canon) on Feb 18, 2009 at 01:02 UTC
    ... write method can not handle UTf8 data ...

    Net::SFTP::Foreign's write method begins like this:

    sub write {
        @_ == 3 or croak 'Usage: $sftp->write($fh, $data)';
        my ($sftp, $rfh) = @_;
        $sftp->flush($rfh, 'in') or return undef;
        utf8::is_utf8($_[2]) and croak "write method can not handle UTf8 data";
        ...

    so I'd say your $coords is flagged as UTF-8, which the module simply doesn't seem to support. More precisely, the code croaks if the utf8 flag of $_[2] (i.e. $data, or your $coords) is on.

    Try finding out whether the flag is on for a good reason (because the data actually is, or needs to be, UTF-8), or whether it just got set by accident, for example when merging an ISO-Latin-1 string with a UTF-8 one, which causes the ISO-Latin-1 part to be "upgraded" (see perluniintro).
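    A minimal sketch of such a check, assuming nothing about the OP's real data: describe_flag is a hypothetical helper (not part of Net::SFTP::Foreign) that reports the utf8 flag and whether the string holds any character above 0xFF, i.e. one that genuinely cannot be stored as a single byte. The sample values are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: report the utf8 flag and whether the string
# contains any character above 0xFF (which no single byte can hold).
sub describe_flag {
    my ($str) = @_;
    my $flag     = utf8::is_utf8($str) ? 'on' : 'off';
    my $has_wide = ($str =~ /[^\x00-\xFF]/) ? 'yes' : 'no';
    return "utf8 flag $flag, wide characters: $has_wide";
}

my $plain    = "12,23";           # made-up coordinate string
my $upgraded = $plain;
utf8::upgrade($upgraded);         # same characters, flag switched on
my $wide     = "12,23\x{263A}";   # really needs more than bytes

print describe_flag($_), "\n" for $plain, $upgraded, $wide;
```

    A flag that is on while the wide-character test says "no" is the accidental-upgrade case.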

      Since the flag may well not matter, the snippet you posted should probably be changed to the following, which only dies when the data really holds wide characters: the fatal equivalent of the "Wide character in print" warning.

      sub write {
          @_ == 3 or croak 'Usage: $sftp->write($fh, $data)';
          my ($sftp, $rfh) = @_;
          $sftp->flush($rfh, 'in') or return undef;
          if (utf8::is_utf8($_[2])) {
              splice(@_, 2, 1, "$_[2]");    # Don't affect caller.
              utf8::downgrade($_[2], 1)     # Change format to bytes.
                  or croak("Can only write bytes to a socket");
          }
          ...
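      As a hedged illustration of the same idea applied on the caller's side (for anyone who can't patch the module), the following sketch copies the data and downgrades the copy before it would be passed to $sftp->write. bytes_for_sftp is a hypothetical name; the actual write call is commented out because it needs a live SFTP connection.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: return a byte-string copy of $data that a
# byte-only method like $sftp->write() would accept.
sub bytes_for_sftp {
    my ($data) = @_;              # copy, so the caller's value is untouched
    utf8::downgrade($data, 1)     # second arg true: return false instead of dying
        or croak "Can only write bytes to a socket";
    return $data;
}

my $coords = "12,23";             # made-up value
utf8::upgrade($coords);           # simulate an accidentally set flag
my $bytes = bytes_for_sftp($coords);
printf "flag on copy: %s\n", utf8::is_utf8($bytes) ? 'on' : 'off';
# $sftp->write($waytemp, $bytes); # needs a live connection; not run here
```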

      Try finding out if the flag is on for a good reason

      I don't see how that's relevant.

      If the data is text, it doesn't matter whether the flag is on or not, as long as you encode the data as needed.

      If the data isn't text, it doesn't matter whether the flag is on or not either. Just use utf8::downgrade.

      use strict;
      use warnings;
      use Encode qw( encode );

      my $enc  = 'iso-latin-1';  # Desired encoding
      my $text = chr(130);       # Any char supported by iso-latin-1 and $enc

      utf8::downgrade(my $text_off = $text);
      utf8::upgrade  (my $text_on  = $text);

      # Encode text on output
      my $from_off = encode($enc, $text_off);
      my $from_on  = encode($enc, $text_on);

      print("bytes are ", ($from_off eq $from_on ? 'same' : 'diff'), "\n");
      bytes are same
      use strict;
      use warnings;

      utf8::downgrade( my $bytes_off = '' );
      $bytes_off .= chr($_) for 0..255;

      utf8::upgrade( my $bytes_on = '' );
      $bytes_on .= chr($_) for 0..255;

      # Downgrade variable on output to avoid false positive.
      utf8::downgrade( my $from_off = $bytes_off );
      utf8::downgrade( my $from_on  = $bytes_on );

      print("bytes are ", ($from_off eq $from_on ? 'same' : 'diff'), "\n");
      bytes are same

      Update: Fleshed out some details. Changed to use $_[2] as in original snippet.
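      To make the "text" case concrete, here is a small sketch that assumes the remote side expects UTF-8 (the thread doesn't say which encoding the waypoint file should use). Encoding with Encode::encode yields a byte string whose flag is off, even when the text contains characters above 0xFF, so it is acceptable to a byte-only method like $sftp->write.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Made-up text containing a character no single byte can hold.
my $text = "12,23 \x{263A}";

# Encoding turns characters into bytes; the result has the flag off.
my $bytes = encode('UTF-8', $text);
printf "chars: %d, bytes: %d, flag: %s\n",
    length($text), length($bytes), utf8::is_utf8($bytes) ? 'on' : 'off';
```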

        I have just uploaded to CPAN a new version of Net::SFTP::Foreign that uses utf8::downgrade. Thanks for the solution!
        Try finding out if the flag is on for a good reason
        I don't see how that's relevant.

        Well, it's relevant insofar as the module, as it stands, would abort if the flag is on, so finding out why it is on might be a first step toward better understanding one's own code and taking appropriate measures.

        For example, in the following (contrived) situation

        my $s = "hello";
        my $u = "\x{7777}";
        print "utf8 flag ", (utf8::is_utf8($s) ? "on":"off"), "\n";  # off
        $s .= $u;                 # do something which upgrades $s
        $s = substr($s, 0, 5);    # get back the orig. "hello"
        print "utf8 flag ", (utf8::is_utf8($s) ? "on":"off"), "\n";  # on - $sftp->write(...) would abort

        I'd say the flag is on "for no good reason", because the content is exactly the same as before manipulating $s (i.e. "hello"), and all characters occurring can be represented in plain ASCII.

        OTOH, if the data actually contained Unicode characters that cannot be represented in ASCII (or, for that matter, in some legacy encoding like Latin-1), the flag would be on "for a good reason", provided the data needs to be treated as characters.

        Whether the latter is the case with Net::SFTP::Foreign::write(), I simply don't know. I didn't check what the author's specific reasons for not allowing UTF-8 might have been; as a first approximation, I tend to assume that module authors know what they're doing.
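        The "no good reason" and "good reason" situations can be told apart with the non-fatal form of utf8::downgrade (second argument true). This sketch reuses the "hello" example, where downgrading succeeds, and a lone U+7777, where it fails and leaves the string untouched; it is an illustration, not code from the module.

```perl
use strict;
use warnings;

# Flag on "for no good reason": the "hello" example.
my $s = "hello";
$s .= "\x{7777}";                  # upgrades $s
$s = substr($s, 0, 5);             # back to "hello", flag still on
my $ok1 = utf8::downgrade($s, 1);  # returns false instead of dying

# Flag on "for a good reason": a character above 0xFF.
my $u = "\x{7777}";
my $ok2 = utf8::downgrade($u, 1);

printf "hello: downgraded=%s flag=%s\n",
    $ok1 ? 'yes' : 'no', utf8::is_utf8($s) ? 'on' : 'off';
printf "wide:  downgraded=%s flag=%s\n",
    $ok2 ? 'yes' : 'no', utf8::is_utf8($u) ? 'on' : 'off';
```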

        I'm trying to get this working. I'm not sure how to implement this, but I added the following just before the write statement:
        utf8::downgrade( my $bytes_off = '' );
        $bytes_off .= chr($coords) for 0..255;
        utf8::upgrade( my $bytes_on = '' );
        $bytes_on .= chr($coords) for 0..255;
        # Downgrade variable on output to avoid false positive.
        utf8::downgrade( my $from_off = $bytes_off );
        utf8::downgrade( my $from_on = $bytes_on );
        print("bytes are ", ($from_off eq $from_on ? 'same' : 'diff'), "\n");
        $sftp->write( $waytemp, $coords);
        where $coords = 12,23. Now I get the error
        Thread 2 terminated abnormally: Wide character in subroutine entry at GRRUVI-v1.43.pl line 1428.
        that I have been hearing about.
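        One plausible reading of that error, offered as a guess rather than a diagnosis: utf8::downgrade called without its second argument dies with "Wide character in subroutine entry" when the string contains a character above 0xFF. The 0..255 loops in the earlier reply appear to be demonstrations that downgrading doesn't change the bytes, not code to paste in. A minimal sketch of the intended use, with a made-up $coords that contains a wide character and UTF-8 assumed as the target encoding:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Made-up $coords holding a character above 0xFF (U+263A).
my $coords = "12,23\x{263A}";

# Plain utf8::downgrade dies on such data; trap the error to show it.
my $copy = $coords;
my $downgraded = eval { utf8::downgrade($copy); 1 };
my $err = $downgraded ? '' : $@;
print "downgrade: ", ($downgraded ? "ok\n" : "died: $err");

# Encoding (UTF-8 assumed here) always produces a byte string,
# which is what $sftp->write() accepts.
my $payload = encode('UTF-8', $coords);
printf "payload: %d bytes, utf8 flag %s\n",
    length($payload), utf8::is_utf8($payload) ? 'on' : 'off';
# $sftp->write($waytemp, $payload);  # needs a live connection; not run here
```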
      I think ikegami's point is that the byte sequence won't change if all you do is turn off the utf8 flag, and that seems to be the only problem Net::SFTP::Foreign has with the OP's data. Consider:
      use strict;
      use warnings;

      main();

      sub main {
          my $test = "\x{0414}";     # unicode cyrillic "capital letter de"
          printf( "character length: %d\n", length( $test ));
          check_string( $test, 1 );  # this call causes "Wide character in print" warning, but output is ok
          utf8::encode( $test );
          printf( "byte length: %d\n", length( $test ));
          check_string( $test, 2 );  # no warning from this call
      }

      sub check_string {
          my ( $str, $num ) = @_;
          my $status = ( utf8::is_utf8( $str )) ? 'utf8' : 'not utf8';
          printf( " %d -- check_string: input %s is %s\n", $num, $str, $status );
      }
      When I have that stored as "test.pl" and do perl test.pl, the output I get is:
      character length: 1
      Wide character in print at /tmp/test-bytes.pl line 21.
       1 -- check_string: input Д is utf8
      byte length: 2
       2 -- check_string: input Д is not utf8
      
      Of course, if I run that with perl -CS test.pl (which does the same thing as binmode STDOUT, ":utf8";), the "Wide character in print" warning goes away, but then when check_string() gets called the second time, perl "upgrades" the two bytes that make up the "unflagged" Cyrillic character, producing faulty output (four non-ASCII bytes instead of two). That's a separate issue, though.
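      The "four bytes instead of two" effect can be reproduced without a terminal. This sketch, an illustration rather than part of the original post, prints the same Cyrillic character once as text and once as its already-encoded bytes through a :utf8 layer on in-memory filehandles, then counts the bytes that come out.

```perl
use strict;
use warnings;

# Cyrillic capital DE, and a copy encoded to its 2 UTF-8 bytes.
my $text  = "\x{0414}";
my $bytes = $text;
utf8::encode($bytes);              # now "\xD0\x94", flag off

# Same :utf8 layer as binmode STDOUT, ":utf8", but on in-memory handles.
open my $ok_fh,  '>:utf8', \my $ok_buf  or die $!;
open my $bad_fh, '>:utf8', \my $bad_buf or die $!;
print {$ok_fh}  $text;             # 1 character: encoded once
print {$bad_fh} $bytes;            # 2 "characters": encoded a second time
close $ok_fh;
close $bad_fh;

printf "from text: %d bytes, from bytes: %d bytes\n",
    length($ok_buf), length($bad_buf);
```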

        I think ikegami's point is that the byte sequence won't change if all you do is turn off the utf8 flag,

        My point was simply that the solution doesn't depend on knowing whether the flag was on for a good reason or not.

        I have now elaborated on the solution.