UTF8 error when using Net::SFTP::Foreign

deadpickle has asked for the wisdom of the Perl Monks concerning the following question:

Re: UTF8 error when using Net::SFTP::Foreign by almut (Canon) on Feb 18, 2009 at 01:02 UTC
`... write method can not handle UTf8 data ...`

Net::SFTP::Foreign's `write` method begins like this:

    sub write {
        @_ == 3 or croak 'Usage: $sftp->write($fh, $data)';
        my ($sftp, $rfh) = @_;
        $sftp->flush($rfh, 'in') or return undef;
        utf8::is_utf8($_[2]) and croak "write method can not handle UTf8 data";
        ...

So I'd say your `$coords` is in UTF-8, which the module simply doesn't seem to support. More precisely, the code croaks if the utf8 flag of `$_[2]` (i.e. `$data`, or `$coords`) is on.

Try finding out if the flag is on for a good reason (because the data actually is, or needs to be, UTF-8), or if it just got set by accident, like when merging an ISO-Latin1 string with a UTF-8 one, which would cause the ISO-Latin1 part to be "upgraded" (see perluniintro).
Re^2: UTF8 error when using Net::SFTP::Foreign by ikegami (Patriarch) on Feb 18, 2009 at 01:42 UTC
Since it's possible the flag doesn't matter, the snippet you posted should probably be changed to the following, the fatal equivalent of "`Wide character in print`".

    sub write {
        @_ == 3 or croak 'Usage: $sftp->write($fh, $data)';
        my ($sftp, $rfh) = @_;
        $sftp->flush($rfh, 'in') or return undef;
        if (utf8::is_utf8($_[2])) {
            splice(@_, 2, 1, "$_[2]");   # Don't affect caller.
            utf8::downgrade($_[2], 1)    # Change format to bytes.
                or croak("Can only write bytes to a socket");
        }
        ...

"Try finding out if the flag is on for a good reason"

I don't see how that's relevant. If the data is text, it doesn't matter whether the flag is on or not, as long as you encode the data as needed. If the data isn't text, it doesn't matter whether the flag is on or not either. Just use utf8::downgrade.

For text, encoding gives the same bytes regardless of the flag:

    use strict;
    use warnings;
    use Encode qw( encode );

    my $enc  = 'iso-latin-1';   # Desired encoding
    my $text = chr(130);        # Any char supported by iso-latin-1 and $enc

    utf8::downgrade(my $text_off = $text);
    utf8::upgrade  (my $text_on  = $text);

    # Encode text on output
    my $from_off = encode($enc, $text_off);
    my $from_on  = encode($enc, $text_on);

    print("bytes are ", ($from_off eq $from_on ? 'same' : 'diff'), "\n");

Output:

    bytes are same

For binary data, downgrading gives the same bytes regardless of the flag:

    use strict;
    use warnings;

    utf8::downgrade( my $bytes_off = '' );
    $bytes_off .= chr($_) for 0..255;

    utf8::upgrade( my $bytes_on = '' );
    $bytes_on .= chr($_) for 0..255;

    # Downgrade variable on output to avoid false positive.
    utf8::downgrade( my $from_off = $bytes_off );
    utf8::downgrade( my $from_on  = $bytes_on  );

    print("bytes are ", ($from_off eq $from_on ? 'same' : 'diff'), "\n");

Output:

    bytes are same

Update: Fleshed out some details. Changed to use `$_[2]` as in original snippet.
Re^3: UTF8 error when using Net::SFTP::Foreign by salva (Canon) on Feb 18, 2009 at 13:37 UTC
I have just uploaded to CPAN a new version of Net::SFTP::Foreign that uses `utf8::downgrade`. Thanks for the solution!
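For anyone still on a release that croaks when the flag is on, a caller-side workaround along the same lines might look like the sketch below. `bytes_for_write` is a hypothetical helper name, not part of Net::SFTP::Foreign, and the `utf8::upgrade` call only simulates a flag that was switched on somewhere upstream.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Net::SFTP::Foreign): return a copy of
# $data stored in Perl's byte format, or die if that is not possible.
sub bytes_for_write {
    my ($data) = @_;              # copy, so the caller's scalar is untouched
    utf8::downgrade($data, 1)     # true second arg: return false instead of dying
        or die "Data contains characters above 0xFF; encode it first\n";
    return $data;
}

my $coords = "12,23";
utf8::upgrade($coords);           # simulate a flag that got switched on upstream

my $bytes = bytes_for_write($coords);
print "flag is ", (utf8::is_utf8($bytes) ? "on" : "off"), "\n";
print "content is ", ($bytes eq $coords ? "same" : "diff"), "\n";
```

The result would then be passed on as `$sftp->write($fh, bytes_for_write($coords))`. Dying on characters above 0xFF is a deliberate choice in this sketch: such data has no byte form until the caller picks an encoding.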
Re^4: UTF8 error when using Net::SFTP::Foreign by ikegami (Patriarch) on Feb 18, 2009 at 14:21 UTC
Re^3: UTF8 error when using Net::SFTP::Foreign by almut (Canon) on Feb 18, 2009 at 03:28 UTC
"Try finding out if the flag is on for a good reason"

"I don't see how that's relevant."

Well, it's relevant insofar as the module, as it is, would abort if the flag is on, so finding out why it is on might be a first step towards better understanding one's own code and taking appropriate measures. For example, in the following (contrived) situation

    my $s = "hello";
    my $u = "\x{7777}";

    print "utf8 flag ", (utf8::is_utf8($s) ? "on":"off"), "\n";  # off

    $s .= $u;                # do something which upgrades $s
    $s = substr($s, 0, 5);   # get back the orig. "hello"

    print "utf8 flag ", (utf8::is_utf8($s) ? "on":"off"), "\n";  # on - $sftp->write(...) would abort

I'd say the flag is on "for no good reason", because the content is exactly the same as before manipulating `$s` (i.e. "hello"), and all characters occurring in it can be represented in plain ASCII.

OTOH, if the data actually contained Unicode characters that cannot be represented in ASCII (or in some legacy encoding like Latin-1, for that matter), the flag would be on "for a good reason", in case the data needs to be treated in a character-based fashion. Whether the latter is the case with `Net::SFTP::Foreign::write()`, I simply don't know. I didn't check what the author's specific reasons for not allowing UTF-8 might have been; as a first approximation, I tend to assume that module authors know what they're doing.
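The two cases can also be told apart in code. The following is only a sketch: `flag_status` is a hypothetical helper name, and UTF-8 is just one assumed choice of target encoding for the data that really does contain wide characters.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper for illustration: report how a string is stored
# and whether it could be handed over as plain bytes.
sub flag_status {
    my ($s) = @_;                          # work on a copy
    return 'flag off' unless utf8::is_utf8($s);
    return utf8::downgrade($s, 1)          # true second arg: return false instead of dying
        ? 'flag on, but fits in bytes'
        : 'flag on, has characters above 0xFF';
}

my $s = "hello";
$s .= "\x{7777}";
$s = substr($s, 0, 5);                     # "hello" again, flag left on

my $wide    = "caf\x{7777}";               # really needs more than one byte per character
my $encoded = encode('UTF-8', $wide);      # explicit encoding yields a byte string

print flag_status($s),       "\n";
print flag_status($wide),    "\n";
print flag_status($encoded), "\n";
```

Only the middle case needs a decision from the programmer, namely which encoding to use; the first can simply be downgraded, and the last is already bytes.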
Re^4: UTF8 error when using Net::SFTP::Foreign by ikegami (Patriarch) on Feb 18, 2009 at 03:59 UTC
Re^5: UTF8 error when using Net::SFTP::Foreign by almut (Canon) on Feb 18, 2009 at 05:04 UTC
Re^3: UTF8 error when using Net::SFTP::Foreign by deadpickle (Pilgrim) on Feb 18, 2009 at 03:13 UTC
I'm trying to get this working. I'm not sure how to implement this, but I added the following just before the write statement:

    utf8::downgrade( my $bytes_off = '' );
    $bytes_off .= chr($coords) for 0..255;

    utf8::upgrade( my $bytes_on = '' );
    $bytes_on .= chr($coords) for 0..255;

    # Downgrade variable on output to avoid false positive.
    utf8::downgrade( my $from_off = $bytes_off );
    utf8::downgrade( my $from_on = $bytes_on );

    print("bytes are ", ($from_off eq $from_on ? 'same' : 'diff'), "\n");

    $sftp->write( $waytemp, $coords);

where `$coords = 12,23`. Now I get the error

    Thread 2 terminated abnormally: Wide character in subroutine entry at GRRUVI-v1.43.pl line 1428.

that I have been hearing about.
Re^4: UTF8 error when using Net::SFTP::Foreign by ikegami (Patriarch) on Feb 18, 2009 at 04:06 UTC
Re^2: UTF8 error when using Net::SFTP::Foreign by graff (Chancellor) on Feb 18, 2009 at 02:29 UTC
I think ikegami's point is that the byte sequence won't change if all you do is turn off the utf8 flag, and it seems like that is the only issue that Net::SFTP::Foreign is having with the OP data. Consider:

    use strict;
    use warnings;

    main();

    sub main {
        my $test = "\x{0414}";  # unicode cyrillic "capital letter de"
        printf( "character length: %d\n", length( $test ));
        check_string( $test, 1 );
        # this call causes "Wide character in print" warning, but output is ok

        utf8::encode( $test );
        printf( "byte length: %d\n", length( $test ));
        check_string( $test, 2 );
        # no warning from this call
    }

    sub check_string {
        my ( $str, $num ) = @_;
        my $status = ( utf8::is_utf8( $str )) ? 'utf8' : 'not utf8';
        printf( " %d -- check_string: input %s is %s\n", $num, $str, $status );
    }

When I have that stored as "test.pl" and do `perl test.pl`, the output I get is:

    character length: 1
    Wide character in print at /tmp/test-bytes.pl line 21.
     1 -- check_string: input Д is utf8
    byte length: 2
     2 -- check_string: input Д is not utf8

Of course, if I run that with `perl -CS test.pl` (to do the same thing as `binmode STDOUT, ":utf8";`), the "Wide character in print" warning goes away, but then when check_string() gets called the second time, perl forces an "upgrade" of the two bytes that make up the "unflagged" cyrillic character, producing faulty output (four non-ASCII bytes instead of two) - but that's a separate issue.
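The "four bytes instead of two" effect can be reproduced without any output layer by encoding the already-encoded string a second time, which is roughly what a `:utf8` layer does to a string that has been encoded by hand. This is a minimal sketch; `hex_bytes` is a hypothetical helper used only for display.

```perl
use strict;
use warnings;

# Hypothetical display helper: hex dump of a byte string, e.g. "D0 94".
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

my $char = "\x{0414}";     # one character, cyrillic "capital letter de"

my $once = $char;
utf8::encode($once);       # two UTF-8 bytes, flag off

my $twice = $once;
utf8::encode($twice);      # each of those bytes is treated as a character and encoded again

printf "characters:    %d\n", length $char;
printf "encoded once:  %s\n", hex_bytes($once);
printf "encoded twice: %s\n", hex_bytes($twice);
```

The second dump shows four bytes, none of which is the original pair, which is why data should be encoded exactly once on its way out.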
Re^3: UTF8 error when using Net::SFTP::Foreign by ikegami (Patriarch) on Feb 18, 2009 at 04:17 UTC
"I think ikegami's point is that the byte sequence won't change if all you do is turn off the utf8 flag,"

My point was simply that the solution doesn't depend on knowing whether the flag was on for a good reason or not. I have now elaborated on the solution.