in reply to Re: UTF8 error when using Net::SFTP::Foreign
in thread UTF8 error when using Net::SFTP::Foreign

Since the flag may well be irrelevant, the snippet you posted should probably be changed to the following, which is the fatal equivalent of "Wide character in print".

    sub write {
        @_ == 3 or croak 'Usage: $sftp->write($fh, $data)';
        my ($sftp, $rfh) = @_;
        $sftp->flush($rfh, 'in') or return undef;

        if (utf8::is_utf8($_[2])) {
            splice(@_, 2, 1, "$_[2]");   # Don't affect caller.
            utf8::downgrade($_[2], 1)    # Change format to bytes.
                or croak("Can only write bytes to a socket");
        }

        ...
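To see what the second argument to utf8::downgrade buys you, here is a minimal standalone sketch. The helper name to_bytes_or_undef is made up for illustration and is not part of Net::SFTP::Foreign; it only shows that an upgraded string with no character above 255 downgrades cleanly, while a genuinely wide character makes the call return false instead of dying.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the module): returns a downgraded copy
# of the string, or undef if it contains a character above 255.
sub to_bytes_or_undef {
    my ($data) = @_;              # copy, so the caller's scalar is untouched
    utf8::downgrade($data, 1)     # true 2nd arg: return false instead of dying
        or return undef;
    return $data;
}

my $upgraded = "caf\xE9";
utf8::upgrade($upgraded);         # flag on, but every char is <= 255

my $bytes = to_bytes_or_undef($upgraded);
print defined $bytes ? "downgraded ok\n" : "wide character\n";

# A character above 255 cannot be stored as a single byte.
print defined to_bytes_or_undef("\x{263A}") ? "downgraded ok\n" : "wide character\n";
```

In write() the same check is followed by croak, so the caller sees the failure at the point where the bad data is handed over.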

Try finding out if the flag is on for a good reason

I don't see how that's relevant.

If the data is text, it doesn't matter if the flag is on or not when you encode the data as needed.

If the data isn't text, it doesn't matter if the flag is on or not. Just use utf8::downgrade.

    use strict;
    use warnings;
    use Encode qw( encode );

    my $enc  = 'iso-latin-1';   # Desired encoding
    my $text = chr(130);        # Any char supported by iso-latin-1 and $enc

    utf8::downgrade(my $text_off = $text);
    utf8::upgrade  (my $text_on  = $text);

    # Encode text on output
    my $from_off = encode($enc, $text_off);
    my $from_on  = encode($enc, $text_on);
    print("bytes are ", ($from_off eq $from_on ? 'same' : 'diff'), "\n");

bytes are same

    use strict;
    use warnings;

    utf8::downgrade( my $bytes_off = '' );
    $bytes_off .= chr($_) for 0..255;
    utf8::upgrade( my $bytes_on = '' );
    $bytes_on .= chr($_) for 0..255;

    # Downgrade variable on output to avoid false positive.
    utf8::downgrade( my $from_off = $bytes_off );
    utf8::downgrade( my $from_on  = $bytes_on );
    print("bytes are ", ($from_off eq $from_on ? 'same' : 'diff'), "\n");

bytes are same

Update: Fleshed out some details. Changed to use $_[2] as in original snippet.

Re^3: UTF8 error when using Net::SFTP::Foreign
by salva (Canon) on Feb 18, 2009 at 13:37 UTC
    I have just uploaded to CPAN a new version of Net::SFTP::Foreign that uses utf8::downgrade. Thanks for the solution!

      Cool! The downside is that true errors (when the flag is on because the data could need to be encoded) are postponed, but not much can be done about that.
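      A small sketch of what "postponed" means here, assuming a caller who meant to send UTF-8 but forgot to encode (the sample strings and the choice of UTF-8 are my assumptions, not anything from the module): the downgrade succeeds silently as long as every character is 255 or below, and only fails once a character above 255 shows up.

      ```perl
      use strict;
      use warnings;
      use Encode qw( encode );

      my $text = "na\x{EF}ve";          # text, all chars <= 255
      utf8::upgrade($text);

      my $wanted = encode('UTF-8', $text);   # what the caller meant to send

      my $sent = $text;
      utf8::downgrade($sent, 1) or die "wide character";

      # No error was raised, yet the bytes differ from the intended UTF-8.
      print "sent ", length($sent), " bytes, wanted ", length($wanted), "\n";

      # Only a char above 255 finally triggers the failure.
      my $later = "$text \x{2603}";
      print utf8::downgrade($later, 1) ? "ok\n" : "fails only now\n";
      ```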

Re^3: UTF8 error when using Net::SFTP::Foreign
by almut (Canon) on Feb 18, 2009 at 03:28 UTC
    Try finding out if the flag is on for a good reason
    I don't see how that's relevant.

    Well, it's relevant insofar as the module - as it is - would abort if the flag is on, so finding out the reason for it being on might be a first step to better understanding one's own code, and for taking appropriate measures.

    For example, in the following (contrived) situation

        my $s = "hello";
        my $u = "\x{7777}";
        print "utf8 flag ", (utf8::is_utf8($s) ? "on":"off"), "\n";  # off

        $s .= $u;               # do something which upgrades $s
        $s = substr($s, 0, 5);  # get back the orig. "hello"
        print "utf8 flag ", (utf8::is_utf8($s) ? "on":"off"), "\n";  # on - $sftp->write(...) would abort

    I'd say the flag is on "for no good reason", because the content is exactly the same as before manipulating $s (i.e. "hello"), and all characters occurring can be represented in plain ASCII.

    OTOH, if the data actually would contain unicode characters that cannot be represented in ASCII (or some legacy encoding like Latin-1, etc., for that matter), the flag would be on "for a good reason", in case the data needs to be treated in a character-based fashion.
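    One rough way to tell the two cases apart, as a sketch only: try utf8::downgrade on a copy with a true second argument. If it succeeds, nothing above 255 is left in the string and the flag was merely a leftover of the earlier manipulation. The variable names are just those of the contrived example above.

    ```perl
    use strict;
    use warnings;

    my $s = "hello";
    $s .= "\x{7777}";           # upgrades $s
    $s = substr($s, 0, 5);      # back to "hello", flag still on

    my $copy = $s;
    my $ok = utf8::downgrade($copy, 1);   # true 2nd arg: return false rather than die

    print "downgrade ", ($ok ? "succeeded" : "failed"),
          ", flag now ", (utf8::is_utf8($copy) ? "on" : "off"),
          ", same text: ", ($copy eq $s ? "yes" : "no"), "\n";
    ```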

    Whether the latter is the case with Net::SFTP::Foreign::write(), I simply don't know.  I didn't check what the author's specific reasons for not allowing UTF-8 might have been — as a first approximation, I tend to assume that module authors know what they're doing.

      I'd say the flag is on "for no good reason", because the content is exactly the same as before manipulating $s (i.e. "hello"), and all characters occurring can be represented in plain ASCII.

      What's gained by knowing that?

      If the flag is on for no good reason, encode the text!
      If the flag is on for a good reason, encode the text!

        A third option (in addition to encoding or downgrading, as you've mentioned) could've been to simply fix the code that inadvertently upgrades $coords...  as those coordinates appear to be simple numbers, I wouldn't know why Unicode would need to play a role at all here.  That's why I got the impression that something unintentional might be going on in the OP's code...

Re^3: UTF8 error when using Net::SFTP::Foreign
by deadpickle (Pilgrim) on Feb 18, 2009 at 03:13 UTC
    I'm trying to get this working. Not sure how to implement this, but I added this just before the write statement:

        utf8::downgrade( my $bytes_off = '' );
        $bytes_off .= chr($coords) for 0..255;
        utf8::upgrade( my $bytes_on = '' );
        $bytes_on .= chr($coords) for 0..255;

        # Downgrade variable on output to avoid false positive.
        utf8::downgrade( my $from_off = $bytes_off );
        utf8::downgrade( my $from_on = $bytes_on );
        print("bytes are ", ($from_off eq $from_on ? 'same' : 'diff'), "\n");

        $sftp->write( $waytemp, $coords);

    where $coords = 12,23. Now I get the error
    Thread 2 terminated abnormally: Wide character in subroutine entry at GRRUVI-v1.43.pl line 1428.
    that I have been hearing about.

      You mean you don't get Argument "12,23" isn't numeric in chr? I have very low tolerance for people who don't help themselves by using use strict; use warnings;.

      That code snippet you are using was used to demonstrate that utf8::downgrade works regardless of the internal format of a string. It's not a solution to your problem. If anything, the other snippet would be more relevant since you have text.

      If $coords had contained bytes (already encoded text or packed/binary data, etc), the solution would have been utf8::downgrade.

      Since $coords contains text, the solution is Encode's encode.

          use Encode qw( encode );
          $sftp->write( $waytemp, encode($enc, $coords) );

      Replace $enc with the encoding you desire to use.
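      For instance, assuming whatever reads the remote file expects UTF-8 (that choice is only an assumption for this sketch, as is the simulated upgrade of $coords), encode returns a plain byte string no matter how $coords happens to be stored internally:

      ```perl
      use strict;
      use warnings;
      use Encode qw( encode );

      my $coords = "12,23";
      utf8::upgrade($coords);           # simulate the accidental upgrade

      print "flag before: ", (utf8::is_utf8($coords) ? "on" : "off"), "\n";

      my $enc   = 'UTF-8';              # assumed encoding, pick your own
      my $bytes = encode($enc, $coords);

      print "flag after:  ", (utf8::is_utf8($bytes) ? "on" : "off"), "\n";
      ```

      The resulting $bytes is what you would hand to $sftp->write.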