perl5ever has asked for the wisdom of the Perl Monks concerning the following question:

The function utf8::is_utf8() is not available for perl 5.8.0. Is there an easy way to determine the setting of the UTF8 flag for a scalar?

Replies are listed 'Best First'.
Re: good way to implement utf8::is_utf8 for perl 5.8.0
by ikegami (Patriarch) on Mar 19, 2009 at 16:54 UTC
    sub is_utf8 { utf8::upgrade( $_[0] ); return 1; }

    But seriously,

    sub is_utf8 {
        my $s = "\x80" . $_[0];
        my $internal = unpack "p", pack "p", $s;
        return $s ne $internal;
    }

    Tested in 5.8.0, 5.8.8 and 5.10.0 using:

    utf8::downgrade( my $empty_dn = ''          );  # 0
    utf8::upgrade(   my $empty_up = ''          );  # 1
    utf8::downgrade( my $ascii_dn = 'a'         );  # 0
    utf8::upgrade(   my $ascii_up = 'a'         );  # 1
    utf8::downgrade( my $hibit_dn = chr(0xC9)   );  # 0
    utf8::upgrade(   my $hibit_up = chr(0xC9)   );  # 1
    utf8::upgrade(   my $wide_up  = chr(0x2660) );  # 1

    for (
        $empty_dn, $empty_up,
        $ascii_dn, $ascii_up,
        $hibit_dn, $hibit_up,
        $wide_up,
    ) {
        print is_utf8($_) ? 1 : 0, "\n";
    }

    Why do you need to know?

    Update: Added test code.

      Thanks!
      Why do you need to know?
      I am maintaining an app that runs on 5.8.0, and upgrading perl would require too much testing (it's a very big app).

        I meant: What use do you have for is_utf8? I find it's usually used where the following should be used:

        utf8::downgrade($s, 1) or croak("Wide characters in foo argument");
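
        As a rough sketch of how that line might sit inside a function that expects bytes (byte_length is a made-up name for illustration, not something from the app in question), assuming only Carp and the utf8:: functions that ship with perl:

        ```perl
        use strict;
        use warnings;
        use Carp qw( croak );

        # Hypothetical function that requires a string of bytes.
        sub byte_length {
            my ($s) = @_;    # work on a copy so the caller's scalar is left alone

            # With a true second argument, utf8::downgrade returns false
            # instead of dying when the string has characters above 0xFF.
            utf8::downgrade($s, 1)
                or croak("Wide characters in byte_length argument");

            return length($s);
        }

        # Upgraded internally, but every character still fits in a byte.
        utf8::upgrade( my $hibit_up = chr(0xC9) );
        print byte_length($hibit_up), "\n";

        # Contains a character above 0xFF, so the call croaks.
        my $ok = eval { byte_length( chr(0x2660) ); 1 };
        print $ok ? "accepted\n" : "rejected: $@";
        ```

        The point of doing it this way is that the function never looks at the UTF8 flag. It only cares whether the string can be represented as bytes, which is usually the real question.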