Re^2: Seeking Perl docs about how UTF8 flag propagates

by ikegami (Patriarch)
on May 17, 2023 at 19:18 UTC ( [id://11152267]=note )


in reply to Re: Seeking Perl docs about how UTF8 flag propagates
in thread Seeking Perl docs about how UTF8 flag propagates

utf8::is_utf8

Indicates which internal storage format is used by a scalar.

USE: Debugging XS modules.
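A minimal sketch of what the flag does and does not tell you. The variable names are only for illustration; the point is that two scalars can hold the same string while differing in storage format.

```perl
use strict;
use warnings;

# Two scalars holding the same one-character string (U+00E9).
my $down = "\xE9";        # literal with only chars < 256: downgraded storage
my $up   = "\xE9";
utf8::upgrade( $up );     # same string, now in the upgraded storage format

my $flag_down = utf8::is_utf8( $down ) ? 1 : 0;
my $flag_up   = utf8::is_utf8( $up )   ? 1 : 0;
my $same      = $down eq $up           ? 1 : 0;

print "down=$flag_down up=$flag_up same=$same\n";   # prints down=0 up=1 same=1
```

The flag differs but the strings compare equal, which is why the flag is a debugging aid for XS code rather than something to test for in ordinary Perl code.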

utf8::upgrade

Changes a scalar to use the upgraded string format (if it's not already) without changing the string.

my $s = ...;
my $t = $s;
utf8::upgrade( $t );
say utf8::is_utf8( $t ) ?1:0;  # 1
say $s eq $t ?1:0;             # 1

USE: Working around instances of The Unicode Bug.
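As an illustration of that use, here is a sketch of one well-known instance of The Unicode Bug: uc on a non-ASCII character below U+0100. It assumes no locale is in effect and deliberately avoids "use v5.12" and the unicode_strings feature, either of which would fix the built-in in that scope.

```perl
use strict;
use warnings;
# No "use v5.12" and no "use feature 'unicode_strings'" on purpose.

my $s = "\xE9";                            # U+00E9, downgraded storage

# Downgraded storage: uc uses ASCII-only rules and leaves U+00E9 alone.
my $before = uc( $s ) eq "\xC9" ? 1 : 0;

my $t = $s;
utf8::upgrade( $t );                       # same string, upgraded storage

# Upgraded storage: uc uses Unicode rules and yields U+00C9.
my $after = uc( $t ) eq "\xC9" ? 1 : 0;

print "before=$before after=$after\n";     # prints before=0 after=1
```

Upgrading changes only the storage format, yet the result of uc changes with it. That dependence on storage format is the bug being worked around.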

utf8::downgrade

Changes a scalar to use the downgraded string format (if it's not already) without changing the string. Dies if it can't.

my $s = ...;
my $t = $s;
utf8::downgrade( $t );         # Might croak
say utf8::is_utf8( $t ) ?1:0;  # 0
say $s eq $t ?1:0;             # 1

USE: Working around instances of The Unicode Bug.
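A small sketch of the failure case, using a string that contains a character above U+00FF and therefore cannot be stored in the downgraded format. The optional second argument to utf8::downgrade is documented in the utf8 pragma; the variable names here are illustrative.

```perl
use strict;
use warnings;

my $wide = "abc\x{100}";    # U+0100 does not fit in the downgraded format

# Default behaviour: utf8::downgrade dies when it cannot downgrade.
my $died = eval { utf8::downgrade( $wide ); 1 } ? 0 : 1;

# With a true second argument it returns false instead of dying.
my $ok = utf8::downgrade( $wide, 1 ) ? 1 : 0;

# Either way, the string itself is left as it was.
my $unchanged = $wide eq "abc\x{100}" ? 1 : 0;

print "died=$died ok=$ok unchanged=$unchanged\n";   # prints died=1 ok=0 unchanged=1
```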

utf8::encode

Encodes a string using utf8.

Expects a string of arbitrary characters in either storage format.

Produces a string of 8-bit characters in the downgraded format.

USE: You should probably be encoding using the standard UTF-8 encoding instead of the Perl-specific utf8 encoding.
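A sketch of the two approaches side by side, assuming the core Encode module is available. For ordinary text the two produce the same bytes; the difference lies in how they treat code points that are not valid for interchange, which is not demonstrated here.

```perl
use strict;
use warnings;
use Encode qw( encode );

my $text = "caf\xE9";                       # 4 characters, last is U+00E9

# Perl-specific utf8: works in place on a copy.
my $perl_utf8 = $text;
utf8::encode( $perl_utf8 );

# Standard UTF-8: returns a new string and leaves $text alone.
my $std_utf8 = encode( 'UTF-8', $text );

my $len       = length( $std_utf8 );                    # bytes after encoding
my $agree     = $perl_utf8 eq $std_utf8 ? 1 : 0;
my $flag_perl = utf8::is_utf8( $perl_utf8 ) ? 1 : 0;    # downgraded result

print "len=$len agree=$agree flag=$flag_perl\n";        # prints len=5 agree=1 flag=0
```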

utf8::decode

Decodes a string encoded using utf8, in place. Returns false and leaves the string unchanged if it can't.

Expects a string of 8-bit characters in either storage format.

Produces a string of characters in the upgraded format (an ASCII-only string is left in its current format).

USE: utf8 is a Perl-specific encoding. Are you sure the text isn't encoded using the standard UTF-8 encoding?
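A sketch contrasting utf8::decode with a strict decode through the core Encode module. The inputs are made up for illustration; FB_CROAK and LEAVE_SRC are Encode's documented check flags.

```perl
use strict;
use warnings;
use Encode qw( decode );

my $good = "caf\xC3\xA9";       # valid encoding of "caf" followed by U+00E9
my $bad  = "caf\xE9";           # a lone 0xE9 byte is not valid utf8

# utf8::decode works in place and reports success through its return value.
my $good_ok = utf8::decode( $good ) ? 1 : 0;
my $bad_ok  = utf8::decode( $bad )  ? 1 : 0;

# Standard UTF-8 with FB_CROAK dies on malformed input; LEAVE_SRC keeps
# Encode from modifying the source buffer.
my $copy = "caf\xE9";
my $strict_ok = eval {
    decode( 'UTF-8', $copy, Encode::FB_CROAK | Encode::LEAVE_SRC );
    1;
} ? 1 : 0;

print "good=$good_ok bad=$bad_ok strict=$strict_ok\n";   # prints good=1 bad=0 strict=0
```

Checking the return value of utf8::decode matters: a failed decode is silent otherwise, and the scalar still holds the undecoded bytes.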

Encode::is_utf8

Indicates which internal storage format is used by a scalar.

USE: You might as well use the equivalent built-in utf8::is_utf8.

Encode::_utf8_on

Mostly equivalent to the following:

utf8::decode( $_ ) if !utf8::is_utf8( $_ );

The difference is that it produces a corrupt scalar if the string isn't valid utf8.

USE: Do not use as it introduces The Unicode Bug.

Encode::_utf8_off

Equivalent to the following:

utf8::encode( $_ ) if utf8::is_utf8( $_ );

USE: Do not use as it introduces The Unicode Bug.
