in reply to My UTF-8 text isn't surviving I/O as expected

You have correctly prepared the code to handle UTF-8 in the source code and in input and output operations. What's missing is doing the same for communication with the database.

By default, DBD::SQLite uses a string mode that is wrong for Unicode text (see the sqlite_string_mode section of its documentation for details). Fixing it takes only a small change to the connect call:

use DBD::SQLite::Constants ':dbd_sqlite_string_mode';
my $dbh = DBI->connect(
    "dbi:SQLite:dbname=:memory:", "", "",
    {
        RaiseError         => 1,
        AutoCommit         => 1,
        sqlite_string_mode => DBD_SQLITE_STRING_MODE_UNICODE_STRICT,
    },
);

PerlMonks is very old and its <code> sections can't handle Unicode. Either use <pre> instead, or replace the Unicode characters in the source code with their names:

my $utf8_text1 = "\N{LATIN CAPITAL LETTER A WITH RING ABOVE}ke Lindstr\N{LATIN SMALL LETTER O WITH DIAERESIS}m";
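If you want to convince yourself that the escaped form really is the same string, here is a small sketch. It assumes nothing beyond core Perl; the variable names are made up for illustration, and the `\N{U+...}` code point form is just a shorter alternative spelling of the same characters.

```perl
use strict;
use warnings;
use charnames ':full';    # only needed on Perls older than 5.16

# The same name written two ways: by character name and by code point.
# Both are pure ASCII in the source, so they survive <code> sections.
my $by_name = "\N{LATIN CAPITAL LETTER A WITH RING ABOVE}ke Lindstr\N{LATIN SMALL LETTER O WITH DIAERESIS}m";
my $by_code = "\N{U+00C5}ke Lindstr\N{U+00F6}m";

binmode STDOUT, ':encoding(UTF-8)';
print "identical strings\n" if $by_name eq $by_code;
printf "%d characters\n", length $by_name;
```

Either spelling gives a character string, not bytes, so it behaves like a literal written under `use utf8`.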

BTW, get into the habit of passing values through placeholders when inserting; that prevents SQL injection:

my $insert = $dbh->prepare('INSERT INTO names VALUES(?, ?)');
$insert->execute('nm0512537', $utf8_text1);
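Putting the two pieces together, a round trip can be checked roughly like this. It is only a sketch: the column names and the CREATE TABLE statement are my assumptions (the original table definition isn't shown here), and it needs a DBD::SQLite recent enough to have sqlite_string_mode.

```perl
use strict;
use warnings;
use DBI;
use DBD::SQLite::Constants ':dbd_sqlite_string_mode';

my $dbh = DBI->connect(
    'dbi:SQLite:dbname=:memory:', '', '',
    {
        RaiseError         => 1,
        AutoCommit         => 1,
        sqlite_string_mode => DBD_SQLITE_STRING_MODE_UNICODE_STRICT,
    },
);

# Assumed schema, for illustration only.
$dbh->do('CREATE TABLE names (id TEXT PRIMARY KEY, name TEXT)');

my $name = "\N{LATIN CAPITAL LETTER A WITH RING ABOVE}ke Lindstr\N{LATIN SMALL LETTER O WITH DIAERESIS}m";

# Placeholders keep the value out of the SQL text.
my $insert = $dbh->prepare('INSERT INTO names VALUES (?, ?)');
$insert->execute('nm0512537', $name);

# Read the value back and compare it with what went in.
my ($stored) = $dbh->selectrow_array(
    'SELECT name FROM names WHERE id = ?', undef, 'nm0512537',
);

printf "characters read back: %d\n", length $stored;
print $stored eq $name ? "round trip OK\n" : "round trip FAILED\n";
```

If the string mode is working, the value read back should compare equal to the character string that was inserted.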

Re^2: My UTF-8 text isn't surviving I/O as expected
by ibm1620 (Hermit) on Nov 23, 2024 at 22:35 UTC
    THANK YOU!! That cleared up a lot. I didn't even suspect that the culprit was sqlite.

    I still have trouble reading UTF-8 from command line arguments. I assume this is not a Perl issue; any suggestions how to fix?

    #!/usr/bin/env perl
    use v5.40;
    use utf8;
    use open qw(:std :encoding(UTF-8));
    
    my $utf8_text1 = shift;
    say "A variable set from argument on command line";
    show ($utf8_text1);
    
    my $utf8_text2 = 'Åke Lindström';
    say "A variable set to UTF8 literal";
    show($utf8_text2);
    
    chomp (my $utf8_text3 = <>);
    say "A variable set by reading from STDIN";
    show($utf8_text3);
    
    sub show($str) {
        say "Binary:      ", join ' ', (unpack "H*", $str) =~ m/../g ;
        say "Text>STDOUT: $str";
    }
    
    Output:
    $ echo "Åke Lindström" | u 'Åke Lindström'
    A variable set from argument on command line
    Binary:      c3 85 6b 65 20 4c 69 6e 64 73 74 72 c3 b6 6d
    Text>STDOUT: Åke Lindström
    A variable set to UTF8 literal
    Binary:      c5 6b 65 20 4c 69 6e 64 73 74 72 f6 6d
    Text>STDOUT: Åke Lindström
    A variable set by reading from STDIN
    Binary:      c5 6b 65 20 4c 69 6e 64 73 74 72 f6 6d
    Text>STDOUT: Åke Lindström
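For the command-line question, one possible approach (a sketch, not something confirmed in this thread) is to decode the arguments yourself. `use open` only affects file handles, so @ARGV still arrives as raw bytes, which matches the c3 85 ... c3 b6 dump above. The helper name `decode_args` is made up for illustration, and the code assumes the shell passes the arguments as UTF-8.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: turn raw UTF-8 bytes from the command line
# into Perl character strings, dying on malformed input.
sub decode_args {
    my @raw = @_;
    return map {
        Encode::decode('UTF-8', $_, Encode::FB_CROAK() | Encode::LEAVE_SRC())
    } @raw;
}

binmode STDOUT, ':encoding(UTF-8)';

# With no arguments this loop simply prints nothing.
for my $arg (decode_args(@ARGV)) {
    printf "%s (%d characters)\n", $arg, length $arg;
}
```

perlrun also documents the -C switch (for example -CA) and the PERL_UNICODE environment variable, which ask perl to treat @ARGV as UTF-8 without any code changes.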