ibm1620 has asked for the wisdom of the Perl Monks concerning the following question:
The advice given by brian d foy (https://stackoverflow.com/a/47946606/522385) and others has been to include the following two pragmas:

    use utf8;                            # expect UTF-8 text in this source code
    use open qw(:std :encoding(UTF-8));  # do UTF-8 encoding on I/O and STD*

But the following code produces garbage output on STDOUT unless I comment out both pragmas!
The garbage I sometimes see is: Åke Lindström. If I paste and pipe that into `hexdump` I get

    00000000  c3 83 c2 85 6b 65 20 4c  69 6e 64 73 74 72 c3 83  |....ke Lindstr..|
    00000010  c2 b6 6d                                          |..m|

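For anyone comparing the two byte strings in this post: the hexdump is exactly what comes out when the 15 correct UTF-8 bytes are treated as 15 separate characters and UTF-8 encoded a second time. Here is a minimal sketch of that comparison using only the core Encode module. `hex_of_utf8` is a hypothetical helper written for this illustration, not part of my script, and the sketch only reproduces the bytes; it does not show where in my script the second encoding happens.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The 15 bytes quoted in this post: the correct UTF-8 encoding of the name.
my $utf8_bytes = pack 'H*', 'c3856b65204c696e64737472c3b66d';

# Hypothetical helper: UTF-8 encode a string and return the bytes as spaced hex.
sub hex_of_utf8 {
    my ($string) = @_;
    return join ' ', unpack('H*', encode('UTF-8', $string)) =~ m/../g;
}

# Case 1: decode the bytes into characters first, then encode once on output.
my $decoded      = decode('UTF-8', $utf8_bytes);
my $encoded_once = hex_of_utf8($decoded);

# Case 2: skip the decode. Each byte is then taken as one character
# (code points 0x00-0xFF) and encoded again, so every high byte becomes two bytes.
my $encoded_twice = hex_of_utf8($utf8_bytes);

print "once:  $encoded_once\n";
print "twice: $encoded_twice\n";
```

The `once` line reproduces the original hex string, while the `twice` line reads `c3 83 c2 85 6b 65 ... c3 83 c2 b6 6d`, matching the hexdump above byte for byte.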
Other possibly-relevant info: I'm on macOS Sequoia. I'm working in iTerm2.app, and I get the same behavior in Terminal.app.
I've read perlunitut and https://perldoc.perl.org/open, probably not enough times.
(Please note: I'm having trouble using UTF-8 text in this post, so unfortunately it's not going to look right. The text I'm trying to use, shown as a hex string, is "c3856b65204c696e64737472c3b66d". I fervently hope that, even without working code, someone can identify what I'm doing wrong.)

    #!/usr/bin/env perl
    use v5.40;

    # brian d foy recommends using these settings
    # (https://stackoverflow.com/a/47946606/522385):
    # (1) recognize UTF-8 in this source code:
    use utf8;
    # (2) do the right things for writing and reading UTF-8, including to STD*:
    use open qw(:std :encoding(UTF-8));

    my $utf8_text1 = "Åke Lindström"; # contains UTF8 chars:

    say "A variable set to a UTF8 literal within perl program";
    show ($utf8_text1);

    use DBI;
    my $dbh = DBI->connect( "dbi:SQLite:dbname=:memory:", "", "",
        { RaiseError => 1, AutoCommit => 1 } );
    $dbh->do('CREATE TABLE names (name_id CHAR PRIMARY KEY, name CHAR)');
    $dbh->do(qq{INSERT INTO names VALUES("nm0512537", "$utf8_text1")});

    my $aoa_ref = $dbh->selectall_arrayref(
        q{SELECT name FROM names WHERE name_id="nm0512537"} );

    say "\nUTF-8 text stored in, and retrieved from, sqlite DB:";
    show($aoa_ref->[0][0]);

    sub show($str) {
        say "Binary: ", join ' ', (unpack "H*", $str) =~ m/../g ;
        say "Text>STDOUT: $str";
    }

Re: My UTF-8 text isn't surviving I/O as expected
by choroba (Cardinal) on Nov 23, 2024 at 20:40 UTC
by ibm1620 (Hermit) on Nov 23, 2024 at 22:35 UTC
by choroba (Cardinal) on Nov 23, 2024 at 22:50 UTC

Re: My UTF-8 text isn't surviving I/O as expected
by cavac (Prior) on Nov 25, 2024 at 13:47 UTC
by choroba (Cardinal) on Nov 25, 2024 at 15:24 UTC
by ibm1620 (Hermit) on Nov 26, 2024 at 01:56 UTC
by cavac (Prior) on Nov 26, 2024 at 07:59 UTC