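What is actually stored in your files: the literal text you provided in the string (escapes such as `\xc3\xa4` written out as ordinary characters), or something else? If it is the literal text, you can turn each `\xNN` escape into the byte it names and then decode the resulting UTF-8 octets: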
```perl
use strict;
use warnings;
use Encode;

binmode STDOUT, ':utf8';    # encode output characters as UTF-8

print "Content-Type:text/html; charset=utf-8\n";
print "Content-Language: utf8;\n\n";

# Slurp the whole __DATA__ section as literal text
my $asText = do { local $/; <DATA> };

# Turn each literal "\xNN" escape into the single byte it names
$asText =~ s!\\x(..)!chr(hex($1))!ge;

# Decode the resulting UTF-8 octets into a Perl character string
my $newcode = decode('utf8', $asText);
print "<p>$newcode</p>\n";

__DATA__
\xc3\xa4 <span class="sy">\xc3\xa4</span>, <span class="sy">\xc3\x84</span> <span class="posg pos">Substantiv, Neutrum, das</span> <span class="vg v"> \xc3\x84 \xc9\x9b\xcb\x90 das \xc3\xa4; Genitiv: des \xc3\xa4 (umgangssprachlich: -s), \xc3\xa4 (umgangssprachlich: -s) </span>
```
Prints:
```
Content-Type:text/html; charset=utf-8
Content-Language: utf8;

<p>ä <span class="sy">ä</span>, <span class="sy">Ä</span> <span class="posg pos">Substantiv, Neutrum, das</span> <span class="vg v"> Ä ɛː das ä; Genitiv: des ä (umgangssprachlich: -s), ä (umgangssprachlich: -s) </span>
</p>
```
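To make the difference between the stored text, the UTF-8 octets and the decoded characters concrete, here is a minimal sketch that just counts lengths at each stage for a single "ä". The helper name unescape_octets is hypothetical (it is not part of the code above), and it assumes the escapes are always exactly two hex digits:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: replace each literal "\xNN" escape with the byte it names.
sub unescape_octets {
    my ($text) = @_;
    (my $octets = $text) =~ s/\\x([0-9a-fA-F]{2})/chr(hex($1))/ge;
    return $octets;
}

my $literal = '\xc3\xa4';                  # single quotes: 8 ASCII characters, as stored in the file
my $octets  = unescape_octets($literal);   # 2 bytes: 0xC3 0xA4 (UTF-8 for "ä")
my $chars   = decode('UTF-8', $octets);    # 1 character: U+00E4

printf "literal:  %d chars\n", length $literal;                    # 8
printf "octets:   %d bytes\n", length $octets;                     # 2
printf "decoded:  %d char(s), U+%04X\n", length $chars, ord $chars; # 1, U+00E4
```

If the file already holds raw UTF-8 bytes rather than escape text, the substitution step is unnecessary: reading the file through an `:encoding(UTF-8)` layer, or calling `decode` on the bytes directly, yields the characters.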
Re^4: Perl's encoding versus UTF8 octets
by Polyglot (Chaplain) on Jan 13, 2021 at 07:00 UTC