in reply to UTF8 Output with XML::Feed?

My guess is that you need to add use utf8;

It tells Perl to treat the source code as UTF-8 rather than as single bytes, and this includes literal strings like 'abc...åäö'.

See utf8 for more.

update

This

åäö

looks very much like use utf8 is missing.

Without it, Perl will interpret each byte of a multibyte character as a separate character.

U+00E5    å    c3 a5    LATIN SMALL LETTER A WITH RING ABOVE

hence

Ã¥

in HTML encoding of single bytes.

use utf8

will activate the UTF8 flag for variables populated from literal strings, so that multibyte sequences are treated as characters.
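The byte-versus-character mix-up described above can be reproduced in a few lines. This is only a sketch: it uses the core Encode module and writes the character as an escape, so the file itself stays pure ASCII and needs no use utf8. The variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $char  = "\x{E5}";                      # U+00E5, one character
my $bytes = encode('UTF-8', $char);        # its two UTF-8 bytes
my $wrong = decode('ISO-8859-1', $bytes);  # each byte misread as its own character

printf "UTF-8 bytes: %s\n", join ' ', map { sprintf '%02x', ord } split //, $bytes;
printf "misread as:  %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $wrong;
# prints:
# UTF-8 bytes: c3 a5
# misread as:  U+00C3 U+00A5
```

U+00C3 and U+00A5 are exactly the two characters that show up as Ã¥.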

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

Re^2: UTF8 Output with XML::Feed? (use utf8)
by mldvx4 (Hermit) on Mar 07, 2022 at 17:55 UTC

    Thanks. Adding use utf8; was one of the first things I tried. I've also tried opening stdout as :utf8 but that doesn't help either. Adding an additional print() shows that the script itself is handling UTF8, or at least looks like it is, but XML::Feed seems not to.

    #!/usr/bin/perl
    use utf8;
    use open ':encoding(utf8)';
    use XML::Feed;
    use English;
    use strict;
    use warnings;

    my $d='Feed from a to ö';
    my $t='abc...åäö';

    my $feed = XML::Feed->new('RSS');
    $feed->title('Feed');
    $feed->link('https://www.example.com/feed.rss');
    $feed->language('en');
    $feed->description($d);

    my $entry = XML::Feed::Entry->new();
    $entry->link('https://www.example.com/one.html');
    $entry->title($t);
    $feed->add_entry($entry);

    print "Description: $d\n";
    print "Title: $t\n";
    print $feed->as_xml;
    exit(0)

    For what it's worth, the following appears to produce only a blank line.

    #!/usr/bin/perl
    use utf8;
    print "\N{LATIN SMALL LETTER A WITH RING ABOVE}\n";

    The terminal is xfce4-terminal 0.8.10 (Xfce 4.16) and is set to use UTF-8. Pressing the keys "åäö" appears to show the right characters.

      "For what it's worth, the following appears to produce only a blank line."

      Please go back and (re)read the utf8 documentation, paying particular attention to the very clear and emboldened directive:

      Do not use this pragma for anything else than telling Perl that your script is written in UTF-8.

      The code you presented only contains 7-bit ASCII characters.

      You got what appeared to be a blank line. Here are some things you could have tried:

      $ perl -e 'print "\N{LATIN SMALL LETTER A WITH RING ABOVE}\n";'

      $ perl -e 'print "|\N{LATIN SMALL LETTER A WITH RING ABOVE}|\n";'
      | |
      $ perl -C -e 'print "\N{LATIN SMALL LETTER A WITH RING ABOVE}\n";'
      å
      $ perl -e 'use open OUT => qw{:encoding(UTF-8) :std}; print "\N{LATIN SMALL LETTER A WITH RING ABOVE}\n";'
      å

      See: perlrun for -C; and, the open pragma.
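The difference between the blank line and the å in the commands above comes down to which bytes leave the program. The following is a rough sketch of that, assuming in-memory filehandles as stand-ins for STDOUT so the bytes can be inspected; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "\N{U+00E5}\n";    # one character plus a newline, ASCII-only source

# No encoding layer: the character is written as the single byte 0xE5,
# which is not valid UTF-8 on its own.
open my $raw, '>', \my $raw_buf or die "cannot open in-memory handle: $!";
print {$raw} $line;
close $raw;

# With an encoding layer: the character is written as the bytes 0xC3 0xA5.
open my $enc, '>:encoding(UTF-8)', \my $enc_buf or die "cannot open in-memory handle: $!";
print {$enc} $line;
close $enc;

for my $pair ([ 'no layer' => $raw_buf ], [ ':encoding(UTF-8)' => $enc_buf ]) {
    my ($label, $buf) = @$pair;
    printf "%-17s %s\n", $label, join ' ', map { sprintf '%02x', ord } split //, $buf;
}
# prints:
# no layer          e5 0a
# :encoding(UTF-8)  c3 a5 0a
```

A UTF-8 terminal cannot render a lone e5 byte, which would explain the apparently blank line.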

      — Ken

        > The code you presented only contains 7-bit ASCII characters.

        erm ... åäö???

        update

        > Do not use this pragma for anything else than telling Perl that your script is written in UTF-8.

        Unfortunately this line is easily misunderstood. I recently had a long dispute with a camel award winner who read it wrongly.

        Many think it only means you can use Unicode characters in identifiers, like $möhre or sub née, but it also covers literal strings, which are read through the same filehandle (DATA).

        Please note how the UTF8 flag is set for $t2 (see FLAGS)

        use v5.12;
        use warnings;
        use Devel::Peek;

        my $t1='åäö';
        Dump $t1;

        use utf8;
        my $t2='åäö';
        Dump $t2;

        my $t3 = "\N{LATIN SMALL LETTER A WITH RING ABOVE}\n";
        say $t3;
        Dump $t3;
        OUTPUT:
        SV = PV(0xd9ae08) at 0x25809b0
          REFCNT = 1
          FLAGS = (POK,IsCOW,pPOK)
          PV = 0x260a4e8 "\303\245\303\244\303\266"\0
          CUR = 6
          LEN = 10
          COW_REFCNT = 1
        SV = PV(0xd9add8) at 0x2580248
          REFCNT = 1
          FLAGS = (POK,IsCOW,pPOK,UTF8)
          PV = 0x260a068 "\303\245\303\244\303\266"\0 [UTF8 "\x{e5}\x{e4}\x{f6}"]
          CUR = 6
          LEN = 10
          COW_REFCNT = 1
        SV = PV(0xd9afe8) at 0x2580a40
          REFCNT = 1
          FLAGS = (POK,IsCOW,pPOK,UTF8)
          PV = 0x2767378 "\303\245\n"\0 [UTF8 "\x{e5}\n"]
          CUR = 3
          LEN = 10
          COW_REFCNT = 1
        å

        UPDATE2

        Extended the code with $t3, which prints å for me rather than an empty line.

        UPDATE3

        Of course, how the printed output is displayed also depends on the output channel and the display settings.

        Cheers Rolf

      I can't comment on XML::Feed, sorry.

      But ...

      > Adding use utf8; was one of the first things I tried.

      ... if your source-code is in utf8 (check your editor settings) and you have a line like my $t='abc...åäö'; you must apply use utf8;

      Otherwise Perl will not know how to decode the bytes in that string, because the interpretation is not obvious.

      You should clarify this, before meddling with XML.

      Here is a demo you should run:

      use v5.12;
      use warnings;
      use Data::Dump;

      my $t1='åäö';
      ddx $t1;
      say "length: ",length $t1;

      use utf8;
      my $t2='åäö';
      ddx $t2;
      say "length: ",length $t2;
      OUTPUT:
      # demo_utf8.pl:8: "\xC3\xA5\xC3\xA4\xC3\xB6"     <-- bytes
      length: 6
      # demo_utf8.pl:14: "\xE5\xE4\xF6"     <-- code points
      length: 3
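The same bytes-to-code-points step can also be done by hand, which may help to see what use utf8 does to a literal. This is a minimal sketch using the core Encode module; the six bytes are written as escapes (an assumption standing in for what a UTF-8 editor saves for 'åäö'), so the file needs no use utf8 itself.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The six bytes stored on disk for the three letters, as ASCII-only escapes.
my $bytes = "\xC3\xA5\xC3\xA4\xC3\xB6";

# Decoding turns the byte sequence into three characters (code points).
my $chars = decode('UTF-8', $bytes);

printf "undecoded: length %d\n", length($bytes);
printf "decoded:   length %d (%s)\n", length($chars),
    join ' ', map { sprintf 'U+%04X', ord } split //, $chars;
# prints:
# undecoded: length 6
# decoded:   length 3 (U+00E5 U+00E4 U+00F6)
```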

      Cheers Rolf