in reply to Another Unicode/emoji question

As it is a Festive Break 🎄 I've had the opportunity to test this calendar import and find out what is really going on...

I created a simple test script to generate a single calendar entry from noon to 2pm tomorrow, that is, the day after the ICS feed is accessed.

#!/usr/bin/perl
use CGI::Carp qw(fatalsToBrowser);
use strict;
use warnings;
use lib "$ENV{'DOCUMENT_ROOT'}/../lib";
use open ":std", ":encoding(UTF-8)";

use Site::Utils;

my $template = Template->new(INCLUDE_PATH => $Site::Variables::template_path);

# Serve text/calendar unless ?format=plain was requested
$data{'format'} = 'calendar' unless $data{'format'} eq 'plain';
print "Content-type: text/$data{'format'}; charset=utf-8;\n\n";
#print "\x{feff}"; # BOM

# $date is tomorrow (YYYYMMDD); $uid is built from the current timestamp
my ($date, $uid) = $dbh->selectrow_array("SELECT DATE_FORMAT(NOW() + INTERVAL 1 DAY, '%Y%m%d'), DATE_FORMAT(NOW(), '%Y-%j-%H%i%s')");

# With ?template=1 let Template print the entry instead of the heredoc below
if ($data{'template'}) {
    $template->process("admin/google/dogface.tt", $vars) or die $template->error;
    exit;
}

print<<"END";
BEGIN:VCALENDAR
VERSION:2.0
PRODID:Pawsies Calendar 1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
BEGIN:VEVENT
SUMMARY:\x{1f436} Dog Face Test
UID:DFT$uid\@pawsies.uk
SEQUENCE:1
DTSTAMP:${date}T120000
DTSTART:${date}T120000
DTEND:${date}T140000
END:VEVENT
END:VCALENDAR
END

The module Site::Utils provides the database handle $dbh and splits the HTTP query string and puts it into %data.

If the BOM is included, Google Calendar doesn't display the entry at all. With the BOM omitted, the entry is displayed.
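
To see what the BOM line adds to the feed, here is a minimal sketch using only the core Encode module. The variable names are illustrative and not part of the script above. It shows that U+FEFF, once encoded as UTF-8, puts three bytes in front of BEGIN:VCALENDAR, which may be why the entry is no longer recognised.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+FEFF is the byte order mark that the commented-out line would print.
my $bom   = "\x{feff}";
my $first = "BEGIN:VCALENDAR";

# Encode to UTF-8 bytes, as the ":encoding(UTF-8)" output layer would.
my $bytes = encode('UTF-8', $bom . $first);

# Hex dump of the first three bytes of the stream.
my $prefix = join ' ', map { sprintf '%02X', ord } split //, substr($bytes, 0, 3);
print "$prefix\n";
```

The feed therefore no longer begins with the literal bytes of BEGIN:VCALENDAR when the BOM is printed.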

However, if we print the ICS data directly from the script, the 🐶 emoji is displayed correctly. If we use Template to handle the printing, we get the literal \x{1f436} instead of 🐶. So it appears to be Template that is not printing the Unicode characters.

Try it here:
Printing from script
Printing with Template

Of course, knowing where the problem exists is different to being able to solve it...

Do you have any experience of printing Unicode using Template?

Re: Template and Unicode (was: Re: Another Unicode/emoji question)
by haj (Vicar) on Dec 28, 2023 at 23:31 UTC

    It was several years ago, but I used Template with Unicode a lot. You can have UTF-8 encoded templates and UTF-8 strings in variables; you just need to declare them consistently.

    The very first configuration parameter documented in Template::Manual::Config is ENCODING. I'll quote it here because it is so short:

    ENCODING

    The ENCODING option specifies the template files' character encoding:

    my $template = Template->new({ ENCODING => 'utf8', });

    A template which starts with a Unicode byte order mark (BOM) will have its encoding detected automatically.

    So, the following program works as I'd expect:

    use Template;
    use open ":std", ":encoding(UTF-8)";

    my $dogface = "\N{DOG FACE}";
    my $template = <<"ENDOFTEMPLATE";
    Dog face from template: $dogface
    Dog face from variable: [% dogface %]
    ENDOFTEMPLATE

    my $tt = Template->new(ENCODING => 'UTF-8');
    $tt->process(\$template,{dogface => $dogface});

    The automatic BOM detection is a handy band-aid if you have a mixture of Latin1 and UTF-8 encoded templates and don't want to re-code them.
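
    The program above uses an in-memory template. For a template stored in a file, as in the original question, a rough sketch might look like the following. The temporary directory, the file name dogface.tt and the summary variable are made up for illustration; the point is only that the file is UTF-8 on disk and ENCODING tells Template how to decode it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Template;

# Hypothetical template file, written as UTF-8 bytes on disk.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>:encoding(UTF-8)', "$dir/dogface.tt"
    or die "Cannot write template: $!";
print {$fh} "SUMMARY:\x{1f436} [% summary %]\n";
close $fh or die "Cannot close template: $!";

# ENCODING tells Template how to decode the template file it reads.
my $tt = Template->new(INCLUDE_PATH => $dir, ENCODING => 'utf8')
    or die Template->error;

# Collect the result in a scalar instead of printing it directly.
my $output = '';
$tt->process('dogface.tt', { summary => 'Dog Face Test' }, \$output)
    or die $tt->error;

# Encode once, on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print $output;
```

    Collecting the output in a scalar first makes it easy to check that the emoji is a single character before it is encoded for printing.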

Re: Template and Unicode (was: Re: Another Unicode/emoji question)
by choroba (Cardinal) on Dec 28, 2023 at 22:42 UTC
    A template is usually an HTML document, not Perl source code, so Perl escapes don't work there. HTML entities do, though. You can always pass a constant from Perl to a template, too.
    #!/usr/bin/perl
    use warnings;
    use strict;
    use open OUT => ':encoding(UTF-8)', ':std';
    use Template;

    'Template'->new->process(\'SUMMARY:[% chr(128054) %] [% dog %] &#x1f436; Dog Face Test',
        {dog => "\x{1f436}", chr => \&CORE::chr});

    Update: Added the chr sub.
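
    The difference between an escape in Perl source and the same characters in a template can be seen without Template at all. This is a small sketch in core Perl with illustrative variable names. The single-quoted string merely stands in for the text of a template file, which is passed through as plain text outside of [% ... %] directives.

```perl
use strict;
use warnings;

# In Perl source, a double-quoted string turns the escape into one character.
my $from_perl = "\x{1f436}";

# Single quotes stand in for template text: the backslash sequence stays literal.
my $from_template = '\x{1f436}';

printf "Perl string:   %d character(s), code point U+%04X\n",
    length($from_perl), ord($from_perl);
printf "Template text: %d character(s), starting with '%s'\n",
    length($from_template), substr($from_template, 0, 2);
```

    The first string holds the dog face itself, while the second holds the backslash, the x, the braces and the hex digits as separate characters.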

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]