Bod has asked for the wisdom of the Perl Monks concerning the following question:

I realise that we've had a very long recent thread about Unicode... sorry if this drags the issue out further! However, I have little understanding of Unicode, as I have rarely needed it. But I could do with some help, please...

My partner runs a dog care business and I have built the booking platform for her. Part of this provides a URL to Google Calendar to update our mobile calendars from the booking system. This all works fine but I'll ask a couple of more general questions at the end.

I decided it would be nice to have a Dog Face Emoji as the first character of the title of the calendar entry. But I cannot get this to display. The script uses Template to generate the ICS feed for Google Calendar.

BEGIN:VCALENDAR
VERSION:2.0
PRODID:Pawsies Calendar 1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
[% FOREACH event IN day %]BEGIN:VEVENT
SUMMARY:\x{e052} [% event.type %][% IF event.dog %] - [% event.dog %][% END %]
[% IF event.note %]DESCRIPTION: [% event.note %]
[% END %]UID:pawsies[% event.idBooking %][% event.id %]@pawsies.uk
SEQUENCE:[% event.sequence %]
DTSTAMP:[% event.dtstart %]
DTSTART:[% event.dtstart %]
DTEND:[% event.dtend %]
URL:https://www.pawsies.uk/admin/calendar/?day=[% event.date %]
COLOR:[% event.color %]
END:VEVENT
[% END %]END:VCALENDAR

Everything works as expected except that the emoji is printed as the literal text \x{e052} instead of 🐶. I have use utf8; at the top of the script and the HTTP header is:

Content-type: text/calendar; charset=utf-8

A couple of extra questions if you are experienced at feeding data to Google Calendar:
1 - Is it possible to force Google to refresh the feed? Waiting for 24 hours or so makes debugging slow and tedious.
2 - Is it possible to set the colour of the event from the ICS feed so we can have multiple colours from one feed? Currently, we have two feeds just to get two different colours. I've tried the COLOR property but it seems to be ignored.

Updated to correct MIME type

Re: Another Unicode/emoji question
by kcott (Archbishop) on Dec 22, 2023 at 06:57 UTC

    G'day Bod,

    I don't know where you got \x{e052} from. That's a codepoint in Unicode PDF Code Chart "Private Use Area (Range: E000-F8FF)". What you want is U+01F436 which is in Unicode PDF Code Chart "Miscellaneous Symbols and Pictographs (Range: 1F300-1F5FF)".
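A quick way to check which block a code point lives in, using only core Perl: charnames::viacode returns the official character name, and the \p{Co} regex property matches Private Use characters. The describe helper below is a hypothetical name used only for illustration; a minimal sketch:

```perl
use strict;
use warnings;
use charnames ();   # core module; provides charnames::viacode

# Hypothetical helper: report a code point, its official name (if any)
# and whether it is in a Private Use Area (general category Co).
sub describe {
    my ($cp) = @_;
    my $name    = charnames::viacode($cp) // '(no name)';
    my $private = chr($cp) =~ /\p{Co}/ ? 'yes' : 'no';
    return sprintf 'U+%04X (decimal %d): %s, private use: %s',
        $cp, $cp, $name, $private;
}

print describe($_), "\n" for 0xE052, 0x1F436;
```

The E052 line should report it as private use, while 1F436 comes back as DOG FACE.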

    There are a number of ways to generate that character with Perl:

    $ perl -E '
        use strict;
        use warnings;
        use utf8;
        use open OUT => qw{:encoding(UTF-8) :std};
        say q{\x{1f436} = }, "\x{1f436}";
        say q{\x{1F436} = }, "\x{1F436}";
        say q{\N{DOG FACE} = }, "\N{DOG FACE}";
        say q{🐶 = }, "🐶";
    '
    \x{1f436} = 🐶
    \x{1F436} = 🐶
    \N{DOG FACE} = 🐶
    🐶 = 🐶
    

    In HTML, you can use the entities &#x1f436; (renders as: 🐶) or &#128054; (renders as: 🐶).
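Both entity forms are simply the code point written in hex or in decimal, so they can be generated with sprintf. This only helps where the output is interpreted as HTML; a minimal sketch:

```perl
use strict;
use warnings;

my $cp = 0x1F436;                          # DOG FACE
my $hex_entity = sprintf '&#x%x;', $cp;    # hexadecimal numeric entity
my $dec_entity = sprintf '&#%d;',  $cp;    # decimal numeric entity

print "$hex_entity $dec_entity\n";
```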

    There's potentially other ways to achieve this that I haven't immediately thought of.

    — Ken

      "There's potentially other ways to achieve this that I haven't immediately thought of."
      $ alias perlu
      alias perlu='perl -Mstrict -Mwarnings -Mautodie=:all -Mutf8 -C -E'
      $ perlu 'say chr 0x1f436; say chr 128054;'
      🐶
      🐶
      

      — Ken

Re: Another Unicode/emoji question
by cavac (Prior) on Dec 22, 2023 at 08:41 UTC

    Aside from the answer about the correct Unicode code point by kcott, it seems some calendar systems are more forgiving than others when it comes to encoding.

    If Unicode still makes problems, check (with a hex editor) if the file you generate has a byte order mark. I haven't played with ICS calendar stuff in many, MANY years, so I'm just guessing. But at least a quick Google search suggests that a BOM might be required.
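As an alternative to a hex editor, the check can be done in Perl by reading the first three octets without any decoding layer and comparing them with EF BB BF, the UTF-8 encoding of U+FEFF. A minimal sketch, assuming the feed has first been saved to a local file; both helper names are hypothetical:

```perl
use strict;
use warnings;

my $UTF8_BOM = "\xEF\xBB\xBF";   # U+FEFF encoded as UTF-8

# Hypothetical helper: true if a byte string begins with the UTF-8 BOM.
sub starts_with_utf8_bom {
    my ($bytes) = @_;
    return substr($bytes, 0, 3) eq $UTF8_BOM;
}

# Hypothetical helper: read the first three octets of a file in raw
# mode (no decoding) and apply the check above.
sub file_has_utf8_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    defined(read $fh, my $head, 3) or die "Cannot read $path: $!";
    close $fh;
    return starts_with_utf8_bom($head);
}
```

Usage, with feed.ics standing in for wherever the feed was saved: print file_has_utf8_bom('feed.ics') ? "BOM present\n" : "no BOM\n";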

    PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
      If Unicode still makes problems, check (with a hex editor) if the file you generate has a byte order mark

      Even after applying the encoding from Re^3: Another Unicode/emoji question, it is still not displaying, so I need to try a BOM next...

      Any suggestions on the best way to add the BOM? File::BOM and String::BOM appear only to read the files, not generate them. Or is it as simple as writing \x{efbbbf} as the first thing after the HTTP headers?

        Or is it as simple as writing \x{efbbbf} as the first thing after the HTTP headers?

        The string my $str = "\x{efbbbf}"; does not contain the BOM character; it contains U+EFBBBF, which is not a valid Unicode character (Unicode only goes up to U+10FFFF). The string my $str = "\x{feff}"; contains the BOM character.

        If you did use the string you suggested, whether with raw mode or with UTF-8 output encoding, you will not get what you thought:

        C:\Users\Peter> perl -e "binmode STDOUT, ':raw'; print qq(\x{efbbbf})" | xxd
        Wide character in print at -e line 1.
        00000000: f8bb bbae bf                             .....

        C:\Users\Peter> perl -e "use open ':std' => ':encoding(UTF-8)'; print qq(\x{efbbbf})" | xxd
        Code point 0xEFBBBF is not Unicode, may not be portable in print at -e line 1.
        00000000: 5c78 7b45 4642 4242 467d                 \x{EFBBBF}

        Neither of those outputs the UTF-8 bytes for the BOM U+FEFF character.

        Instead, you either need to manually send the three octets separately in raw mode, or use raw mode and manually encode from a perl string into UTF-8 bytes, or use UTF-8 output encoding and send the U+FEFF character from the string directly:

        C:\Users\Peter> perl -e "binmode STDOUT, ':raw'; print qq(\xef\xbb\xbf)" | xxd
        00000000: efbb bf                                  ...

        C:\Users\Peter> perl -MEncode -e "binmode STDOUT, ':raw'; print Encode::encode('UTF-8', qq(\x{feff}));" | xxd
        00000000: efbb bf                                  ...

        C:\Users\Peter> perl -e "use open ':std' => ':encoding(UTF-8)'; print qq(\x{feff})" | xxd
        00000000: efbb bf                                  ...

        Whether or not that would "work" in your use-case is something I don't know: my guess is that it won't help, because anything that's using HTTP headers should be paying attention to the encoding listed in the headers, and not requiring a BOM in the message body. Though I guess if it's saving the HTTP message body into a file, and then later using that file, maybe the BOM would help. I don't know on that, sorry.

        --
        warning: Windows quoting used in code blocks; swap quotes around if you're on linux

      If Unicode still makes problems, check (with a hex editor) if the file you generate has a byte order mark

      Thanks. My URL doesn't have a BOM and I hadn't considered the possibility that it might need one!

      I have changed the code as suggested by kcott and I'll now wait until tomorrow for Google to hit my calendar endpoint and see if it works. If not, I'll wait another day and look into whether a BOM is the issue. Debugging this is taking forever as I have to wait for Google to refresh, which it does approximately once per day.

      It looks like a BOM is the next thing to try...

      Google has updated the calendar and displayed \x{1f436} instead of 🐶

Re: Another Unicode/emoji question
by ikegami (Patriarch) on Dec 22, 2023 at 05:37 UTC

    How do you generate and encode the string from the template?

      A bit like this...

      #!/usr/bin/perl

      use CGI::Carp qw(fatalsToBrowser);
      use strict;
      use warnings;

      use lib "$ENV{'DOCUMENT_ROOT'}/../lib";
      use Site::Utils;
      use utf8;
      use Pawsies;

      my $pawsies = Pawsies->new;
      my $template = Template->new(INCLUDE_PATH => $Site::Variables::template_path);

      ###################
      # Control Variables
      #
      # write to file each time Google calls API
      my $debug = 1;
      #
      # number of days in the past to sync calendar
      my $sync = 28;
      #
      ###################

      $data{'format'} = 'calendar' unless $data{'format'} eq 'plain';
      print "Content-type: text/$data{'format'}; charset=utf-8\n\n";

      my @day;
      my $query = $dbh->prepare("SELECT *, DATE_FORMAT(start, '%Y%m%dT%H%i%s') AS dtstart ....");
      $query->execute($sync);
      while (my $row = $query->fetchrow_hashref) {
          $row->{'dog'} = dogName($row->{'idBooking'});
          push @day, $row;
      }

      my $vars = {
          'day' => \@day,
      };

      $template->process("admin/google/ian.tt", $vars) or die $template->error;

        You forgot to encode.

        Simple way to fix:

        use open ":std", ":encoding(UTF-8)";
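Perl strings hold characters, and something has to turn them into UTF-8 octets on the way out; that is what the :encoding(UTF-8) layer on STDOUT does. A minimal sketch of the effect, printing through the same layer into an in-memory filehandle (the $bytes buffer stands in for STDOUT) so the resulting octets can be inspected:

```perl
use strict;
use warnings;

# Print through the same layer that `use open ":std", ":encoding(UTF-8)"`
# puts on STDOUT, but into an in-memory buffer.
open my $fh, '>:encoding(UTF-8)', \my $bytes or die "open: $!";
print {$fh} "SUMMARY:\x{1f436} Dog Face Test\n";
close $fh or die "close: $!";   # flush the layer into $bytes

# The single character U+1F436 has become four octets in the buffer.
my ($dog) = $bytes =~ /^SUMMARY:(.{4})/s;
print join(' ', map { sprintf '%02X', ord } split //, $dog), "\n";
```

Without the layer, Perl emits a "Wide character" warning and writes its internal representation instead, as the xxd transcripts elsewhere in this thread show.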
Re: Another Unicode/emoji question
by Anonymous Monk on Jan 15, 2024 at 05:23 UTC

    To avoid waiting 24 hours, you can trick Google by adding a harmless fragment (#...) to the URL.

    Example if your url is: https://example.com/example.ics

    Delete it and add a new one: https://example.com/example.ics#1

    Then for next test, delete it and add: https://example.com/example.ics#2

    And so on. Google will treat them as new URLs to fetch, rather than using the cached result.

Template and Unicode (was: Re: Another Unicode/emoji question)
by Bod (Parson) on Dec 28, 2023 at 22:09 UTC

    As it is a Festive Break 🎄 I've had the opportunity to test this calendar import and find out what is really going on...

    I created a simple test script to generate a single calendar entry from noon to 2pm tomorrow, that is, the day after the ICS feed is accessed.

    #!/usr/bin/perl

    use CGI::Carp qw(fatalsToBrowser);
    use strict;
    use warnings;

    use lib "$ENV{'DOCUMENT_ROOT'}/../lib";
    use open ":std", ":encoding(UTF-8)";
    use Site::Utils;

    my $template = Template->new(INCLUDE_PATH => $Site::Variables::template_path);

    $data{'format'} = 'calendar' unless $data{'format'} eq 'plain';
    print "Content-type: text/$data{'format'}; charset=utf-8;\n\n";
    #print "\x{feff}"; # BOM

    my ($date, $uid) = $dbh->selectrow_array("SELECT DATE_FORMAT(NOW() + INTERVAL 1 DAY, '%Y%m%d'), DATE_FORMAT(NOW(), '%Y-%j-%H%i%s')");

    if ($data{'template'}) {
        $template->process("admin/google/dogface.tt", $vars) or die $template->error;
        exit;
    }

    print<<"END";
    BEGIN:VCALENDAR
    VERSION:2.0
    PRODID:Pawsies Calendar 1.0//EN
    CALSCALE:GREGORIAN
    METHOD:PUBLISH
    BEGIN:VEVENT
    SUMMARY:\x{1f436} Dog Face Test
    UID:DFT$uid\@pawsies.uk
    SEQUENCE:1
    DTSTAMP:${date}T120000
    DTSTART:${date}T120000
    DTEND:${date}T140000
    END:VEVENT
    END:VCALENDAR
    END

    The module Site::Utils provides the database handle $dbh and splits the HTTP query string and puts it into %data.

    If the BOM is included, Google Calendar doesn't display the entry at all. With the BOM omitted, the entry is displayed.

    But, if we print the ICS data directly from the script, the 🐶 emoji is displayed correctly. If we use Template to handle the printing, instead of 🐶 we get the literal \x{1f436}... So, it appears to be Template that is not printing the Unicode characters.
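A likely explanation, and the one the later replies point towards: in the interpolating heredoc, Perl itself parses \x{1f436} into a single character, whereas Template reads the template as plain text and does not process Perl backslash escapes, so the nine characters pass through unchanged. A minimal core-Perl sketch of the difference:

```perl
use strict;
use warnings;

# Text read from a template file behaves like a single-quoted string:
# the escape sequence stays as nine literal characters.
my $from_template = '\x{1f436}';

# In a double-quoted string or an interpolating heredoc, Perl parses
# the escape itself and produces one character, U+1F436.
my $from_heredoc = "\x{1f436}";

printf "template text: %d characters\n", length $from_template;
printf "heredoc text:  %d character, U+%X\n", length $from_heredoc, ord $from_heredoc;
```

So rather than writing the escape in the template, the character can be passed in as a template variable (for example "\N{DOG FACE}"), with an encoding layer on STDOUT as before.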


    Of course, knowing where the problem exists is different to being able to solve it...

    Do you have any experience of printing Unicode using Template?

      Some years ago I used Template with Unicode a lot. You can have UTF-8 encoded templates and UTF-8 strings in variables; you just need to declare them consistently.

      The very first configuration parameter documented in Template::Manual::Config is ENCODING. I'll quote it here because it is so short:

      ENCODING

      The ENCODING option specifies the template files' character encoding:

      my $template = Template->new({
          ENCODING => 'utf8',
      });

      A template which starts with a Unicode byte order mark (BOM) will have its encoding detected automatically.

      So, the following program works as I'd expect:

      use Template;
      use open ":std", ":encoding(UTF-8)";

      my $dogface = "\N{DOG FACE}";
      my $template = <<"ENDOFTEMPLATE";
      Dog face from template: $dogface
      Dog face from variable: [% dogface %]
      ENDOFTEMPLATE

      my $tt = Template->new(ENCODING => 'UTF-8');
      $tt->process(\$template, {dogface => $dogface});

      The automatic BOM detection is a handy band-aid if you have a mixture of Latin1 and UTF-8 encoded templates and don't want to re-code them.

      A template is usually an HTML document, not Perl source code, so Perl escapes such as \x{1f436} don't work there. HTML entities do, though, wherever the output is rendered as HTML. You can always pass a constant from Perl to a template, too.
      #!/usr/bin/perl
      use warnings;
      use strict;

      use open OUT => ':encoding(UTF-8)', ':std';
      use Template;

      'Template'->new->process(
          \'SUMMARY:[% chr(128054) %] [% dog %] &#x1f436; Dog Face Test',
          {dog => "\x{1f436}", chr => \&CORE::chr});

      Update: Added the chr sub.

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]