$h4X4_|=73}{ has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to get Apache, Perl, HTML5 and UTF-8 to all work together in a web page.
What happens is that the browser shows the HTML5 source code instead of rendering the page, although the browser chrome reports the encoding as utf-8.
At this point I cannot tell whether the server needs an update or
I'm just doing it wrong. Also, the Content-Length $byte_count never comes out correct.
The test code:

#!perl
$rendered = <<HTML;
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Title of the document</title>
</head>
<body>
<p>Test</p>
</body>
</html>
HTML

require Encode;
$rendered = Encode::decode(utf8 => $rendered);
my $byte_count = length $rendered;

print <<HTML;
Content-Length: $byte_count
Content-Type: text/plain; charset=utf-8

HTML

binmode STDOUT, ":encoding(utf8)";
print $rendered;
exit;

Update: I got it to work. The problem of seeing the source was the text/plain header; I needed text/html. The Content-Length issue was due to the fact that I was on Windows.

#!perl
my $smiley = chr(0x263a); # "\x{263a}"
my $rendered = <<HTML;
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Title of the document</title>
</head>
<body>
<p>Test
HTML
my $rendered2 = <<HTML;
</p>
</body>
</html>
HTML

# embed a Unicode character :)
$rendered .= $smiley;
# render footer
$rendered .= $rendered2;

use Encode qw{encode}; # [id://1164454]
binmode STDOUT, ":raw";
$rendered = encode(utf8 => $rendered);
my $byte_count = length($rendered);

print <<HTML;
Content-Length: $byte_count
Content-Type: text/html; charset=utf-8

HTML

print $rendered;
exit;

It's still so new to me that I have a feeling something is not right...

Replies are listed 'Best First'.
Re: Perl UTF-8 serving HTML5
by davido (Cardinal) on May 29, 2016 at 17:50 UTC

    What does your httpd.conf look like?


    Dave

      Luckily the server was configured correctly. I had the header as text/plain.
      Thanks anyway. ++

Re: Perl UTF-8 serving HTML5
by Anonymous Monk on May 29, 2016 at 19:06 UTC

    Also the Content-Length $byte_count never comes out correct.

    You're taking the length of a decoded Unicode string (counted in characters), not of the UTF-8 encoded bytes/octets.

    Try

    binmode STDOUT, ':raw';
    $rendered = Encode::encode(utf8 => $rendered);
    $byte_count = length $rendered;
    print <<HTML;
    Content-Length: $byte_count
    Content-Type: text/plain; charset=utf-8

    HTML
    exit 0;
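
To make the character/byte distinction concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration; only `Encode::encode` and `length` come from the thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A decoded (character) string: "Test " plus U+263A WHITE SMILING FACE.
my $chars = "Test " . chr(0x263A);

# Encode to UTF-8 octets; U+263A becomes three bytes (E2 98 BA).
my $bytes = encode('UTF-8', $chars);

my $char_count = length $chars;   # counts characters
my $byte_count = length $bytes;   # counts octets, which is what Content-Length wants

printf "characters: %d, bytes: %d\n", $char_count, $byte_count;
# prints "characters: 6, bytes: 8"
```

The two counts only agree when the text is pure ASCII, which is probably why the mismatch is easy to miss in a simple test page.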

      The binmode STDOUT, ':raw' is because you have already encoded your data. If you put STDOUT in UTF-8 mode, it will end up double-encoded.
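
A small sketch of the double-encoding effect, using in-memory filehandles as a stand-in for STDOUT (that substitution is an assumption made so the result can be measured; the variable names are illustrative):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Already-encoded data: U+263A as three UTF-8 octets.
my $encoded = encode('UTF-8', chr(0x263A));

# Handle with a raw layer: octets pass through unchanged.
my $via_raw = '';
open my $raw_fh, '>:raw', \$via_raw or die "open failed: $!";
print {$raw_fh} $encoded;
close $raw_fh;

# Handle with an encoding layer: each octet is treated as a
# Latin-1 character and encoded again.
my $via_utf8 = '';
open my $utf8_fh, '>:encoding(UTF-8)', \$via_utf8 or die "open failed: $!";
print {$utf8_fh} $encoded;
close $utf8_fh;

printf "raw layer: %d bytes, encoding layer: %d bytes\n",
    length($via_raw), length($via_utf8);
# prints "raw layer: 3 bytes, encoding layer: 6 bytes"
```

So the rule of thumb is to encode exactly once: either encode by hand and write through ':raw', or hand character strings to a handle that has an encoding layer, but not both.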

      If you are on a Windows system, the byte count discrepancy might be because Perl represents line breaks internally as line feed characters. On output it converts each line break to the native format, which on Windows is CR LF, so this adds one byte per line. Putting the handle into ':raw' mode prevents this.
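
The line-ending effect can be reproduced on any platform by applying the ':crlf' layer explicitly to an in-memory filehandle. This is a sketch that assumes ':crlf' behaves like a Windows text-mode STDOUT; the sample body and variable names are illustrative.

```perl
use strict;
use warnings;

# Three lines, each terminated by a single line feed.
my $body = "line one\nline two\nline three\n";
my $line_breaks = () = $body =~ /\n/g;

# Simulate a Windows text-mode handle: "\n" is written as CR LF.
my $text_mode = '';
open my $crlf_fh, '>:crlf', \$text_mode or die "open failed: $!";
print {$crlf_fh} $body;
close $crlf_fh;

# A raw handle writes the bytes exactly as given.
my $raw_mode = '';
open my $raw_fh, '>:raw', \$raw_mode or die "open failed: $!";
print {$raw_fh} $body;
close $raw_fh;

printf "line breaks: %d, raw: %d bytes, crlf: %d bytes\n",
    $line_breaks, length($raw_mode), length($text_mode);
```

With ':raw' on the output handle, `length` of the encoded body matches what is actually sent, so no per-line correction of Content-Length should be needed.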

        If you are on a Windows system, the byte count discrepancy might be because Perl represents line breaks internally as line feed characters.

        That was it. I just counted the line breaks on Windows and added them to the overall count, and it works.
        Thanks.