in reply to Re^2: Perl UTF-8 serving HTML5
in thread Perl UTF-8 serving HTML5

If you are on a Windows system, the byte count discrepancy might be because Perl represents each line break internally as a single line feed character, while the default :crlf output layer writes it out as two bytes (CR LF).

That was it. I counted the line breaks on Windows and added that number to the overall count, and now it works.
Thanks.

Re^4: Perl UTF-8 serving HTML5
by Anonymous Monk on May 29, 2016 at 21:15 UTC
    Or you could turn off the :crlf layer: binmode STDOUT, ':raw:encoding(UTF-8)';
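
A minimal sketch of what the :crlf layer does to the byte count. It writes the same string through two layer stacks into in-memory buffers so it can be run on any platform; the variable names and sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text with two line breaks.
my $text = "line one\nline two\n";

# Write through :crlf, as Windows does by default for text handles.
my $crlf_out = '';
open my $crlf_fh, '>:crlf', \$crlf_out or die "open failed: $!";
print {$crlf_fh} $text;
close $crlf_fh;

# Write through :raw, which leaves "\n" as a single byte.
my $raw_out = '';
open my $raw_fh, '>:raw', \$raw_out or die "open failed: $!";
print {$raw_fh} $text;
close $raw_fh;

printf "characters in string: %d\n", length $text;
printf "bytes through :crlf:  %d\n", length $crlf_out;
printf "bytes through :raw:   %d\n", length $raw_out;
```

The :crlf buffer comes out one byte longer per line break, which matches the fix of adding the line-break count on Windows.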

      binmode STDOUT, ':raw:encoding(UTF-8)'; gets close, but the count is off by 1. If I add 1, it works. I am using Perl 5.22.1, if that helps.

      Update: I have now compared the numbers from counting line breaks with the numbers from binmode STDOUT, ':raw:encoding(UTF-8)';. Counting line breaks gives a much larger number, and that is why it works. With binmode STDOUT, ':raw:encoding(UTF-8)'; and one added to the count, it is exactly right.

      What kind of portability issues will I face when using something like this on Linux?

        What kind of portability issues will I face when using something like this on Linux?

        Linux/Windows/banana, it's all the same.

        The count cannot be trusted if you count characters first and only afterwards do the conversion/encoding through binmode.

        UTF-8 is a variable-length encoding: some characters encode as 1 byte, others as 2, 3, or 4. Counting characters to predict the number of bytes that will result will never work except by accident, when all the characters happen to be plain ASCII anyway.
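
A small sketch of the character/byte mismatch, using the core Encode module. The four sample characters are arbitrary picks, one for each UTF-8 length; they are written as escapes so the result does not depend on how the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Arbitrary samples: "A", e-acute, euro sign, and an emoji.
my @samples = ("A", "\x{E9}", "\x{20AC}", "\x{1F600}");

my %bytes_for;
for my $char (@samples) {
    my $bytes = encode('UTF-8', $char);      # character string -> byte string
    $bytes_for{ ord $char } = length $bytes; # length() of a byte string counts bytes
    printf "U+%04X: %d character, %d byte(s)\n",
        ord($char), length($char), length($bytes);
}
```

Each sample is one character long, yet the encoded lengths differ, so a character count only matches the byte count for pure ASCII.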

        It's like counting your chickens before the eggs hatch: not all eggs will hatch, some will be twins, and a fox will eat four.

        The only real solution is to encode before counting (get the bytes, hatch the eggs, then count the chickens).
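
One way "encode before counting" might look, as a sketch only. It assumes a CGI-style script where the byte count goes into a Content-Length header; the HTML snippet and variable names are invented for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical page containing two non-ASCII characters (e-acute, euro sign).
my $html = "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\"></head>"
         . "<body><p>caf\x{E9} \x{20AC}5</p></body></html>\n";

# Hatch the eggs first: turn the characters into UTF-8 bytes.
my $body = encode('UTF-8', $html);

# Then count the chickens: length() on the encoded string is a byte count.
my $content_length = length $body;

# Send the bytes untouched: no :crlf translation, no second encoding pass.
binmode STDOUT, ':raw';
print "Content-Type: text/html; charset=utf-8\r\n";
print "Content-Length: $content_length\r\n\r\n";
print $body;
```

Because the body is already bytes and STDOUT is :raw, what goes out on the wire is exactly what was counted, on Windows and Linux alike.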