Re^5: Size and anatomy of an HTTP response

Regardless of the conventions of your tools, you can always get the byte count, and only the byte count, of a string's internal buffer by turning the utf8 flag off. So if you are uncertain:

  use Encode ();   # provides Encode::_utf8_off()

  # copy so we don't muck up the utf8 flag on the original string
  my $sTmp = $someData;
  Encode::_utf8_off($sTmp);
  my $iLength = length($sTmp);   # bytes, not characters

Or, for future use, you could just wrap this up in a sub:

use Encode ();   # provides _utf8_off(), _utf8_on() and is_utf8()

sub countBytes {
 my $s=$_[0];   #makes copy
 Encode::_utf8_off($s);
 return length($s);
}

# or to save memory by avoiding a copy for loooong strings
# BUT note: may not be a good idea if string is shared by multiple
# threads since this is not atomic and another thread could grab
# control while the utf8 bit is temporarily off.
# The copy approach is more stable and thread friendly.

sub countBytes {
  my $bUtf = Encode::is_utf8($_[0]);
  Encode::_utf8_off($_[0]);
  my $i=length($_[0]);
  Encode::_utf8_on($_[0]) if $bUtf;
  return $i;
}

#calc bytes before printf to show flag is indeed preserved
binmode(STDOUT, ':encoding(UTF-8)');   # avoid a "Wide character" warning
my $s=chr(0x160);
my $iBytes = countBytes($s);

printf "chr=<%s> utf8-flag=%s length=%d bytes=%d\n"
  , $s, Encode::is_utf8($s)?'yes':'no', length($s), $iBytes;
#outputs: chr=<Š> utf8-flag=yes length=1 bytes=2
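One caveat is easier to see in code than in prose: with the flag off, length() counts the bytes of Perl's internal buffer. For a string that was never upgraded (for example chr(0xE9)), that is one byte per character, which is not the same as its size once encoded as UTF-8. If what you really need is the size of the bytes you will send (say, a UTF-8 body), encoding first is probably the safer bet. A minimal sketch, assuming UTF-8 is the target encoding; utf8Bytes() is a hypothetical helper name, not part of Encode:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Same copy-based countBytes() as above: counts the bytes of the
# string's internal buffer by clearing the utf8 flag on a copy.
sub countBytes {
    my $s = $_[0];    # makes a copy
    Encode::_utf8_off($s);
    return length($s);
}

# Hypothetical helper: number of bytes the string occupies once
# encoded as UTF-8 (Encode::encode returns a new byte string).
sub utf8Bytes {
    return length(Encode::encode('UTF-8', $_[0]));
}

my $wide   = chr(0x160);    # code point > 0xFF, stored with the utf8 flag on
my $latin1 = chr(0xE9);     # code point <= 0xFF, stored as a single byte, flag off

printf "wide:   countBytes=%d utf8Bytes=%d\n", countBytes($wide),   utf8Bytes($wide);
printf "latin1: countBytes=%d utf8Bytes=%d\n", countBytes($latin1), utf8Bytes($latin1);
```

The two counts agree for the flagged string (2 and 2) but differ for the Latin-1 one (1 versus 2). Note that Encode::encode builds a full copy, so for very long strings it costs memory just like the copy-based countBytes().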

Best of luck with your project.

Update: added memory-friendly, thread-unfriendly version of countBytes()

Re^6: Size and anatomy of an HTTP response
by Discipulus (Canon) on Dec 16, 2010 at 12:49 UTC

OK, many many thanks for your patience with me. For future readers, I warmly encourage reading The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

EDIT: I also found http://perlgeek.de/en/article/encodings-and-unicode

Lor*

EDIT2: also read a new, amazing topic about this: Simplest Possible Way To Disable Unicode

EDIT3: also read another topic about encoding: Comparing Unicode Greek Characters/Code Points

there are no rules, there are no thumbs..
