in reply to Re^4: Size and anatomy of an HTTP response
in thread Size and anatomy of an HTTP response
Regardless of the conventions of your tools, you can always get the byte count, and only the byte count, of a string as Perl stores it internally by turning its utf8 flag off. So if you are uncertain:
    use Encode ();

    # copy so we don't muck with the utf8 flag on the original string
    my $sTmp = $someData;
    Encode::_utf8_off($sTmp);
    my $iLength = length($sTmp);
or for future use you could just wrap this up in a sub:
    use Encode ();

    sub countBytes {
        my $s = $_[0];    # makes a copy
        Encode::_utf8_off($s);
        return length($s);
    }

    # or, to save memory by avoiding a copy for loooong strings
    # BUT note: may not be a good idea if the string is shared by multiple
    # threads, since this is not atomic and another thread could grab
    # control while the utf8 bit is temporarily off.
    # The copy approach is more stable and thread friendly.
    sub countBytes {
        my $bUtf = Encode::is_utf8($_[0]);
        Encode::_utf8_off($_[0]);
        my $i = length($_[0]);
        Encode::_utf8_on($_[0]) if $bUtf;
        return $i;
    }

    # calc bytes before printf to show the flag is indeed preserved
    my $s = chr(0x160);
    my $iBytes = countBytes($s);
    printf "chr=<%s> utf8-flag=%s length=%d bytes=%s\n"
        , $s, Encode::is_utf8($s) ? 'yes' : 'no', length($s), $iBytes;

    # outputs: chr=<?> utf8-flag=yes length=1 bytes=2
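One caveat worth keeping in mind: the subs above count bytes in Perl's internal storage, and they use Encode's underscore-prefixed (internal) functions. If what you actually need is the number of bytes the string would occupy once encoded as UTF-8, a minimal sketch using only the public Encode::encode_utf8 might look like the following. The name countEncodedBytes is my own invention for illustration, and I am assuming UTF-8 is the encoding you care about.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of the original post): count the bytes
# the string would occupy once encoded as UTF-8. The caller's string and
# its utf8 flag are left untouched, because we only work on a copy.
sub countEncodedBytes {
    my ($s) = @_;                            # copy of the argument
    return length(Encode::encode_utf8($s));  # length of the encoded octets
}

my $sWide   = chr(0x160);  # code point above 255, stored internally as UTF-8
my $sLatin1 = chr(0xE9);   # code point below 256, normally stored as one byte

printf "U+0160: chars=%d encoded-bytes=%d\n",
    length($sWide), countEncodedBytes($sWide);
printf "U+00E9: chars=%d encoded-bytes=%d\n",
    length($sLatin1), countEncodedBytes($sLatin1);
```

For chr(0x160) this agrees with the flag-off count (2 bytes). For a string like chr(0xE9), which Perl normally stores without the utf8 flag, the flag-off count would be 1 while the UTF-8 encoded size is 2, so which number you want depends on what will actually go over the wire.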
Best of luck with your project.
Update: added a memory-friendly, thread-unfriendly version of countBytes().
Re^6: Size and anatomy of an HTTP response
by Discipulus (Canon) on Dec 16, 2010 at 12:49 UTC