
Hello again, ikegami. I think we are approaching the OP's problem from different angles, which might be a bit confusing for others. So I would like to clarify a few things; maybe you will join me.

First of all, given the limited amount of information provided by the OP, I was looking at the HTTP level only, ignoring the actual message content.

With that being said, I really think that the best way to determine the Content-Length of an HTTP message, if we cannot reliably encode its content to bytes ourselves (we do not know what the OP's $xmldata actually contains), is to use length() with the bytes pragma in effect. This assumes, of course, that the message content is not encoded afterwards and that the content string's UTF-8 flag has not been fiddled with.
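As a minimal sketch of that approach, assuming the body is sent exactly as Perl stores it (no :encoding layer and no later encode step), something like the following would do. The helper content_length_of() and the sample $xmldata value are hypothetical stand-ins, not the OP's code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: number of bytes in the body as Perl currently stores it.
# Only equals the bytes actually sent if the body is not encoded afterwards and
# its UTF-8 flag has not been fiddled with (see the caveats above).
sub content_length_of {
    my ($body) = @_;
    use bytes;             # length() now counts bytes instead of characters
    return length $body;
}

# Hypothetical stand-in for the OP's $xmldata: the EURO SIGN is 3 bytes in UTF-8.
my $xmldata        = "<price>\x{20AC}5</price>";
my $content_length = content_length_of($xmldata);

print "Content-Length: $content_length\r\n";
```

Here length($xmldata) without the pragma would report 17 characters, while the helper reports 19 bytes.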

Some code to play with:

#!/usr/bin/perl

use strict;
use warnings;
use FindBin qw( $Bin );
use File::Spec::Functions qw( catfile );

my $file = catfile( $Bin, 'bytes_pragma.data' );

my $string_1 = "\x{C9}";   # LATIN CAPITAL LETTER E WITH ACUTE; 2 bytes in UTF-8
my $string_2 = "\x{20AC}"; # EURO SIGN; 3 bytes in UTF-8

{
    my $string = $string_1;
    print '[1] is_utf8() returns: ' . ( utf8::is_utf8( $string ) ? 'true' : 'false' ) . "\n";
    print '[1] length() returns:  ' . length( $string ) . " (no bytes)\n";
    print '[1] length() returns:  ' . do { use bytes; length( $string ) } . " (use bytes)\n";
    open my $fh, '>', $file or die "Cannot open $file: $!";
    print {$fh} $string;
    close $fh;
    print '[1] actual size is:    ' . ( -s $file ) . "\n";
}

{
    my $string = $string_1;
    utf8::encode( $string );
    print '[2] is_utf8() returns: ' . ( utf8::is_utf8( $string ) ? 'true' : 'false' ) . "\n";
    print '[2] length() returns:  ' . length( $string ) . " (no bytes)\n";
    print '[2] length() returns:  ' . do { use bytes; length( $string ) } . " (use bytes)\n";
    open my $fh, '>', $file or die "Cannot open $file: $!";
    print {$fh} $string;
    close $fh;
    print '[2] actual size is:    ' . ( -s $file ) . "\n";
}

{
    my $string = $string_2;
    print '[3] is_utf8() returns: ' . ( utf8::is_utf8( $string ) ? 'true' : 'false' ) . "\n";
    print '[3] length() returns:  ' . length( $string ) . " (no bytes)\n";
    print '[3] length() returns:  ' . do { use bytes; length( $string ) } . " (use bytes)\n";
    open my $fh, '>', $file or die "Cannot open $file: $!";
    print {$fh} $string;
    close $fh;
    print '[3] actual size is:    ' . ( -s $file ) . "\n";
}

{
    my $string = $string_2;
    utf8::encode( $string );
    print '[4] is_utf8() returns: ' . ( utf8::is_utf8( $string ) ? 'true' : 'false' ) . "\n";
    print '[4] length() returns:  ' . length( $string ) . " (no bytes)\n";
    print '[4] length() returns:  ' . do { use bytes; length( $string ) } . " (use bytes)\n";
    open my $fh, '>', $file or die "Cannot open $file: $!";
    print {$fh} $string;
    close $fh;
    print '[4] actual size is:    ' . ( -s $file ) . "\n";
}

{
    my $string = $string_1;
    utf8::upgrade( $string );
    print '[5] is_utf8() returns: ' . ( utf8::is_utf8( $string ) ? 'true' : 'false' ) . "\n";
    print '[5] length() returns:  ' . length( $string ) . " (no bytes)\n";
    print '[5] length() returns:  ' . do { use bytes; length( $string ) } . " (use bytes)\n";
    open my $fh, '>', $file or die "Cannot open $file: $!";
    print {$fh} $string;
    close $fh;
    print '[5] actual size is:    ' . ( -s $file ) . "\n";
}

# ... but with 'use bytes' in effect, print writes the internal (UTF-8) bytes too:
{
    use bytes;
    my $string = $string_1;
    utf8::upgrade( $string );
    print '[6] is_utf8() returns: ' . ( utf8::is_utf8( $string ) ? 'true' : 'false' ) . "\n";
    print '[6] length() returns:  ' . length( $string ) . " (use bytes)\n";
    open my $fh, '>', $file or die "Cannot open $file: $!";
    print {$fh} $string;
    close $fh;
    print '[6] actual size is:    ' . ( -s $file ) . "\n";
}

Output:

joerg@Marvin:~> '/home/joerg/bytes_pragma.pl'
[1] is_utf8() returns: false
[1] length() returns:  1 (no bytes)
[1] length() returns:  1 (use bytes)
[1] actual size is:    1
[2] is_utf8() returns: false
[2] length() returns:  2 (no bytes)
[2] length() returns:  2 (use bytes)
[2] actual size is:    2
[3] is_utf8() returns: true
[3] length() returns:  1 (no bytes)
[3] length() returns:  3 (use bytes)
Wide character in print at /home/joerg/bytes_pragma.pl line 42.
[3] actual size is:    3
[4] is_utf8() returns: false
[4] length() returns:  3 (no bytes)
[4] length() returns:  3 (use bytes)
[4] actual size is:    3
[5] is_utf8() returns: true
[5] length() returns:  1 (no bytes)
[5] length() returns:  2 (use bytes)
[5] actual size is:    1
[6] is_utf8() returns: true
[6] length() returns:  2 (use bytes)
[6] actual size is:    2
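To put the `use bytes` count directly next to the number of bytes actually written, here is a condensed sketch of cases [1] to [5]. It swaps the temp file for an in-memory filehandle (an assumption: with no :encoding layer this should behave like the plain file handle used above) and calls bytes::length(), which `use bytes ()` makes available without switching the pragma on. The %make table is hypothetical and simply rebuilds each string the way the corresponding block does:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use bytes ();    # load bytes::length() without enabling the pragma here

# Hypothetical table: rebuild each test string as blocks [1]..[5] do.
my %make = (
    1 => sub { "\x{C9}" },
    2 => sub { my $s = "\x{C9}";   utf8::encode($s);  $s },
    3 => sub { "\x{20AC}" },
    4 => sub { my $s = "\x{20AC}"; utf8::encode($s);  $s },
    5 => sub { my $s = "\x{C9}";   utf8::upgrade($s); $s },
);

my ( %predicted, %written );
for my $n ( sort keys %make ) {
    my $string = $make{$n}->();

    # Print to an in-memory buffer instead of a file (no :encoding layer).
    my $buffer = '';
    open my $fh, '>', \$buffer or die "Cannot open in-memory handle: $!";
    {
        no warnings 'utf8';    # silence "Wide character in print" for case 3
        print {$fh} $string;
    }
    close $fh;

    $predicted{$n} = bytes::length($string);    # what 'use bytes' reports
    $written{$n}   = length $buffer;            # bytes that actually came out
    printf "[%d] bytes::length: %d  written: %d  %s\n", $n, $predicted{$n},
        $written{$n}, $predicted{$n} == $written{$n} ? 'match' : 'MISMATCH';
}
```

Only case [5], the upgraded string, comes out as a mismatch (2 versus 1), which is the "UTF-8 flag has been fiddled with" situation from the caveat above.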

Update: Added a clarification.

In reply to Re^7: Determining content-length for an HTTP Post by WizardOfUz
in thread Determining content-length for an HTTP Post by Anonymous Monk
