in reply to Re^2: Determining content-length for an HTTP Post
in thread Determining content-length for an HTTP Post

use bytes;
my $length_in_bytes = length( $xmldata );

Re^4: Determining content-length for an HTTP Post
by ikegami (Patriarch) on Nov 25, 2009 at 18:38 UTC
    If the problem is that he forgot to encode his XML, the solution is NOT to get the length of the internal representation of the XML, it's to encode the XML.
    use Encode qw( encode );
    # Or whatever encoding you specified in <?xml?>
    $xmldata = encode('UTF-8', $xmldata);
    my $length = length( $xmldata );
    or
    utf8::encode( $xmldata );
    my $length = length( $xmldata );
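
    To show where the encoded length ends up, here is a minimal sketch that encodes the payload once and derives Content-Length from the resulting byte string. The sample XML, the variable names $body and $headers, and the hand-built header lines are assumptions for illustration only; a real client would normally let its HTTP library set the header.

    ```perl
    use strict;
    use warnings;
    use Encode qw( encode );

    # Hypothetical payload: a character string with one non-ASCII character.
    my $xmldata = qq{<?xml version="1.0"?><root>\x{C9}ric</root>};

    # Encode once. The result is a byte string, so length() counts bytes.
    my $body           = encode( 'UTF-8', $xmldata );
    my $content_length = length($body);

    # Hand-built header lines, purely for illustration.
    my $headers = join "\r\n",
        'Content-Type: application/xml; charset=UTF-8',
        "Content-Length: $content_length";

    print "$headers\r\n\r\n";
    ```

    The body sent on the wire must be $body (the encoded bytes), not the original character string, or the header and the payload will disagree.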

      Huh? The bytes pragma simply forces $xmldata to be treated as a series of bytes. This should give us the correct value for the Content-Length header whether $xmldata is a character string or a UTF-8 encoded byte string. Or am I missing something?

        This should give us the correct value for the Content-Length header

        No.

        If the XML is valid, length gives the right answer without use bytes:

        $ perl -le'
            $_ = "<?xml version=\"1.0\"?><root>\x{C9}ric</root>";
            utf8::encode($_);
            utf8::downgrade($_);
            print length;
            print do { use bytes; length };
        '
        39
        39

        You can get the wrong answer if you use use bytes;:

        $ perl -le'
            $_ = "<?xml version=\"1.0\"?><root>\x{C9}ric</root>";
            utf8::encode($_);
            utf8::upgrade($_);
            print length;
            print do { use bytes; length };
        '
        39
        41   XXX Should be 39

        If the XML hasn't been encoded, use bytes can give you the right result if the desired encoding is UTF-8, but it's unreliable:

        $ perl -le'
            $_ = "<?xml version=\"1.0\"?><root>\x{C9}ric</root>";
            print do { use bytes; length };
        '
        38   XXX Should be 39

        In no case is use bytes; the appropriate answer.

        Perl has two different formats for storing strings. use bytes; causes opcodes to look directly at the internal buffer of the string no matter which format was used. Since Perl is free to change how it internally stores the string at will, it's quite useless to use use bytes; without checking which format Perl used for that string.
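
        A small sketch of the two storage formats, assuming only the core utf8:: functions and bytes::length (the variable names are made up for the example). The two scalars hold the same four-character string and compare equal, yet the size of the internal buffer differs:

        ```perl
        use strict;
        use warnings;
        use bytes ();    # loads bytes::length without enabling the pragma

        # Same string, stored in each of the two internal formats.
        my $downgraded = "\x{C9}ric";
        utf8::downgrade($downgraded);
        my $upgraded = $downgraded;
        utf8::upgrade($upgraded);

        printf "equal:         %s\n", $downgraded eq $upgraded ? 'yes' : 'no';
        printf "is_utf8:       %d %d\n",
            ( utf8::is_utf8($downgraded) ? 1 : 0 ),
            ( utf8::is_utf8($upgraded)   ? 1 : 0 );
        printf "length:        %d %d\n", length($downgraded), length($upgraded);
        printf "bytes::length: %d %d\n",
            bytes::length($downgraded), bytes::length($upgraded);
        ```

        length reports 4 for both, while bytes::length reports 4 for the downgraded copy and 5 for the upgraded one, which is why a byte count taken from the internal buffer says nothing dependable about the encoded size.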

        Update: Rephrased for clarity.