Re^8: Determining content-length for an HTTP Post

Re^9: Determining content-length for an HTTP Post by WizardOfUz (Friar) on Nov 26, 2009 at 20:46 UTC
Well, maybe it's the language barrier. My point is that it is simply not possible to encode `$xmldata` without knowing from what / to what. The OP told us nothing about the content of `$xmldata` or the desired encoding. Therefore, the only (quick) advice I could give him was to make sure that `length()` treats `$xmldata` as a series of bytes, and that is exactly what the `bytes` pragma is for. When the `bytes` pragma is in effect, `length()` returns the number of bytes taken by Perl's internal string representation. That is exactly what we need to know for the `Content-Length` header, assuming that no PerlIO layer has been specified for the output stream:

> "A user of Perl does not normally need to know nor care how Perl happens to encode its internal strings, but it becomes relevant when outputting Unicode strings to a stream without a PerlIO layer -- one with the "default" encoding. In such a case, the raw bytes used internally (the native character set or UTF-8, as appropriate for each string) will be used, and a "Wide character" warning will be issued if those strings contain a character beyond 0x00FF."

(From perluniintro, emphasis mine.)

The examples in your previous post were certainly interesting, but they missed the point, especially the third one, because the only thing we really need to know for the `Content-Length` header is how many bytes are going to be sent. See above.

Furthermore, it is simply not true that the `bytes` pragma is as unreliable as you depicted it. It only fails (in this context) if you try really hard. See my examples above.

And yes, I'm aware that if my advice had solved the wrong `Content-Length` problem, the follow-up question would probably have been: "Help! My message content is garbled!". That would have been your opportunity to shine ...

Peace.
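The situation described in the perluniintro passage quoted above can be illustrated with a minimal sketch. It assumes the payload contains a character beyond 0xFF and is printed to a handle with no PerlIO layer; the sample string and the in-memory handle are hypothetical stand-ins for the OP's `$xmldata` and socket.

```perl
#!/usr/bin/perl
# Sketch: a string containing a character above 0xFF, printed to a
# handle with no PerlIO layer. Perl writes its internal UTF-8 bytes
# and emits a "Wide character" warning.
use strict;
use warnings;

# Hypothetical payload: "caf\x{e9}", a space, and U+263A WHITE SMILING FACE.
my $xmldata = "caf\x{e9} \x{263A}";

my $chars     = length $xmldata;                    # number of characters
my $bytes_len = do { use bytes; length $xmldata };  # size of internal storage

# Write to an in-memory handle (stand-in for a socket) and capture the
# expected warning instead of letting it reach STDERR.
my @warnings;
my $buf = '';
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    open my $fh, '>', \$buf or die "Cannot open in-memory handle: $!";
    print {$fh} $xmldata;
    close $fh;
}

printf "chars=%d use-bytes=%d written=%d warnings=%d\n",
    $chars, $bytes_len, length($buf), scalar @warnings;
```

This should report 6 characters, 9 bytes under `use bytes`, 9 bytes written, and one "Wide character" warning: in this particular case the `use bytes` count matches what goes over the wire.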
Re^10: Determining content-length for an HTTP Post by ikegami (Patriarch) on Nov 26, 2009 at 22:43 UTC
> Well, maybe it's the language barrier. My point is that it is simply not possible to encode `$xmldata` without knowing from what / to what.

Correct, just as you can't use `use bytes;` to encode strings. If you revisit what I said, you'll notice I said he needed to encode as per the encoding specified in the `<?xml?>` directive. (UTF-8 is the default, btw.)

> Therefore, the only (quick) advice I could give him was to make sure that `length()` treats `$xmldata` as a series of bytes.

`use bytes;` does no such thing.

> When the `bytes` pragma is in effect, `length()` returns the number of bytes taken by Perl's internal string representation.

Yes, but we don't want or need that. We want the number of bytes in the string.

> the only thing we really need to know for the `Content-Length` header is how many bytes are going to be sent. See above.

Even if I can't convince you that `use bytes;` is bad in general, I can clearly show that it doesn't give us the information you just said we needed:

```
$ perl -E'
    my $buf = "";
    {
        open my $fh, ">", \$buf;
        utf8::upgrade( my $all_255_bytes = join "", map chr, 0..255 );
        say length $all_255_bytes;
        say do { use bytes; length $all_255_bytes };
        print $fh $all_255_bytes;
    }
    say length($buf);
'
256    length without use bytes
384    length with use bytes
256    actual content length
```

- Given a string of bytes, `length` without `use bytes;` always gives the number of bytes. `length` with `use bytes;` doesn't always give the number of bytes.
- Given a string of chars, `length` without `use bytes;` always gives the number of chars. `length` with `use bytes;` doesn't always give the number of chars, and it doesn't always give the number of bytes in the UTF-8 encoding of the chars either.

> Furthermore, it is simply not true that the `bytes` pragma is as unreliable as you depicted it. It only fails (in this context) if you try really hard. See my examples above.

Compared to not using `use bytes;`, which always returns the right value? Yes, it is.

> I'm aware that if my advice had solved the wrong `Content-Length` problem, the follow-up question would probably have been: "Help! My message content is garbled!". That would have been your opportunity to shine ...

Something can be wrong and still work. Bad code sometimes works.
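A minimal sketch of the encode-then-measure approach argued for above, assuming the XML is UTF-8 (the default when the `<?xml?>` declaration names no encoding). The sample document and variable names are hypothetical. It builds two copies of the same text that differ only in internal storage and compares `use bytes` lengths with the lengths of the encoded bodies.

```perl
#!/usr/bin/perl
# Sketch: encode the XML text to bytes first, then measure it with a
# plain length(). Assumes UTF-8, the XML default encoding.
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical XML payload; "\x{e9}" is LATIN SMALL LETTER E WITH ACUTE.
my $xmldata = qq{<?xml version="1.0"?><name>Jos\x{e9}</name>};

# Two copies of the same 38-character text; only internal storage differs.
my $downgraded = $xmldata;
utf8::downgrade($downgraded);
my $upgraded = $xmldata;
utf8::upgrade($upgraded);

# "use bytes" reports the internal size, which depends on the storage.
my $bytes_down = do { use bytes; length $downgraded };
my $bytes_up   = do { use bytes; length $upgraded };

# Encoding first yields the same byte string either way.
my $body_down = encode('UTF-8', $downgraded);
my $body_up   = encode('UTF-8', $upgraded);

printf "use bytes: %d vs %d; encoded: %d vs %d\n",
    $bytes_down, $bytes_up, length $body_down, length $body_up;
print "Content-Length: ", length($body_up), "\n";
```

The `use bytes` counts come out as 38 and 39 for the same text, while both encoded bodies are 39 bytes, which is the value a `Content-Length` header for that body would carry.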
Re^11: Determining content-length for an HTTP Post by WizardOfUz (Friar) on Nov 27, 2009 at 12:01 UTC
> Given a string of bytes, ... `length` with `use bytes;` doesn't always give the number of bytes.

You are spreading (dangerous) FUD, no offence meant (really!). The `bytes` pragma works as advertised. The problem with your example is that you are fiddling with the internal UTF-8 flag. The perlunicode document clearly calls `utf8::upgrade()` a "low-level" function. The right way to UTF-8-encode a string is `utf8::encode()` or, maybe even better, the Encode module. See:

```
joerg@Marvin:~> perl -E'
    my $buf = "";
    {
        open my $fh, ">", \$buf;
        utf8::encode( my $all_256_bytes = join "", map chr, 0..255 );
        say length $all_256_bytes;
        say do { use bytes; length $all_256_bytes };
        print $fh $all_256_bytes;
    }
    say length($buf);
'
384
384
384
```

Btw, I have already demonstrated this in my previous post. You might want to reread it; example `[6]` is especially interesting.

> > Therefore, the only (quick) advice I could give him was to make sure that `length()` treats `$xmldata` as a series of bytes.
>
> `use bytes;` does no such thing.

This is taken right from the `bytes` documentation:

> "Perl normally assumes character semantics in the presence of character data (i.e. data that has come from a source that has been marked as being of a particular character encoding). When `use bytes` is in effect, the encoding is temporarily ignored, and each string is treated as a series of bytes."

Maybe you should rethink your statement.

> We want the number of bytes in the string.

No, we (only) need to know how many bytes are going to be sent. Please reread my previous post.

> Something can be wrong and still work. Bad code sometimes works.

Agreed. But, given the limited amount of information we have, my suggestion is still the best first step to take in solving the OP's problem.

Peace.
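The point in the post above, that a string passed through `utf8::encode()` or the Encode module is a plain byte string, can be checked with a short sketch. It reuses the 256-character string from the thread; the variable names are illustrative only.

```perl
#!/usr/bin/perl
# Sketch: utf8::encode() and Encode::encode('UTF-8', ...) both turn a
# character string into a byte string with the UTF8 flag off, so
# length() with and without "use bytes" agree on the result.
use strict;
use warnings;
use Encode qw(encode);

my $text = join '', map chr, 0 .. 255;   # the 256 characters from the thread

my $via_utf8 = $text;
utf8::encode($via_utf8);                 # encodes in place

my $via_encode = encode('UTF-8', $text); # returns a new byte string

my $plain     = length $via_utf8;
my $use_bytes = do { use bytes; length $via_utf8 };

printf "same bytes: %s; flag on: %s; length %d / use bytes %d\n",
    ($via_utf8 eq $via_encode ? 'yes' : 'no'),
    (utf8::is_utf8($via_utf8) ? 'yes' : 'no'),
    $plain, $use_bytes;
```

Because the encoded string has its UTF8 flag off, its internal representation is the bytes themselves, so both `length` calls report 384 here.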
Re^12: Determining content-length for an HTTP Post by ikegami (Patriarch) on Nov 27, 2009 at 16:22 UTC
Re^13: Determining content-length for an HTTP Post by WizardOfUz (Friar) on Nov 27, 2009 at 18:05 UTC