
These are the results from my tests using 5.6.1. length seems to work fine for me whether use utf8 or use bytes was in force, but someone mentioned earlier in the thread that there is a known problem with use bytes in 5.6, so I also tried unpack with 'C*', which the docs cite as explicitly bypassing the Unicode stuff.

#! perl -sw
use strict;
use LWP::Simple;
my $content = get( 'http://www.columbia.edu/kermit/utf8.html' );

{
    use utf8;
    my $c_len = length $content;
    my @c_bytes = unpack 'C*', $content;
    my @c_chars = unpack 'U*', $content;
    print "Charwise - length:$c_len; 'C*':", scalar @c_bytes, "; 'U*':", scalar @c_chars, $/;
}
{
    use bytes;
    my $b_len = length $content;
    my @b_bytes = unpack 'C*', $content;
    my @b_chars = unpack 'U*', $content;
    print "Bytewise - length:$b_len; 'C*':", scalar @b_bytes, "; 'U*':", scalar @b_chars, $/;
}
{
    open JUNK, '>', 'junk' or die $!;
    binmode(JUNK);
    print JUNK $content;
    close JUNK;
    print 'Actual (from os): ', -s 'junk', $/;
}
__END__
C:\test>239788
Charwise - length:31946; 'C*':31946; 'U*':28621
Bytewise - length:31946; 'C*':31946; 'U*':28621
Actual (from os): 31946
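
For comparison, here is a minimal sketch of how the two counts can diverge. It assumes a Perl with the core Encode module (5.8 or later, so not the 5.6.1 used above), and it assumes that $content in the test held undecoded octets, which the match between length and the -s file size suggests but does not prove. The variable names are made up for illustration. length only reports characters once the octets have been decoded; on the raw octets it agrees with unpack 'C*'.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 octets for e-acute (2 bytes) followed by the euro sign (3 bytes),
# written as escapes so the encoding of this source file does not matter.
my $octets = "\xC3\xA9\xE2\x82\xAC";

# Decoding turns the octets into a string of two characters.
my $chars = decode('UTF-8', $octets);

my $n_octets    = length $octets;                   # counts octets
my $n_chars     = length $chars;                    # counts characters
my $n_reencoded = length encode('UTF-8', $chars);   # back to octets
my @c_bytes     = unpack 'C*', $octets;             # one element per octet

print "octets:$n_octets; chars:$n_chars; re-encoded:$n_reencoded; 'C*':",
    scalar @c_bytes, "\n";
```

This prints octets:5; chars:2; re-encoded:5; 'C*':5. To get a byte count from a decoded string, encoding it again and taking length of the result avoids relying on use bytes at all.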

Examine what is said, not who speaks.

1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.

In reply to Re: Re: Re: Re: use bytes and length problem by BrowserUk
in thread use bytes and length problem by muad33b
