in thread use bytes and length problem

Now this is getting interesting :-). When I ran my code, I got the output below (I am using ActiveState Perl 5.8.0; the test code for case 4 is at the end of this post).

=========================
Case 1: create string from pack, with use bytes

char semantics: ƐĲ Length = 2, Content = 400.306
byte semantics: ƐĲ Length = 4, Content = 198.144.196.178
198.144.196.178
=========================
Case 2: create string from pack, with use bytes

char semantics: ƐĲ Length = 2, Content = 400.306
byte semantics: ƐĲ Length = 4, Content = 198.144.196.178
400.306
=========================
Case 3: create string from \x{}

char semantics: ƐĲ Length = 2, Content = 400.306
byte semantics: ƐĲ Length = 4, Content = 198.144.196.178
400.306
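The three cases above differ only in how the string is built. As a minimal sketch (assuming a modern perl with the built-in utf8:: functions; `describe` is a hypothetical helper in the spirit of the `display` sub used in this post, whose code is not shown), the character view and the UTF-8 byte view of U+0190 U+0132 can be reproduced like this:

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length() without turning on byte semantics

# Hypothetical helper in the spirit of display(): returns the
# character view and the UTF-8 byte view of a string as text.
sub describe {
    my ($str) = @_;

    # Character semantics: one ord() per character.
    my $char_view = sprintf 'Length = %d, Content = %s',
        length($str), join('.', map { ord($_) } split //, $str);

    # Byte semantics: encode a copy to UTF-8 octets, then list each octet.
    my $octets = $str;
    utf8::encode($octets);
    my $byte_view = sprintf 'Length = %d, Content = %s',
        length($octets), join('.', unpack('C*', $octets));

    return ($char_view, $byte_view);
}

my $from_pack = pack('U*', 400, 306);    # built with pack, as in cases 1 and 2
my $from_x    = "\x{190}\x{132}";        # built with \x{}, as in case 3
print "pack and \\x{} give the same string\n" if $from_pack eq $from_x;

my ($char_view, $byte_view) = describe($from_x);
print "char semantics: $char_view\n";    # Length = 2, Content = 400.306
print "byte semantics: $byte_view\n";    # Length = 4, Content = 198.144.196.178
print 'bytes::length: ', bytes::length($from_x), "\n";    # 4
```

Encoding a copy leaves the original character string untouched. bytes::length reports the size of the internal representation, which for a string containing characters above 255 is its UTF-8 form, so it agrees with the byte view here.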
=========================
Case 4: read string from unicode file

char semantics: 裴佳谷
 Length = 4, Content = 35060.20339.35895.10
byte semantics: 裴佳谷
 Length = 10, Content = 232.163.180.228.189.179.232.176.183.10
Also, I wanted to add a case covering a string read from a file:

    {
        print "=========================\n";
        print "Case 4: read string from utf8 file\n";
        open(FILE, "<:utf8", "test.txt") or die "Cannot open test.txt: $!";
        $encoded_string = <FILE>;
        close(FILE);
        # display() is the helper defined in the rest of my test script (not shown)
        display $encoded_string;
    }
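The case 4 code reads test.txt through the `:utf8` layer. A minimal, self-contained sketch of the same test, assuming a temporary file in place of test.txt and using the stricter `:encoding(UTF-8)` layer (which, unlike `:utf8`, checks that the input is valid UTF-8), shows why the character view has 4 elements and the byte view 10:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for test.txt: the UTF-8 bytes of "裴佳谷\n"
# (U+88F4 U+4F73 U+8C37 U+000A), written raw to a temporary file.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);
print {$fh} pack('C*', 232, 163, 180, 228, 189, 179, 232, 176, 183, 10);
close($fh) or die "close $path: $!";

# Read it back through a decoding layer, so <$in> yields characters.
open(my $in, '<:encoding(UTF-8)', $path) or die "open $path: $!";
my $line = <$in>;
close($in);

# Byte view: re-encode a copy to UTF-8 octets.
my $octets = $line;
utf8::encode($octets);

printf "chars: %d (%s)\n", length($line), join('.', map { ord($_) } split //, $line);
printf "bytes: %d\n", length($octets);
# chars: 4 (35060.20339.35895.10)
# bytes: 10
```

The trailing newline is counted in both views, which is where the final 10 in each Content list comes from.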

Re: Re: Re: Re: use bytes and length problem
by BrowserUk (Patriarch) on Mar 02, 2003 at 23:47 UTC

    These are the results from my tests using 5.6.1. length seems to work fine for me whether use utf8 or use bytes is in force. However, someone earlier in the thread mentioned a known problem with use bytes in 5.6, so I also tried unpack with 'C*', which the docs cite as explicitly bypassing the Unicode handling.

    #! perl -sw
    use strict;
    use LWP::Simple;

    my $content = get( 'http://www.columbia.edu/kermit/utf8.html' );

    {
        use utf8;
        my $c_len   = length $content;
        my @c_bytes = unpack 'C*', $content;
        my @c_chars = unpack 'U*', $content;
        print "Charwise - length:$c_len; 'C*':", scalar @c_bytes, "; 'U*':", scalar @c_chars, $/;
    }
    {
        use bytes;
        my $b_len   = length $content;
        my @b_bytes = unpack 'C*', $content;
        my @b_chars = unpack 'U*', $content;
        print "Bytewise - length:$b_len; 'C*':", scalar @b_bytes, "; 'U*':", scalar @b_chars, $/;
    }
    {
        open JUNK, '>', 'junk' or die $!;
        binmode(JUNK);
        print JUNK $content;
        close JUNK;
        print 'Actual (from os): ', -s 'junk', $/;
    }

    __END__
    C:\test>239788
    Charwise - length:31946; 'C*':31946; 'U*':28621
    Bytewise - length:31946; 'C*':31946; 'U*':28621
    Actual (from os): 31946
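BrowserUk's numbers suggest that `$content` held undecoded UTF-8 bytes: length matched the on-disk size while 'U*' found fewer elements. A sketch of the same three-way comparison without a network fetch, assuming hypothetical byte data and using Encode::decode instead of unpack 'U*' to count characters:

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Hypothetical stand-in for $content: undecoded UTF-8 bytes of "naïve café".
my $content = "na\xC3\xAFve caf\xC3\xA9";

my $byte_len = length($content);                    # counts octets
my @octets   = unpack('C*', $content);              # one element per octet
my $char_len = length(decode('UTF-8', $content));   # counts decoded characters

# Write the bytes unchanged and ask the OS for the file size.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);
print {$fh} $content;
close($fh) or die "close $path: $!";
my $disk_len = -s $path;

print "length: $byte_len; 'C*': ", scalar(@octets),
      "; characters: $char_len; on disk: $disk_len\n";
# length: 12; 'C*': 12; characters: 10; on disk: 12
```

Decoding explicitly makes the byte/character distinction visible in the code itself rather than depending on which pragma is in scope.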

Re: use bytes and length problem
by Notromda (Pilgrim) on Mar 03, 2003 at 01:20 UTC
    I got this too, under Red Hat Linux 8.