Re: use bytes and length problem

I suspect the problem lies in the way your $txt is created. I wrote this demo to show different ways of forming your string; "use bytes" works in every case.

Hope this helps:

use strict;

sub display {
    my $string = shift;
    use utf8; # As the results show, whether to use utf8 or bytes is
              # irrelevant in this demo, as "U*" forces Unicode anyway.
    print  "\nchar semantics: ";
    print "$string ";
    printf "Length = %d, ", length($string);
    printf "Content = %vd\n", $string;
    use bytes;
    print  "byte semantics: ";
    print "$string ";
    printf "Length = %d, ", length($string);
    printf "Content = %vd\n", $string;
}

my $encoded_string;
my @decoded_list;

{
    use bytes; 
    print "=========================\n";
    print "Case 1: create string from pack, with use bytes\n";
    $encoded_string = pack("U*", 400, 306);
    display $encoded_string;
    @decoded_list = unpack("U*", $encoded_string);
    print join(".", @decoded_list), "\n";
}

{
    use utf8; #not necessary in this case
    print "=========================\n";
    print "Case 2: create string from pack, with use utf8\n";
    $encoded_string = pack("U*", 400, 306);
    display $encoded_string;
    @decoded_list = unpack("U*", $encoded_string);
    print join(".", @decoded_list), "\n";
}

{
    print "=========================\n";
    print "Case 3: create string from \\x{}\n";
    $encoded_string = "\x{190}\x{132}";#hex value of 400 and 306
    display $encoded_string;
    @decoded_list = unpack("U*", $encoded_string);
    print join(".", @decoded_list), "\n";
}
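
To make the point about string creation concrete, here is a minimal sketch, assuming Perl 5.8 or later (where utf8::decode is built in). The variable names $octets and $chars are illustrative and not taken from the original question. It shows the same four bytes giving length 4 before decoding and length 2 after, while "use bytes" reports 4 for the decoded copy:

```perl
use strict;
use warnings;

# The UTF-8 encoding of U+0190 and U+0132, written as four raw bytes.
my $octets = "\xC6\x90\xC4\xB2";

# Copy the bytes and decode the copy in place; it now holds two characters.
my $chars = $octets;
utf8::decode($chars) or die "not valid UTF-8";

printf "octets: length = %d\n", length $octets;    # 4
printf "chars:  length = %d\n", length $chars;     # 2

{
    use bytes;
    # Under "use bytes", length counts the internal UTF-8 bytes instead.
    printf "chars under use bytes: length = %d\n", length $chars;    # 4
}
```

If $txt arrives as undecoded bytes (for example, read from a file or socket without a decoding layer), length already counts bytes, and "use bytes" appears to make no difference.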

Re: Re: use bytes and length problem by Hofmator (Curate) on Mar 02, 2003 at 22:07 UTC
For those of you who are too lazy to run pg's code, here's the output ;-)

=========================
Case 1: create string from pack, with use bytes

char semantics: ƐĲ Length = 4, Content = 198.144.196.178
byte semantics: ƐĲ Length = 4, Content = 198.144.196.178
400.306
=========================
Case 2: create string from pack, with use utf8

char semantics: ƐĲ Length = 4, Content = 198.144.196.178
byte semantics: ƐĲ Length = 4, Content = 198.144.196.178
400.306
=========================
Case 3: create string from \x{}

char semantics: ƐĲ Length = 2, Content = 400.306
byte semantics: ƐĲ Length = 4, Content = 198.144.196.178
400.306

Update: I'm on perl 5.6.0 on Solaris, so it's probably my own problem ;-).

-- Hofmator
Re: Re: Re: use bytes and length problem by pg (Canon) on Mar 02, 2003 at 22:52 UTC
Now this is getting interesting :-). When I ran my code, I got the output below. (I am using AS 5.8.0, and the test code for case 4 is at the end of this post.)

=========================
Case 1: create string from pack, with use bytes

char semantics: ƐĲ Length = 2, Content = 400.306
byte semantics: ƐĲ Length = 4, Content = 198.144.196.178
198.144.196.178
=========================
Case 2: create string from pack, with use utf8

char semantics: ƐĲ Length = 2, Content = 400.306
byte semantics: ƐĲ Length = 4, Content = 198.144.196.178
400.306
=========================
Case 3: create string from \x{}

char semantics: ƐĲ Length = 2, Content = 400.306
byte semantics: ƐĲ Length = 4, Content = 198.144.196.178
400.306
=========================
Case 4: read string from utf8 file

char semantics: 裴佳谷
 Length = 4, Content = 35060.20339.35895.10
byte semantics: 裴佳谷
 Length = 10, Content = 232.163.180.228.189.179.232.176.183.10

Also, I want to add a case to cover the situation where you read your string from a file:

{
    print "=========================\n";
    print "Case 4: read string from utf8 file\n";
    # The :utf8 layer decodes the file's bytes into characters on input.
    open(FILE, "<:utf8", "test.txt") or die "Cannot open test.txt: $!";
    $encoded_string = <FILE>;
    close(FILE);
    display $encoded_string;
}
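
The effect of the :utf8 layer in case 4 can be seen in isolation with a minimal sketch, assuming Perl 5.8 or later. The temporary file is a hypothetical stand-in for test.txt, filled with the ten bytes reported in the case 4 output; the helper first_line is illustrative only:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the ten bytes from case 4 to a temporary file
# (a hypothetical stand-in for test.txt).
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "\xE8\xA3\xB4\xE4\xBD\xB3\xE8\xB0\xB7\n";
close $fh or die "close: $!";

# Read the first line of the file through the given open mode.
sub first_line {
    my ($mode) = @_;
    open my $in, $mode, $path or die "open $path: $!";
    my $line = <$in>;
    close $in;
    return $line;
}

my $raw     = first_line('<:raw');     # undecoded bytes
my $decoded = first_line('<:utf8');    # decoded to characters

printf "raw:   length = %d\n", length $raw;        # 10
printf ":utf8: length = %d\n", length $decoded;    # 4
```

Read without a decoding layer, the line is ten bytes long and length reports 10 with or without "use bytes"; read through :utf8, it is four characters (three plus the newline).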
Re: Re: Re: Re: use bytes and length problem by BrowserUk (Patriarch) on Mar 02, 2003 at 23:47 UTC
These are the results from my tests using 5.6.1. length seems to work fine for me whether use utf8 or use bytes is in force, but someone mentioned earlier in the thread that there is a known problem with use bytes in 5.6, so I tried unpack with 'C', which the docs cite as explicitly bypassing the Unicode handling.

#! perl -sw
use strict;
use LWP::Simple;

my $content = get( 'http://www.columbia.edu/kermit/utf8.html' );

{
    use utf8;
    my $c_len   = length $content;
    my @c_bytes = unpack 'C*', $content;    # one element per byte
    my @c_chars = unpack 'U*', $content;    # one element per Unicode character
    print "Charwise - length:$c_len; 'C':", scalar @c_bytes, "; 'U':", scalar @c_chars, $/;
}

{
    use bytes;
    my $b_len   = length $content;
    my @b_bytes = unpack 'C*', $content;
    my @b_chars = unpack 'U*', $content;
    print "Bytewise - length:$b_len; 'C':", scalar @b_bytes, "; 'U':", scalar @b_chars, $/;
}

{
    # Write the content out unchanged and ask the OS for the file size.
    open JUNK, '>', 'junk' or die $!;
    binmode(JUNK);
    print JUNK $content;
    close JUNK;
    print 'Actual (from os): ', -s 'junk', $/;
}
__END__
C:\test>239788
Charwise - length:31946; 'C':31946; 'U':28621
Bytewise - length:31946; 'C':31946; 'U':28621
Actual (from os): 31946

Examine what is said, not who speaks.

1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.
Re: use bytes and length problem by Notromda (Pilgrim) on Mar 03, 2003 at 01:20 UTC
I got this too, under RH8.