in reply to Re: use bytes and length problem
in thread use bytes and length problem

For those of you who are too lazy to run pg's code, here's the output ;-)
=========================
Case 1: create string from pack, with use bytes

char semantics: IJ Length = 4, Content = 198.144.196.178
byte semantics: IJ Length = 4, Content = 198.144.196.178
400.306
=========================
Case 2: create string from pack, with use utf8

char semantics: IJ Length = 4, Content = 198.144.196.178
byte semantics: IJ Length = 4, Content = 198.144.196.178
400.306
=========================
Case 3: create string from \x{}

char semantics: IJ Length = 2, Content = 400.306
byte semantics: IJ Length = 4, Content = 198.144.196.178
400.306
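Case 3 is the easiest one to reproduce on its own. The sketch below assumes Perl 5.8 or later (not the 5.6.0 build described next). It builds the same two characters with \x{} escapes and compares length under character semantics with bytes::length, which counts the bytes of Perl's internal UTF-8 buffer. sprintf '%vd' prints each character's ordinal joined by dots, the same format as the Content column. The variable names are only illustrative.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without turning on byte semantics here

# U+0190 and U+0132, the two characters built with \x{} in case 3
my $str = "\x{190}\x{132}";

my $chars      = length $str;            # counts characters
my $bytes      = bytes::length($str);    # counts bytes of the internal UTF-8 buffer
my $char_codes = sprintf '%vd', $str;    # code point of each character, joined by '.'

my $utf8 = $str;
utf8::encode($utf8);                     # the copy becomes a string of UTF-8 octets
my $byte_codes = sprintf '%vd', $utf8;

print "char semantics: Length = $chars, Content = $char_codes\n";
print "byte semantics: Length = $bytes, Content = $byte_codes\n";
```

On a 5.8 or later perl this prints "char semantics: Length = 2, Content = 400.306" and "byte semantics: Length = 4, Content = 198.144.196.178", matching case 3 above.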

Update: I'm on Perl 5.6.0 on Solaris, so it's probably my own problem ;-). Full spec:

> perl -V
Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
  Platform:
    osname=solaris, osvers=2.6, archname=sun4-solaris
    uname='sunos fluidy 5.6 generic_105181-23 sun4d sparc sunw,sparcserver-1000 '
    config_args='-Dcc=gcc -Dprefix=/opt/local/gnu'
    hint=previous, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define
    use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
  Compiler:
    cc='gcc', optimize='-O2', gccversion=2.95.2 19991024 (release)
    cppflags='-fno-strict-aliasing -I/opt/local/include -I/opt/local/gnu/include -I/opt/local/X11/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    ccflags ='-fno-strict-aliasing -I/opt/local/include -I/opt/local/gnu/include -I/opt/local/X11/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    stdchar='unsigned char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, usemymalloc=y, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags ='-L/opt/local/lib -L/opt/local/gnu/lib -L/opt/local/X11/lib '
    libpth=/usr/lib /usr/ccs/lib /opt/local/lib /opt/local/gnu/lib /opt/local/X11/lib
    libs=-lsocket -lnsl -ldb -ldl -lm -lc -lcrypt -lsec
    libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' -Wl,-E'
    cccdlflags='-fPIC', lddlflags=' -W,l-E -G -L/opt/local/lib -L/opt/local/gnu/lib -L/opt/local/X11/lib'

Characteristics of this binary (from libperl):
  Compile-time options: USE_LARGE_FILES
  Built under solaris
  Compiled at Dec 21 2000 19:25:43

-- Hofmator

Re: Re: Re: use bytes and length problem
by pg (Canon) on Mar 02, 2003 at 22:52 UTC
    Now this is getting interesting :-). When I ran my code, I got this (I am using ActiveState Perl 5.8.0; the test code for case 4 is at the end of this post):

    =========================
    Case 1: create string from pack, with use bytes
    
    char semantics: ƐIJ Length = 2, Content = 400.306
    byte semantics: ƐIJ Length = 4, Content = 198.144.196.178
    198.144.196.178
    =========================
    Case 2: create string from pack, with use utf8
    
    char semantics: ƐIJ Length = 2, Content = 400.306
    byte semantics: ƐIJ Length = 4, Content = 198.144.196.178
    400.306
    =========================
    Case 3: create string from \x{}
    
    char semantics: ƐIJ Length = 2, Content = 400.306
    byte semantics: ƐIJ Length = 4, Content = 198.144.196.178
    400.306
    =========================
    Case 4: read string from unicode file
    
    char semantics: 裴佳谷
     Length = 4, Content = 35060.20339.35895.10
    byte semantics: 裴佳谷
     Length = 10, Content = 232.163.180.228.189.179.232.176.183.10
    
    I also want to add a case that covers reading the string from a file:

    {
        print "=========================\n";
        print "Case 4: read string from utf8 file\n";
        # the :utf8 layer decodes the file's bytes into characters
        open(FILE, "<:utf8", "test.txt") or die "test.txt: $!";
        $encoded_string = <FILE>;
        close(FILE);
        display $encoded_string;    # display() comes from the earlier test script
    }
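Case 4 depends on a test.txt that is not shown. A self-contained sketch, assuming Perl 5.8 or later and the core File::Temp module: it writes the three characters from the case 4 output (U+88F4, U+4F73, U+8C37, plus a newline) to a temporary file as UTF-8 and reads them back through a decoding layer. It uses :encoding(UTF-8), which also validates the input, instead of pg's :utf8; that swap is my choice, not pg's code, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use bytes ();                  # for bytes::length only
use File::Temp qw(tempfile);

# Hypothetical test data: U+88F4 U+4F73 U+8C37 and a newline, the code points
# shown in the case 4 output, written as escapes so this script stays ASCII
my $text = "\x{88F4}\x{4F73}\x{8C37}\n";

# write the text out as UTF-8
my ($out, $path) = tempfile();
binmode($out, ':encoding(UTF-8)');
print {$out} $text;
close($out) or die "close $path: $!";

# read it back through a decoding layer, as case 4 does with :utf8
open(my $in, '<:encoding(UTF-8)', $path) or die "open $path: $!";
my $line = <$in>;
close($in);

my $on_disk = -s $path;                  # size of the file in bytes
unlink $path;

my $chars = length $line;                # characters, including the newline
my $bytes = bytes::length($line);        # bytes in the internal UTF-8 buffer
my $codes = sprintf '%vd', $line;        # code point of each character

print "char semantics: Length = $chars, Content = $codes\n";
print "byte semantics: Length = $bytes, file size = $on_disk\n";
```

Each of the three characters takes 3 bytes in UTF-8 and the newline takes 1, which is where the 4 versus 10 split in pg's case 4 output comes from.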

      These are the results from my tests using 5.6.1. length seems to work fine for me whether use utf8 or use bytes is in force, but someone earlier in the thread mentioned a known problem with use bytes in 5.6, so I also tried unpack with 'C*', which the docs cite as explicitly bypassing the Unicode handling.

      #! perl -sw
      use strict;
      use LWP::Simple;

      my $content = get( 'http://www.columbia.edu/kermit/utf8.html' );

      {
          use utf8;
          my $c_len   = length $content;
          my @c_bytes = unpack 'C*', $content;
          my @c_chars = unpack 'U*', $content;
          print "Charwise - length:$c_len; 'C*':", scalar @c_bytes, "; 'U*':", scalar @c_chars, $/;
      }
      {
          use bytes;
          my $b_len   = length $content;
          my @b_bytes = unpack 'C*', $content;
          my @b_chars = unpack 'U*', $content;
          print "Bytewise - length:$b_len; 'C*':", scalar @b_bytes, "; 'U*':", scalar @b_chars, $/;
      }
      {
          open JUNK, '>', 'junk' or die $!;
          binmode(JUNK);
          print JUNK $content;
          close JUNK;
          print 'Actual (from os): ', -s 'junk', $/;
      }

      __END__
      C:\test>239788
      Charwise - length:31946; 'C*':31946; 'U*':28621
      Bytewise - length:31946; 'C*':31946; 'U*':28621
      Actual (from os): 31946
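The counts above are easier to read once undecoded octets and decoded characters are kept apart. If get() hands back the page as undecoded octets (an assumption about this run, not something checked here), length and unpack 'C*' both count bytes whichever pragma is in force, and only a decoding step changes the count. The sketch below assumes Perl 5.8 or later and avoids the network: it starts from the four raw UTF-8 octets of the case 3 characters, counts them, then decodes a copy with the core utf8::decode function.

```perl
use strict;
use warnings;

# The UTF-8 octets for U+0190 U+0132 (case 3) as a plain byte string,
# the kind of data an undecoded network or binmode file read produces
my $octets = "\xC6\x90\xC4\xB2";

my $octet_len = length $octets;          # 4: nothing has been decoded yet
my @c_values  = unpack 'C*', $octets;    # one value per octet

my $decoded = $octets;
utf8::decode($decoded);                  # decode the copy in place
my $char_len = length $decoded;          # 2 characters

print "octets: $octet_len (", join('.', @c_values), ")\n";
print "chars:  $char_len (", sprintf('%vd', $decoded), ")\n";
```

One caveat for newer perls: per the Perl 5.10 release notes, pack and unpack became character-oriented by default, so unpack 'C*' on a decoded string no longer walks the underlying UTF-8 bytes the way the 5.6 documentation describes.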

      Examine what is said, not who speaks.
      1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
      2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
      3) Any sufficiently advanced technology is indistinguishable from magic.
      Arthur C. Clarke.
      I got this too, under RH8