use bytes and length problem

muad33b has asked for the wisdom of the Perl Monks concerning the following question:

Re: use bytes and length problem by BrowserUk (Patriarch) on Mar 02, 2003 at 00:10 UTC
You might try, `my $size = scalar unpack'C', $txt;` A quick check on some content grabs from UTF websites shows a difference between the counts from that and `my $size = scalar unpack'U', $txt;` using 5.6.1 and 5.8 (AS). Also, the filesize shown by the OS for the same content written in binmode is the same as the first above.

Examine what is said, not who speaks. 1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong. 2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible 3) Any sufficiently advanced technology is indistinguishable from magic. Arthur C. Clarke.
Re: Re: use bytes and length problem by muad33b (Acolyte) on Mar 02, 2003 at 15:29 UTC
I read up on pack/unpack, and that looked like a good way to go. However, both of those return "60" instead of the "208786" value I was hoping for. I'm going to keep playing with unpack and see what I come up with. Any ideas? Jeff
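A value of 60 is what a single-item `C` template would produce if the data begins with `<` (an assumption about Jeff's data, which was never posted): in scalar context unpack returns only the first value, and ord('<') is 60. The sketch below uses a hypothetical sample string and hypothetical helper names to contrast that with counting every byte via `C*`.

```perl
use strict;
use warnings;

# Value of the first byte only: a single 'C' item in scalar context.
sub first_byte {
    my ($data) = @_;
    return scalar unpack 'C', $data;
}

# Number of bytes: 'C*' yields one value per byte, and the empty-list
# assignment counts those values.
sub byte_count {
    my ($data) = @_;
    my $count = () = unpack 'C*', $data;
    return $count;
}

# Hypothetical sample; plain ASCII, so one character is one byte.
my $sample = "<html><body>hello</body></html>";

print "first byte: ", first_byte($sample), "\n";   # 60, i.e. ord('<')
print "byte count: ", byte_count($sample), "\n";
print "length:     ", length($sample), "\n";
```

For a plain byte string the `C*` count and length agree, so the unpack route only matters when the string might be carrying characters rather than bytes.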
Re: Re: Re: use bytes and length problem by BrowserUk (Patriarch) on Mar 02, 2003 at 16:28 UTC
I tested the method on this page and several of those linked from it, and it gives me the correct size every time (as compared to the same data dumped to a file in binmode). I am pretty certain that the problem is that you are using unpack incorrectly. If you would care to post an example of how you are using it, and a (small) sample of the data you are using it on, then I am certain that we could sort the problem out. Your reply to pfaut below indicates that you are either trying to use the '/' template character, which will never give you the answers you require, or you are mixing up information from different parts of the pack and unpack documentation, which would be no surprise, as it is probably some of the least understandable of the perldocs.
Re: Re: Re: use bytes and length problem by pfaut (Priest) on Mar 02, 2003 at 15:51 UTC
Pack/Unpack Tutorial (aka How the System Stores Data) might help you figure out how to use pack/unpack. `--- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';`
Re: Re: Re: use bytes and length problem by muad33b (Acolyte) on Mar 02, 2003 at 16:00 UTC
I played around with unpack, and it appears that the following line from the pack() perldoc page (5.6 and 5.8) is true: "The length-item is not returned explicitly from unpack." Any other ideas? I know I can write this out to a file, but it seems crazy to me to have to do that just to get a proper byte count. Jeff
Re: use bytes and length problem by crenz (Priest) on Mar 01, 2003 at 23:28 UTC
Which version of perl are you using? There are some known problems with `use bytes;` in perl 5.6.1. Have you tried 5.8.0?
Re: Re: use bytes and length problem by muad33b (Acolyte) on Mar 01, 2003 at 23:31 UTC
I'm using 5.8.0 on Red Hat 8.0, and binmode should be in place since it's Linux, right? I'm looking everywhere... no luck so far.
Re: use bytes and length problem by pg (Canon) on Mar 02, 2003 at 21:54 UTC
I would suspect that the problem lies in the way your $txt is created. I wrote this demo to show different ways of forming the string; "use bytes" works every time. Hope this helps:

    use strict;

    sub display {
        my $string = shift;
        use utf8;  # as the results show, whether utf8 or bytes is in effect is
                   # irrelevant in this demo, as "U" forces Unicode anyway
        print "\nchar semantics: ";
        print "$string ";
        printf "Length = %d, ", length($string);
        printf "Content = %vd\n", $string;
        use bytes;  # from here on, length and %vd see the underlying bytes
        print "byte semantics: ";
        print "$string ";
        printf "Length = %d, ", length($string);
        printf "Content = %vd\n", $string;
    }

    my $encoded_string;
    my @decoded_list;

    {
        use bytes;
        print "=========================\n";
        print "Case 1: create string from pack, with use bytes\n";
        # "U*" packs every listed code point; a bare "U" would take only the first
        $encoded_string = pack("U*", 400, 306);
        display $encoded_string;
        @decoded_list = unpack("U*", $encoded_string);
        print join(".", @decoded_list), "\n";
    }

    {
        use utf8;  # not necessary in this case
        print "=========================\n";
        print "Case 2: create string from pack, with use utf8\n";
        $encoded_string = pack("U*", 400, 306);
        display $encoded_string;
        @decoded_list = unpack("U*", $encoded_string);
        print join(".", @decoded_list), "\n";
    }

    {
        print "=========================\n";
        print "Case 3: create string from \\x{}\n";
        $encoded_string = "\x{190}\x{132}";  # hex values of 400 and 306
        display $encoded_string;
        @decoded_list = unpack("U*", $encoded_string);
        print join(".", @decoded_list), "\n";
    }
Re: Re: use bytes and length problem by Hofmator (Curate) on Mar 02, 2003 at 22:07 UTC
For those of you who are too lazy to run pg's code, here's the output ;-)

    =========================
    Case 1: create string from pack, with use bytes

    char semantics: ЖДІ Length = 4, Content = 198.144.196.178
    byte semantics: ЖДІ Length = 4, Content = 198.144.196.178
    400.306
    =========================
    Case 2: create string from pack, with use utf8

    char semantics: ЖДІ Length = 4, Content = 198.144.196.178
    byte semantics: ЖДІ Length = 4, Content = 198.144.196.178
    400.306
    =========================
    Case 3: create string from \x{}

    char semantics: ЖДІ Length = 2, Content = 400.306
    byte semantics: ЖДІ Length = 4, Content = 198.144.196.178
    400.306

Update: I'm on perl 5.6.0 on Solaris, so it's probably my own problem ;-).

-- Hofmator
Re: Re: Re: use bytes and length problem by pg (Canon) on Mar 02, 2003 at 22:52 UTC
Now this is getting interesting :-). When I ran my code, I got this (I am using AS 5.8.0, and the testing code for case 4 is at the end of this post):

    =========================
    Case 1: create string from pack, with use bytes

    char semantics: ЖђДІ Length = 2, Content = 400.306
    byte semantics: ЖђДІ Length = 4, Content = 198.144.196.178
    198.144.196.178
    =========================
    Case 2: create string from pack, with use bytes

    char semantics: ЖђДІ Length = 2, Content = 400.306
    byte semantics: ЖђДІ Length = 4, Content = 198.144.196.178
    400.306
    =========================
    Case 3: create string from \x{}

    char semantics: ЖђДІ Length = 2, Content = 400.306
    byte semantics: ЖђДІ Length = 4, Content = 198.144.196.178
    400.306
    =========================
    Case 4: read string from unicode file

    char semantics: иЈґдЅіи°· Length = 4, Content = 35060.20339.35895.10
    byte semantics: иЈґдЅіи°· Length = 10, Content = 232.163.180.228.189.179.232.176.183.10

Also, I want to add a case to cover the situation where you read your string from a file:

    {
        print "=========================\n";
        print "Case 4: read string from utf8 file\n";
        # the :utf8 layer decodes the file's bytes into characters on read
        open(FILE, "<:utf8", "test.txt") or die "Cannot open test.txt: $!";
        $encoded_string = <FILE>;
        display $encoded_string;
    }
Re: Re: Re: Re: use bytes and length problem by BrowserUk (Patriarch) on Mar 02, 2003 at 23:47 UTC
Re: use bytes and length problem by Notromda (Pilgrim) on Mar 03, 2003 at 01:20 UTC
Re: use bytes and length problem by John M. Dlugosz (Monsignor) on Mar 03, 2003 at 07:27 UTC
That's one of the first things that got me when I started exploring bytes vs. utf8 when it came out. The use bytes pragma does not affect the way length works. Rather, the $txt value is already marked as either byte or char oriented. It really bugged me that there was no way to tell which way a string was oriented (prior to 5.8, or without adding the Scalar::Utils module, IIRC the name) or, more importantly in cases like this, to set the flag. I don't know offhand whether Scalar::Utils can write the desired flag setting. If not, the way we've done it until now is with the "taint-like trick" of matching the whole string with a trivial pattern in parens. The resulting $1 will have the byte/char persuasion that the regex was compiled under (use utf8 or no utf8). I think the bytes pragma had nothing to do with it. That may have changed in 5.8. -- John
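As a rough illustration of the per-string flag John describes, the sketch below assumes Perl 5.8.1 or later, where the built-in utf8::is_utf8 reports the flag. The `describe` helper is a hypothetical name, not something from the thread. It also measures the string with length inside a `use bytes` scope, which on 5.8 reports the size of the internal representation, in line with pg's demo output above.

```perl
use strict;
use warnings;

# Summarize how Perl currently stores and measures a string.
sub describe {
    my ($s) = @_;
    my $chars = length $s;                       # character semantics
    my $bytes = do { use bytes; length $s };     # bytes of the internal representation
    my $flag  = utf8::is_utf8($s) ? 1 : 0;       # internal UTF-8 flag
    return "chars=$chars bytes=$bytes utf8_flag=$flag";
}

my $wide   = "\x{190}\x{132}";   # code points 400 and 306, as in pg's demo
my $narrow = "abc";              # plain ASCII literal

print describe($wide),   "\n";   # chars=2 bytes=4 utf8_flag=1
print describe($narrow), "\n";   # chars=3 bytes=3 utf8_flag=0
```

The flag travels with the value rather than with the code that handles it, which is why the same length call can report different numbers for strings that print identically.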
Re: Re: use bytes and length problem by Anonymous Monk on Mar 03, 2003 at 16:11 UTC
Thanks to all for your help. It appears that `my $size = utf8::upgrade($txt);` has done the job. I'm actually doing `my $size = utf8::upgrade($txt); utf8::downgrade($txt);` although I seem to be OK without the downgrade; I'm keeping it for the moment in case leaving the string upgraded causes me trouble later. This comes from the use utf8 perldoc. Does this make sense to all? Any closing thoughts? Again, thanks to everyone for their help. Jeff
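For reference, the utf8 documentation describes utf8::upgrade as returning the number of octets needed to represent the string as UTF-8, which is why it can double as a size. The sketch below compares that with explicitly encoding the string through the Encode module (core since 5.8). Both helper names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Octet count via utf8::upgrade, applied to a copy so the caller's
# string keeps its original internal form.
sub upgraded_size {
    my ($s) = @_;
    return utf8::upgrade($s);
}

# Octet count via an explicit encode to UTF-8.
sub utf8_size {
    my ($s) = @_;
    return length encode('UTF-8', $s);
}

my $wide = "\x{190}\x{132}";     # two characters, four UTF-8 octets
print upgraded_size($wide), "\n";   # 4
print utf8_size($wide), "\n";       # 4
```

One caveat: both helpers count UTF-8 octets. If $txt already holds raw bytes with values above 127, each of those bytes is treated as a Latin-1 character and counted as two octets, so the result can exceed the number of bytes actually received.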
Re: Re: Re: use bytes and length problem by John M. Dlugosz (Monsignor) on Mar 03, 2003 at 17:11 UTC
The upgrade and downgrade functions are not in Perl 5.6's documentation, so they must be new to 5.8. Nice improvement! In case you didn't find it yet, use utf8 affects the compilation of regular expressions. -- John

P.S. You forgot to log in again. Try setting your theme to something other than the default. Then it will be obvious if you're not logged in.