Re: use bytes and length problem

That's one of the first things that got me when I first started exploring bytes vs. utf8 when it came out.

The use bytes pragma does not affect the way length works.

Rather, the $txt value is already marked as to whether it is byte or char oriented.

It really bugged me that there was no way to tell which way a string was oriented (prior to 5.8, or to adding the Scalar::Utils module, IIRC the name), or, more importantly in cases like this, to set the flag.
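For anyone on 5.8.1 or later, the flag can be inspected directly. This is only a minimal sketch, assuming the core utf8::is_utf8 function is available; the variable names $byte_str and $char_str are made up for illustration.

```perl
use strict;
use warnings;

# Two strings holding the same single character, U+00E9.
my $byte_str = "\xE9";      # stored as one byte, UTF8 flag off
my $char_str = "\xE9";
utf8::upgrade($char_str);   # same character, now stored as UTF-8 internally

# utf8::is_utf8 reports the internal flag, not the content.
printf "byte_str flagged: %s\n", utf8::is_utf8($byte_str) ? "yes" : "no";
printf "char_str flagged: %s\n", utf8::is_utf8($char_str) ? "yes" : "no";

# length counts characters for both strings, whatever the internal storage.
printf "lengths: %d %d\n", length($byte_str), length($char_str);
```

The two strings still compare equal with eq; only their internal marking differs.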

I don't know offhand if Scalar::Utils can write the desired flag setting. If not, the way we've done it until now is with the "taint-like trick" of matching the whole string with a trivial pattern in parens. The resulting $1 will have the byte/char persuasion that the regex was compiled under (use utf8 or no utf8). I think the bytes pragma had nothing to do with it. That may have changed in 5.8.

—John

Re: Re: use bytes and length problem by Anonymous Monk on Mar 03, 2003 at 16:11 UTC
Thanks to all for your help. It appears that my $size = utf8::upgrade($txt); has done the job. I'm actually doing: my $size = utf8::upgrade($txt); utf8::downgrade($txt); I seem to be OK without the downgrade, but I'm keeping it for the moment in case leaving it out causes me trouble later. This comes from the use utf8 perldoc. Does this make sense to all? Any closing thoughts? Again, thanks to everyone for their help. Jeff
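As a rough sketch of the upgrade/downgrade approach described above (Perl 5.8 or later assumed; the sample string and the names $chars, $octets and $ok are illustrative, not from the original code), utf8::upgrade returns the number of octets needed to represent the string as UTF-8, which can differ from the character count:

```perl
use strict;
use warnings;

my $txt = "na\xEFve";               # 5 characters, one of them above 0x7F

my $chars  = length($txt);          # character count
my $octets = utf8::upgrade($txt);   # octets needed to hold $txt as UTF-8

# utf8::downgrade returns true on success. It dies if the string holds a
# character above 0xFF, unless a true second argument (FAIL_OK) is passed.
my $ok = utf8::downgrade($txt, 1);

print "chars=$chars octets=$octets downgraded=", ($ok ? "yes" : "no"), "\n";
```

Passing the second argument to utf8::downgrade is a defensive choice here; with text that fits in one byte per character it should make no difference.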
Re: Re: Re: use bytes and length problem by John M. Dlugosz (Monsignor) on Mar 03, 2003 at 17:11 UTC
The upgrade and downgrade functions are not in Perl 5.6's documentation, so they must be new to 5.8. Nice improvement! In case you didn't find it yet, use utf8 affects the compilation of regular expressions. —John P.S. You forgot to log in again. Try setting your theme to something other than the default; then it will be obvious when you're not logged in.