in reply to use bytes and length problem

That's one of the first things that got me when I started exploring bytes vs. utf8 back when it came out.

The use bytes pragma does not affect the way length works.

Rather, the $txt value is already marked as to whether it is byte or char oriented.

It really bugged me that there was no way to tell which way a string was oriented (prior to 5.8, or without adding the Scalar::Utils module, IIRC the name), or, more importantly in cases like this, to set the flag.

I don't know offhand if Scalar::Utils can write the desired flag setting. If not, the way we've done it until now is with the "taint-like trick" of matching the whole string with a trivial pattern in parens. The resulting $1 will have the byte/char persuasion that the regex was compiled under (use utf8 or no utf8). I think the bytes pragma had nothing to do with it. That may have changed in 5.8.
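
For what it's worth, here is a minimal sketch of telling which way a string is oriented. It assumes Perl 5.8.1 or later, where utf8::is_utf8 is available; the sample strings and variable names are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample values, not taken from the original question.
my $plain = "abc";          # plain ASCII literal, stored byte-oriented
my $wide  = "\x{263A}";     # a code point above 0xFF forces character orientation

# utf8::is_utf8 reports the internal flag; normalize the result to 1 or 0.
my $plain_flag = utf8::is_utf8($plain) ? 1 : 0;
my $wide_flag  = utf8::is_utf8($wide)  ? 1 : 0;
my $wide_len   = length($wide);   # counted in characters, not bytes

# Only ASCII is printed, so no "Wide character" warning is triggered.
print "plain flag=$plain_flag, wide flag=$wide_flag, wide length=$wide_len\n";
```

This only reads the flag. It says nothing about whether the regex trick above still behaves the same way in 5.8.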

—John

Re: Re: use bytes and length problem
by Anonymous Monk on Mar 03, 2003 at 16:11 UTC

    Thanks to all for your help. It appears as though

    my $size= utf8::upgrade($txt);

    has done the job for the problem. I'm actually doing:

    my $size= utf8::upgrade($txt);
    utf8::downgrade($txt);

    I seem to be OK without the downgrade, but I'm keeping it for the moment in case leaving the string upgraded causes me trouble later.
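
A small sketch of what that upgrade/downgrade pair does, assuming Perl 5.8.1 or later. The sample text and the extra variable names ($chars, $flag, $ok, $chars_after) are hypothetical and only there to show the values involved.

```perl
use strict;
use warnings;

# Hypothetical sample text: 4 characters, one of them (e-acute) above 0x7F.
my $txt = "caf\x{e9}";

my $chars = length($txt);                 # character count before upgrading
my $size  = utf8::upgrade($txt);          # octets needed to hold $txt as UTF-8
my $flag  = utf8::is_utf8($txt) ? 1 : 0;  # internal UTF-8 flag is set after upgrade

# downgrade succeeds here because every character fits in a single byte
my $ok          = utf8::downgrade($txt) ? 1 : 0;
my $chars_after = length($txt);           # the logical string is unchanged

print "chars=$chars size=$size flag=$flag downgraded=$ok chars_after=$chars_after\n";
# prints: chars=4 size=5 flag=1 downgraded=1 chars_after=4
```

Note that utf8::downgrade dies by default if the string contains a character that does not fit in one byte, so the downgrade step is only safe for text like the sample above.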

    This comes from the *use utf8* perldoc. Does this make sense to all? Any closing thoughts?

    Again, thanks to everyone for their help.

    Jeff

      The upgrade and downgrade functions are not in Perl 5.6's documentation, so they must be new to 5.8. Nice improvement!

      In case you haven't found it yet, use utf8 affects the compilation of regular expressions.

      —John

      P.S. You forgot to log in again. Try setting your theme to something other than the default. Then it will be obvious if you're not logged in.