Re: Re: Re: Re: regex for utf-8

I am wrapping my head around the bit-masking and conditional bit-shifting I need to do to extract the actual value of the code. The czyborra site http://czyborra.com/utf/ is invaluable, but my head is thick. How do I march down from the high bit of the first byte, testing and then extracting the hex codes from the succeeding bits?

Re: Re: Re: Re: Re: regex for utf-8 by Thelonius (Priest) on Feb 28, 2003 at 16:12 UTC
> I am wrapping my head around the bit-masking and conditional bit-shifting I need to do to extract the actual value of the code. The czyborra site http://czyborra.com/utf/ is invaluable, but my head is thick. How do I march down from the high bit of the first byte, testing and then extracting the hex codes from the succeeding bits?

That's what `unpack "U*"` does.
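For anyone who would rather see the bit-masking spelled out than leave it to unpack, here is a minimal sketch of the "march down from the high bit" idea. The sub name `utf8_bytes_to_codepoints` is made up for illustration, and the input is assumed to be a string of raw, undecoded UTF-8 bytes. It handles sequences of one to four bytes and does not reject overlong encodings or surrogates, so treat it as a learning aid rather than a validator.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): decode raw UTF-8 bytes by hand.
sub utf8_bytes_to_codepoints {
    my ($bytes) = @_;
    my @b = unpack("C*", $bytes);    # each byte as an unsigned integer 0..255
    my @cps;
    my $i = 0;
    while ($i < @b) {
        my $lead = $b[$i++];
        my ($cp, $extra);
        # The leading 1-bits of the first byte give the sequence length;
        # the mask keeps only the payload bits of that byte.
        if    (($lead & 0x80) == 0x00) { ($cp, $extra) = ($lead,        0) }  # 0xxxxxxx
        elsif (($lead & 0xE0) == 0xC0) { ($cp, $extra) = ($lead & 0x1F, 1) }  # 110xxxxx
        elsif (($lead & 0xF0) == 0xE0) { ($cp, $extra) = ($lead & 0x0F, 2) }  # 1110xxxx
        elsif (($lead & 0xF8) == 0xF0) { ($cp, $extra) = ($lead & 0x07, 3) }  # 11110xxx
        else  { die sprintf("invalid lead byte 0x%02X", $lead) }

        # Each continuation byte looks like 10xxxxxx and adds six payload bits.
        for (1 .. $extra) {
            die "truncated sequence" if $i >= @b;
            my $cont = $b[$i++];
            die sprintf("invalid continuation byte 0x%02X", $cont)
                unless ($cont & 0xC0) == 0x80;
            $cp = ($cp << 6) | ($cont & 0x3F);
        }
        push @cps, $cp;
    }
    return @cps;
}

# The euro sign is the three bytes E2 82 AC in UTF-8.
printf "U+%04X\n", $_ for utf8_bytes_to_codepoints("\xE2\x82\xAC");  # prints U+20AC
```

The masks follow the lead-byte patterns in the comments: test the high bits to learn how many bytes belong to the character, keep the low bits of the first byte, then shift left by six and OR in the low six bits of each following byte.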
Re: Re: Re: Re: Re: Re: regex for utf-8 by Anonymous Monk on Feb 28, 2003 at 22:58 UTC
I have RTFM'ed pack() and unpack(), but I don't understand their use in this context. What is the "TEMPLATE" being used here?

`unpack TEMPLATE,EXPR`
unpack does the reverse of pack: it takes a string and expands it out into a list of values. (In scalar context, it returns merely the first value produced.)

I also find this line in the manual obscure, though it seems it would help me if I understood it:

`sub ordinal { unpack("c",$_[0]); } # same as ord()`

Could you explain unpack "U*"? What is the "U"? (Something to do with Unicode?)

I listened to "Well You Needn't" (angular piano music) last night in celebration of the post showing the table for converting with cp 1252.
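A small sketch that may help with the TEMPLATE question: the template is a string of format letters telling unpack how to interpret the data. Here "c" is one signed 8-bit char, "U" is a Unicode character number, and "*" means "repeat for the rest of the string". The variable names are made up, and the bytes are assumed to be valid UTF-8. The bytes are decoded with the core Encode module first, because how `unpack "U*"` treats a raw byte string has differed between Perl versions, whereas on a decoded character string it returns the code points.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "c" reads one signed char, so it agrees with ord() only for ASCII.
my $code   = unpack("c", "A");       # 65, same as ord("A")
my $signed = unpack("c", "\xE9");    # -23, whereas ord("\xE9") is 233

# Raw UTF-8 bytes for e-acute (U+00E9) followed by the euro sign (U+20AC).
my $bytes = "\xC3\xA9\xE2\x82\xAC";

# Turn the bytes into a Perl character string, then list the code points.
my $chars       = decode("UTF-8", $bytes);
my @code_points = unpack("U*", $chars);    # (0xE9, 0x20AC)

printf "U+%04X\n", $_ for @code_points;
```

So the `ordinal` line in the manual is just unpack with a one-letter template, and "U*" is the same idea with a Unicode-aware letter and a repeat count.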
Re: Re: Re: Re: Re: Re: Re: regex for utf-8 by Anonymous Monk on Feb 28, 2003 at 23:06 UTC
I see that "pack()" is a little more expressive than "unpack()" in the manual. Sorry for asking the question before reading all I could.