in reply to Re: Re: Re: Re: regex for utf-8
in thread regex for utf-8

I am wrapping my head around the bit-masking and conditional bit-shifting I need to do to extract the actual value of the code. The czyborra site http://czyborra.com/utf/ is invaluable, but my head is thick. How do I march down from the high bit of the first byte, testing and then extracting the hex codes from the succeeding bits?
That's what unpack "U*" does.
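
For anyone who still wants to see the marching-down-the-bits done by hand, here is a minimal sketch. It assumes the input is a string of raw UTF-8 octets. The name decode_utf8_bytes is made up for illustration, and the code does not check for overlong forms or surrogates, so treat it as a learning aid rather than a validator.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a string of raw UTF-8 octets into a list of
# code points by masking and shifting. Illustration only, minimal checking.
sub decode_utf8_bytes {
    my @bytes = unpack "C*", $_[0];    # each octet as an unsigned number
    my @code_points;
    while (@bytes) {
        my $lead = shift @bytes;
        my ($value, $extra);
        # The high bits of the lead byte say how many continuation bytes follow;
        # the remaining low bits are the top bits of the code point.
        if    ($lead < 0x80)           { ($value, $extra) = ($lead,        0) } # 0xxxxxxx
        elsif (($lead & 0xE0) == 0xC0) { ($value, $extra) = ($lead & 0x1F, 1) } # 110xxxxx
        elsif (($lead & 0xF0) == 0xE0) { ($value, $extra) = ($lead & 0x0F, 2) } # 1110xxxx
        elsif (($lead & 0xF8) == 0xF0) { ($value, $extra) = ($lead & 0x07, 3) } # 11110xxx
        else { die sprintf "unexpected lead byte 0x%02X\n", $lead }
        for (1 .. $extra) {
            my $cont = shift @bytes;
            # Every continuation byte must look like 10xxxxxx.
            die "truncated or malformed sequence\n"
                unless defined $cont && ($cont & 0xC0) == 0x80;
            # Make room for six more bits, then add the payload of this byte.
            $value = ($value << 6) | ($cont & 0x3F);
        }
        push @code_points, $value;
    }
    return @code_points;
}

# UTF-8 octets for the euro sign, 'A', and e-acute.
my @cps = decode_utf8_bytes("\xE2\x82\xAC\x41\xC3\xA9");
printf "U+%04X\n", $_ for @cps;    # prints U+20AC, U+0041, U+00E9
```

Each continuation byte carries six payload bits, which is why the running value is shifted left by six before the next byte is OR-ed in.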

Re: Re: Re: Re: Re: Re: regex for utf-8
by Anonymous Monk on Feb 28, 2003 at 22:58 UTC
    I have RTFM'ed pack() and unpack(), but don't understand their use in this context. What is the "TEMPLATE" being used here?
    unpack TEMPLATE,EXPR unpack does the reverse of pack: it takes a string and expands it out into a list of values. (In scalar context, it returns merely the first value produced.)
    This line in the manual I find obscure as well, though it seems it would help me if I understood it:
    sub ordinal { unpack("c",$_[0]); } # same as ord()
    Could you explain unpack "U*"? What is the "U"? (Something to do with Unicode?) I listened to "Well You Needn't" (angular piano music) last night in celebration of the post showing the table for converting with cp 1252.
      I see that the manual's entry for pack() is more detailed than the one for unpack(). Sorry for asking before reading all I could.
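
To make the template question concrete, here is a small sketch. The template is the first argument to unpack: each letter names a kind of value to pull out of the string, and a trailing "*" means "repeat for everything that is left". "C" is an unsigned char (one octet), "c" is the signed version, and "U" is a Unicode code point. The sketch assumes a Perl with the core Encode module, and it decodes the octets explicitly before applying "U*", because how "U*" treats a raw byte string has differed between Perl versions.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\xE2\x82\xAC\x41";    # UTF-8 octets for the euro sign, then 'A'

# "C*": one unsigned number per octet.
my @octets = unpack "C*", $bytes;

# Decode the octets into a character string, then "U*" gives
# one code point per character.
my $chars       = decode('UTF-8', $bytes);
my @code_points = unpack "U*", $chars;

printf "octets:      %s\n", join ' ', map { sprintf('%02X',   $_) } @octets;
printf "code points: %s\n", join ' ', map { sprintf('U+%04X', $_) } @code_points;
```

This should print the four octets E2 82 AC 41 on the first line and the two code points U+20AC U+0041 on the second.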