So given this string: Triple “S” Industrial Corp (note funky quotes)

More precisely, you have this text encoded using UTF-8.

What are the characters \342\200\234 (the left funky quote)?

They are octal escape sequences that produce the three bytes (E2 80 9C) forming the UTF-8 encoding of «“» (U+201C LEFT DOUBLE QUOTATION MARK).

use feature qw( say );
use Encode qw( encode );
say encode("UTF-8", "\N{LEFT DOUBLE QUOTATION MARK}") eq "\342\200\234";  # Output: 1

How would I manually decode them if I wanted to?

You could use the following, which decodes $s in place:
utf8::decode($s);
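If you want to see what that decoding amounts to at the bit level, here is a minimal sketch for this one case. It assumes the input is exactly one well-formed three-byte sequence and does no validation; the helper name decode_3byte is made up for illustration, and real code should stick to utf8::decode or Encode::decode.

```perl
use strict;
use warnings;

# Hypothetical helper: decodes exactly one well-formed three-byte UTF-8
# sequence (bit pattern 1110xxxx 10xxxxxx 10xxxxxx) into its code point.
# No validation is performed; this is for illustration only.
sub decode_3byte {
    my ($bytes) = @_;
    my ($b1, $b2, $b3) = unpack("C3", $bytes);
    return (($b1 & 0x0F) << 12)    # low 4 bits of the lead byte
         | (($b2 & 0x3F) << 6)     # low 6 bits of the first continuation byte
         |  ($b3 & 0x3F);          # low 6 bits of the second continuation byte
}

my $cp = decode_3byte("\342\200\234");
printf("U+%04X\n", $cp);   # U+201C
```

The lead byte contributes four payload bits and each continuation byte contributes six, which is how three bytes end up as the single code point U+201C.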

If this string was constructed from a string literal, then you should have used the following to tell Perl the source was encoded using UTF-8 instead of ASCII:

use utf8;

If this is read from a file, an encoding layer would do this automatically for you. You can set this up using

use open ':std', ':encoding(UTF-8)';
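As a rough sketch of the per-handle alternative, you can also put the layer directly in the open call. The example below is only an illustration: it writes the 30 raw bytes to a temporary file (the temporary file is an assumption made so the snippet is self-contained) and reads them back through a decoding layer.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Write the 30 raw UTF-8 bytes to a temporary file, with no encoding layer.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "Triple \342\200\234S\342\200\235 Industrial Corp";
close($out) or die "close: $!";

# Read them back through a decoding layer on this one handle.
open(my $in, '<:encoding(UTF-8)', $path) or die "open: $!";
my $line = <$in>;
close($in);

print length($line), "\n";   # 26
```

Because the layer decodes on input, the string you get is already made of Unicode Code Points, so length reports 26.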

Is this why CUR reports 30 "perl characters" instead of 26 actual characters?

The string has 30 characters, not 26. You can verify this using length. If you were to decode those 30 bytes, you would get 26 Unicode Code Points, but that would be a different string, and length would return 26.

use feature qw( say );
use Encode qw( decode );
no utf8;
my $utf8 = "Triple “S” Industrial Corp";
say length($utf8);  # 30 chars
my $ucp = decode("UTF-8", $utf8);
say length($ucp);   # 26 chars
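To see where the four extra bytes come from, here is a small sketch that tallies how many bytes each decoded character needs when encoded as UTF-8. The string is written with octal escapes so the snippet does not depend on the encoding of the source file; the %count tally is just an illustrative name.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

my $utf8 = "Triple \342\200\234S\342\200\235 Industrial Corp";
my $ucp  = decode("UTF-8", $utf8);

# Tally the decoded characters by the number of bytes each needs in UTF-8.
my %count;
$count{ length(encode("UTF-8", $_)) }++ for split(//, $ucp);

printf("%d chars x %d byte(s)\n", $count{$_}, $_)
    for sort { $a <=> $b } keys(%count);
# 24 chars x 1 byte(s)
# 2 chars x 3 byte(s)
```

That is 24 + 2 * 3 = 30 bytes for 26 characters: each of the two quotes costs two bytes more than an ASCII character would.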

That said, CUR indicates the number of bytes of the string buffer that are being used, not the number of characters in the string. They just happen to be the same for your string.

use feature qw( say );
use Encode qw( decode );
use Devel::Peek qw( Dump );
no utf8;
my $utf8 = "Triple “S” Industrial Corp";
say length($utf8);  # 30 chars
Dump($utf8);        # CUR = 30
my $ucp = decode("UTF-8", $utf8);
say length($ucp);   # 26 chars
Dump($ucp);         # CUR = 30

Because we called length before Dump, you'll see the PERL_MAGIC_utf8 (w) magic was added to cache the length (MG_LEN = 26).


In reply to Re: How to interpret characters in Devel::Peek CUR by ikegami
in thread How to interpret characters in Devel::Peek CUR by ait
