websterling has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to learn a bit of Perl 6/Pugs by writing some code to solve the types of problems I routinely use Perl 5 for. Every now and then I run into a situation where it seems to me that Perl 6 tries to make a simple thing impossible.

This is one of those situations. What is the simplest way to get the length of a string? Google doesn't seem to know a simple way and I haven't stumbled across one while playing around in Pugs.

I've thought of some rather creative ways to do it, but it's hard to believe that length isn't included in the language.

Any insight is appreciated.

Re: Length of a string in Perl6
by lidden (Curate) on Aug 21, 2008 at 16:55 UTC
    length is too ambiguous and has been dropped from the language. Use $string.bytes instead. There are also methods that count codepoints and graphemes, and elems, which is for array elements.
      There also seems to be $string.chars, which sounds like the better substitute in most cases: most of the time you want to know how long the string will appear on your text terminal. At the moment it is an alias for the grapheme count, but who wants to write grapheme all the time?
        "chars" is only slightly shorter than "graphs": one byte (in UTF-8), one codepoint and one grapheme shorter ;-)

        I think in general it pays to think about which kind of length you want. If you are inserting data into a database and have to respect size limits, you'll probably care more about bytes or codepoints (depending on how well your DBMS handles Unicode).
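        A minimal Perl 5 sketch of why the two counts matter for a size limit. The helper column_lengths and the 6-unit limit are hypothetical, chosen only for illustration; real limits depend on your DBMS and column definition. It assumes the source file is saved as UTF-8 (hence use utf8).

```perl
use strict;
use warnings;
use utf8;                  # the string literal below is UTF-8 source text
use Encode qw(encode);

# Hypothetical helper: report both lengths a DBMS might enforce.
sub column_lengths {
    my ($text) = @_;
    return (
        codepoints => length($text),                    # characters as Perl 5 counts them
        bytes      => length(encode('UTF-8', $text)),   # size once stored as UTF-8
    );
}

my %len   = column_lengths("Grüße");
my $limit = 6;             # hypothetical column limit

printf "%d codepoints, %d bytes\n", $len{codepoints}, $len{bytes};   # 5 codepoints, 7 bytes
print "fits by codepoints\n" if $len{codepoints} <= $limit;
print "too long in bytes\n"  if $len{bytes} > $limit;
```

        The same five-character word fits a limit counted in codepoints but not one counted in UTF-8 bytes, because 'ü' and 'ß' each take two bytes.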

        But you're right, most of the time the programmer is interested in graphemes when dealing with text processing.

Re: Length of a string in Perl6
by moritz (Cardinal) on Aug 21, 2008 at 17:22 UTC
    Others have given the correct answers; here's why length is banned:

    Unicode assigns numbers (and names) to characters, and calls these numbers codepoints. They cover things like 'a', '1' etc., but also non-ASCII characters like 'ä' or 'ش' (if you have no Arabic fonts installed, you won't see anything).

    In most encodings you need several bytes to store such characters (in UTF-8, both 'ä' and 'ش' take two bytes). Thus the number of bytes and the number of codepoints can differ.
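    For comparison, a small Perl 5 sketch (not Perl 6) that counts codepoints with length and UTF-8 bytes with Encode's encode for the characters mentioned above. It assumes the source file is saved as UTF-8; output uses ord in hex so no wide characters are printed.

```perl
use strict;
use warnings;
use utf8;                      # treat the literals below as characters, not bytes
use Encode qw(encode);

# Compare codepoint count and UTF-8 byte count for a few single characters.
my %bytes_of;
for my $char ('a', '1', 'ä', 'ش') {
    my $codepoints = length $char;                   # always 1 here
    my $bytes      = length encode('UTF-8', $char);  # 1 for ASCII, 2 for ä and ش
    $bytes_of{$char} = $bytes;
    printf "U+%04X: %d codepoint(s), %d UTF-8 byte(s)\n",
        ord $char, $codepoints, $bytes;
}
```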

    But that's not all: there are sequences of codepoints that, when displayed, look like a single character.

    perl -Mcharnames=:full -CS -wle 'print "A\N{COMBINING ACUTE ACCENT}"'
    Á

    So it's two codepoints, but only one "grapheme".
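    The same string measured three ways in Perl 5, as a rough stand-in for the Perl 6 methods discussed here (this is not Perl 6 code). Bytes come from Encode's encode, codepoints from length, and graphemes from counting matches of \X, which in recent Perl 5 versions matches one extended grapheme cluster.

```perl
use strict;
use warnings;
use Encode qw(encode);

# 'A' followed by U+0301 COMBINING ACUTE ACCENT: displayed as one character.
my $s = "A\x{301}";

my $bytes      = length encode('UTF-8', $s);  # 'A' is 1 byte, U+0301 is 2 bytes
my $codepoints = length $s;                   # two codepoints
my $graphemes  = () = $s =~ /\X/g;            # count of \X matches (grapheme clusters)

print "bytes=$bytes codepoints=$codepoints graphemes=$graphemes\n";
# prints "bytes=3 codepoints=2 graphemes=1"
```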

    So you can get the number of characters, depending on what you mean by "characters":

    $x.bytes
    $x.codes
    $x.graphs

    There's also a current "Unicode level", which defaults to graphemes. So you can say $x.chars, and that will internally call whichever of those methods corresponds to the current Unicode level.
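    To illustrate the idea of chars dispatching on a current level, here is a hypothetical Perl 5 sketch. The names %count_at_level, $unicode_level and chars are invented for this example; it mirrors the concept only and says nothing about how Perl 6 actually implements it.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical illustration only: one counting routine per "Unicode level".
my %count_at_level = (
    bytes  => sub { length encode('UTF-8', $_[0]) },
    codes  => sub { length $_[0] },
    graphs => sub { my $n = () = $_[0] =~ /\X/g; $n },
);

our $unicode_level = 'graphs';    # hypothetical default, as described above

# chars() counts according to whatever level is currently in effect.
sub chars {
    my ($str) = @_;
    my $counter = $count_at_level{$unicode_level}
        or die "unknown Unicode level '$unicode_level'\n";
    return $counter->($str);
}

my $s = "A\x{301}";
print chars($s), "\n";            # graphemes: 1
{
    local $unicode_level = 'codes';
    print chars($s), "\n";        # codepoints: 2
}
```

    Using local for the level keeps the change scoped to one block, so code elsewhere still sees the grapheme default.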