Length of a string in Perl6

websterling has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to learn a bit of Perl 6/Pugs by writing some code to solve the types of problems I routinely use Perl 5 for. Every now and then I run into a situation where it seems to me that Perl 6 tries to make a simple thing impossible.

This is one of those situations. What is the simplest way to get the length of a string? Google doesn't seem to know a simple way and I haven't stumbled across one while playing around in Pugs.

I've thought of some rather creative ways to do it, but it's hard to believe that length isn't included in the language.

Any insight is appreciated.

Re: Length of a string in Perl6 by lidden (Curate) on Aug 21, 2008 at 16:55 UTC
`length` is too ambiguous and has been dropped from the language. Use `$string.bytes` instead; there are also codepoints, graphemes and elems methods. elems is for array elements.
Re^2: Length of a string in Perl6 by jethro (Monsignor) on Aug 21, 2008 at 17:09 UTC
There also seems to be $string.chars, which sounds like the better substitute in the majority of use cases. Most of the time you want to know how long the string will be on your text terminal. At the moment it is an alias of grapheme, but who wants to write grapheme all the time?
Re^3: Length of a string in Perl6 by moritz (Cardinal) on Aug 21, 2008 at 17:50 UTC
"chars" is only slightly shorter than "graphs": by one byte (in UTF-8), one codepoint and one grapheme ;-). I think in general it pays off to think about which kind of length you want. If you want to insert stuff into a database, and have to care about size limits, you'll probably care more about bytes or codepoints (depending on how well your DBMS handles Unicode). But you're right, most of the time the programmer is interested in graphemes when dealing with text processing.
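
To make the database point concrete, here is a minimal sketch in Python rather than Perl 6, so it does not depend on any particular Perl 6 implementation. The helper name `fits_byte_limit` and the 6-byte limit are made up for illustration, and it assumes the column limit is counted in UTF-8 bytes.

```python
def fits_byte_limit(text, max_bytes, encoding="utf-8"):
    """Return True if the encoded form of text fits in max_bytes (hypothetical helper)."""
    return len(text.encode(encoding)) <= max_bytes

name = "M\u00fcller"  # "Müller" with a precomposed u-umlaut

codepoints = len(name)                   # Python's len() counts codepoints
utf8_bytes = len(name.encode("utf-8"))   # the umlaut takes two bytes in UTF-8

print(codepoints, utf8_bytes)            # 6 7
print(fits_byte_limit(name, 6))          # False: 7 bytes do not fit in 6
```

A limit counted in codepoints would accept this string, while a limit counted in bytes would reject it, which is why it matters to know which one the DBMS enforces.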
Re: Length of a string in Perl6 by moritz (Cardinal) on Aug 21, 2008 at 17:22 UTC
Others have given the correct answers; here's why `length` is banned.

Unicode gives some entities names, and calls these things codepoints. They are things like 'a', '1' etc., but also non-ASCII stuff like 'ä' or 'ش' (if you have no Arabic fonts installed, you won't see anything). In most encodings you need several bytes to represent such characters. Thus the number of bytes and the number of codepoints can be different.

But that's not enough: there are sequences of codepoints that, when displayed, look like a single character. For example, `perl -Mcharnames=:full -CS -wle 'print "A\N{COMBINING ACUTE ACCENT}"'` prints `Á`. So it's two codepoints, but only one "grapheme".

So you can get the number of characters, depending on what you mean by "characters": `$x.bytes`, `$x.codes`, `$x.graphs`.

There's also a current "Unicode Level", which defaults to graphemes. So you can say `$x.chars`, and that will internally call the method that corresponds to the current Unicode level.
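
The three counts can be illustrated with a rough sketch in Python (chosen so it does not depend on a particular Perl 6 implementation), using the same string as the one-liner above. The function `count_graphemes` is a hypothetical helper, and it is only an approximation: it treats every codepoint that is not a combining mark as the start of a new grapheme, which ignores the full Unicode segmentation rules (emoji sequences, Hangul jamo and so on).

```python
import unicodedata

def count_graphemes(text):
    """Approximate grapheme count: count the codepoints that are not combining marks."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

s = "A\N{COMBINING ACUTE ACCENT}"  # 'A' followed by U+0301

n_bytes = len(s.encode("utf-8"))   # 'A' is 1 byte, U+0301 is 2 bytes
n_codes = len(s)                   # len() counts codepoints
n_graphs = count_graphemes(s)      # the accent attaches to the 'A'

print(n_bytes, n_codes, n_graphs)  # 3 2 1
```

Three different answers for one visible character is the ambiguity that a single `length` would hide.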