# either this way:
$s =~ s/^\X{0,$MAX_CHARS}\K.*//s;

# or "by hand", this way:
substr($s, pos $s) = "" if $s =~ /^\X{0,$MAX_CHARS}/g;

####

	case CLUMP: /* Match \X: logical Unicode character.  This is defined as
		       a Unicode extended Grapheme Cluster */
	    /* From http://www.unicode.org/reports/tr29 (5.2 version).  An
	       extended Grapheme Cluster is:

	       CR LF
	       | Prepend* Begin Extend*
	       | .

	       Begin is (Hangul-syllable | ! Control)
	       Extend is (Grapheme_Extend | Spacing_Mark)
	       Control is [ GCB_Control CR LF ]

	       The discussion below shows how the code for CLUMP is derived
	       from this regex.  Note that most of these concepts are from
	       property values of the Grapheme Cluster Boundary (GCB) property.