Perl has built-in support for graphemes ("extended grapheme clusters", to be exact) via \X in regexes, so I would suggest you use a regex. For example, my ($right) = $s =~ /\A\X{10}(.*)\z/s; works.
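To illustrate the difference, here is a minimal sketch. The sample string is made up for the demonstration (it is not your data): ten graphemes, each built from two code points, followed by plain ASCII.

```perl
use strict;
use warnings;

# Made-up sample: ten graphemes, each "e" + COMBINING ACUTE ACCENT (U+0301),
# followed by the ASCII text "rest".
my $s = ("e\x{301}" x 10) . "rest";

# substr counts code points, so skipping 10 of them only skips 5 graphemes.
my $by_codepoint = substr $s, 10;

# \X matches one extended grapheme cluster, so this skips 10 visible characters.
my ($by_grapheme) = $s =~ /\A\X{10}(.*)\z/s;

printf "substr leaves %d code points\n", length($by_codepoint);   # 14
printf "regex leaves '%s'\n", $by_grapheme;                       # rest
```

The regex only makes a difference when a visible character spans more than one code point; for single-code-point characters, both approaches give the same result.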

However, your substr example also works for me. I took your example data, stored it in a file (making sure to use UTF-8), and opened it via open my $fh, '<:encoding(UTF-8)', 'input.txt' or die $!;. Your ligatures, at least the way you've posted them here, appear to be stored as one Unicode character each ("\x{FB00}" and "\x{FB03}").
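As a small sketch of why substr is enough for such data: once the text is properly decoded, each ligature is a single character. The sample string here is invented for illustration and is not taken from your file.

```perl
use strict;
use warnings;

# Invented sample: "o" + U+FB03 (ffi ligature) + "ce sta" + U+FB00 (ff ligature)
my $s = "o\x{FB03}ce sta\x{FB00}";

# Each ligature counts as exactly one character.
print length($s), "\n";    # 9

# So substr offsets line up with what you see on screen.
my $tail = substr $s, 2;
print join(' ', map { sprintf('U+%04X', ord $_) } split //, $tail), "\n";
```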

This leads me to suspect your problem might be occurring earlier, i.e. that your file is stored in a different encoding than the one you expect, or that you are not opening it with the proper encoding in Perl. You might want to read the following recent threads for some general advice on dealing with encodings, as well as specific advice on how to find out what encoding was used to store the file: Converting UTF8 to ANSI, Parsing issue (null bytes?), Parsing a Latin-1 Charset Data File. Basically: 1. Be certain what encoding the data is stored in (looking at a hex dump of the file if necessary), 2. Open it with the proper encoding, as I showed above, and 3. Inspect the data once you've gotten it into Perl to make sure it was read properly (e.g. using Data::Dump). Only then can you properly use the facilities Perl provides to deal with Unicode.
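The three steps can be sketched roughly like this. Everything here is an assumption made for the demonstration: the sample data is invented, an in-memory filehandle stands in for input.txt, and a plain sprintf loop stands in for Data::Dump so that the snippet only needs core modules.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Invented sample: "sta" + U+FB00 (ff ligature), as the UTF-8 bytes a file would hold.
my $bytes = encode('UTF-8', "sta\x{FB00}");

# 1. Look at the raw bytes, as a hex dump of the file would show them.
my $hex = join ' ', map { sprintf('%02X', ord $_) } split //, $bytes;
print "raw bytes: $hex\n";    # 73 74 61 EF AC 80

# 2. Read through the proper encoding layer (in-memory handle instead of a real file).
open my $fh, '<:encoding(UTF-8)', \$bytes or die $!;
my $text = <$fh>;
close $fh;

# 3. Inspect what actually arrived in Perl, one code point at a time.
my $chars = join ' ', map { sprintf('U+%04X', ord $_) } split //, $text;
print "decoded:   $chars\n";  # U+0073 U+0074 U+0061 U+FB00

# Without decoding, length and substr would count 6 bytes instead of 4 characters.
printf "undecoded length %d, decoded length %d\n", length($bytes), length($text);
```

If step 3 shows three separate values such as EF, AC and 80 where you expected a single U+FB00, the data was not decoded on the way in, and substr offsets will be off.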

Update: Clarified wording a bit.


In reply to Re: Finding substrings of fixed width text with graphemes by haukex
in thread Finding substrings of fixed width text with graphemes by albert
