in reply to length() miscounting UTF8 characters?

How do you call the script? It seems you are feeding it the data via STDIN, which is not affected by use open IO. The following works for me (both in 5.16.2 and 5.10.1):
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

binmode STDOUT, 'utf8';
binmode DATA, 'encoding(utf-8)';

while (<DATA>) {
    chomp;
    s/[A-Za-z]//g;
    say $_, ' ', length;
}

__DATA__
æ
æð
æða
æðaber
æðahnútur
æðakölkun
æðardúnn
æðarfugl
æðarkolla
æðarkóngur
æðarvarp
æði
æðimargur
æðisgenginn
æðiskast
æðislegur
æðrast
æðri
æðrulaus
æðruleysi
æðruorð
æðrutónn
æðstur
æður
æfa
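In case it helps to see where a miscount like this comes from: without a decoding layer on the handle, length sees the raw UTF-8 bytes rather than characters. A minimal sketch, assuming the input is UTF-8; the variable names are mine, and the hard-coded byte string only stands in for what an undecoded STDIN would deliver:

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Encode qw{ decode };

# The UTF-8 bytes of "æð" (U+00E6, U+00F0), as an undecoded handle
# would hand them to the script. Illustrative data, not from the thread.
my $bytes = "\xC3\xA6\xC3\xB0";

# Decoding turns the four bytes into two characters.
my $chars = decode('UTF-8', $bytes);

print length($bytes), "\n";    # 4 (bytes)
print length($chars), "\n";    # 2 (characters)
```

Putting an :encoding layer on the handle does the same decoding for every line read, which is what the binmode DATA call above is for.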
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Re^2: length() miscounting UTF8 characters?
by AppleFritter (Vicar) on Apr 27, 2014 at 22:42 UTC

    Yes, I'm piping the textfile into the script, though that's more for convenience than anything else. It'd be easy enough to change.

    I read up on the open pragma again and noticed that it can be fed another subpragma, :std, to affect the STD* streams:

    The :std subpragma on its own has no effect, but if combined with the :utf8 or :encoding subpragmas, it converts the standard filehandles (STDIN, STDOUT, STDERR) to comply with encoding selected for input/output handles. For example, if both input and out are chosen to be :encoding(utf8), a :std will mean that STDIN, STDOUT, and STDERR are also in :encoding(utf8).

    So I tried changing that line to

    use open IO => ':std', ':utf8';

    but that didn't make a difference either. I'm probably still missing something fairly obvious.

    Thanks for your help, by the way!

      You are almost there.
      use open IO => ':utf8', ':std';

      The order matters.
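For completeness, here is roughly how the pieces might fit together when the word list is piped in. This is only a sketch under the assumption that the input is UTF-8; char_lengths is a hypothetical helper name of mine, not something from the original script:

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

# Layer first, then :std, so STDIN, STDOUT and STDERR get it too.
use open IO => ':utf8', ':std';

# Hypothetical helper: returns a [line, length] pair for each line
# read from the given handle, with the trailing newline removed.
sub char_lengths {
    my ($fh) = @_;
    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        push @out, [ $line, length $line ];
    }
    return @out;
}

# Print every line from STDIN followed by its length in characters.
say "$_->[0] $_->[1]" for char_lengths(\*STDIN);
```

It would be called as something like perl count.pl < words.txt, where both file names are placeholders.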

      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        Wonderful! That really works - hardly Perl at its dwimmiest, but I'll take what I can get. Thanks so much again!