in reply to length() miscounting UTF8 characters?
#!/usr/bin/perl
use strict;
use warnings;
use open IO => ':utf8';

while (<DATA>) {
    chomp;
    (my $nonenglish = $_) =~ s/[A-Za-z]//g;
    my @chars = split(//, $nonenglish);
    my $chars = scalar(@chars);
    print scalar(@chars), " $nonenglish\n";
}

__DATA__
æ
æð
æða
æðaber
æðahnútur
æðakölkun
æðardúnn
æðarfugl
æðarkolla
æðarkóngur
æðarvarp
æði
æðimargur
æðisgenginn
æðiskast
æðislegur
æðrast
æðri
æðrulaus
æðruleysi
æðruorð
æðrutónn
æðstur
æður
æfa
__END__
...
$chars = $chars / 2;
print "$chars $nonenglish\n";
Update: You might also take a look at Test::utf8 on CPAN and related...
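Halving the count only works while every character left after the substitution encodes to exactly two bytes in UTF-8, which holds for æ, ð, ö and ú but not for characters that need three or four bytes. An alternative, sketched below, is to decode the bytes into Perl characters first and let length() count those. The helper name count_non_english is made up for this illustration and is not from the original script; the input is assumed to be a UTF-8 encoded byte string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of the original post): decode a UTF-8
# byte string, strip ASCII letters, and count the characters that remain.
sub count_non_english {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);         # bytes -> Perl characters
    (my $nonenglish = $text) =~ s/[A-Za-z]//g;  # same substitution as above
    return length $nonenglish;                  # counts characters, not bytes
}

# "\xC3\xA6\xC3\xB0a" is the UTF-8 byte sequence for "æða".
print count_non_english("\xC3\xA6\xC3\xB0a"), "\n";   # prints 2
```

When reading from a filehandle instead of a literal, calling binmode(DATA, ':encoding(UTF-8)') before the loop should have the same effect, so no division is needed.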
Re^2: length() miscounting UTF8 characters?
by AppleFritter (Vicar) on Apr 27, 2014 at 22:08 UTC