Please ignore - I believe the issue is in the file - apologies if anyone wasted time looking at this
I want to read a file and display it in a Tk-based editor for which I have inherited the support role. It successfully loads many files containing UTF-8 characters. However, when I attempt to read the file below, it fails to interpret the UTF-8 characters correctly. The file contains the ASCII characters "ABC", a space, then two Cyrillic characters: U+0440 (Cyrillic Small Letter Er) and U+0435 (Cyrillic Small Letter Ie).
I have extracted what I believe to be the relevant bit of code, and added a printf to output the characters in hex. I would expect the following output:
41 42 43 20 440 435 a
Or possibly (looking at the utf-8 encoding for those two characters):
41 42 43 20 d1 80 d0 b5 a
I actually get the following, which displays as "ABC ре" in the editor:
41 42 43 20 e0 a5 a
Any help would be gratefully received.
Code
#!/usr/bin/perl
if ( open my $fh, '<', "321t.txt" ) {
    my $line = <$fh>;
    utf8::decode($line);
    for (0..length($line)-1) {
        printf("%x\n",ord(substr($line,$_,1)));
    }
}
File
ABC ре
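For anyone hitting the same symptom, here is a minimal sketch of how the two expected outputs arise and why the observed bytes come through unchanged. It builds the line from code points rather than reading 321t.txt, and the helper name hex_dump is made up for illustration. The last step is only a guess: it assumes the file might be CP866-encoded, which the post does not confirm.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hex value of every element of a string, space-separated.
# (hex_dump is a hypothetical helper, not part of the original code.)
sub hex_dump {
    my ($str) = @_;
    return join ' ', map { sprintf('%x', ord($_)) } split //, $str;
}

# The expected line, built from code points instead of read from disk.
my $chars = "ABC \x{440}\x{435}\n";
my $bytes = encode('UTF-8', $chars);    # what a UTF-8 file would contain

print hex_dump($chars), "\n";           # 41 42 43 20 440 435 a
print hex_dump($bytes), "\n";           # 41 42 43 20 d1 80 d0 b5 a

# utf8::decode works in place and returns false on malformed input,
# leaving the string untouched, so the return value is worth checking.
my $good    = $bytes;
my $ok_good = utf8::decode($good);

# The bytes actually reported in the question.
my $observed = "ABC \xe0\xa5\n";
my $bad      = $observed;
my $ok_bad   = utf8::decode($bad);

printf "%s: %s\n", ($ok_good ? 'valid UTF-8' : 'not UTF-8'), hex_dump($good);
printf "%s: %s\n", ($ok_bad  ? 'valid UTF-8' : 'not UTF-8'), hex_dump($bad);

# Assumption, not confirmed by the post: the file is CP866-encoded.
my $guess = decode('cp866', $observed);
print hex_dump($guess), "\n";
```

If the CP866 guess is right, the last line prints the same code points as the first, which would be consistent with the problem being in the file rather than in the code.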