Perl generally handles UTF-8 and Unicode very well, but there's one weak spot: file names. Linux has no encoding-aware API for file name operations (to the kernel, a file name is just a sequence of bytes), so that part isn't really perl's fault.
That said, the usual approach is to decode data from the outside world into text strings, work with those, and encode them back into byte strings before you print them or use them in file system operations.
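A minimal sketch of that decode, work, encode cycle, assuming the outside data is UTF-8. The sample bytes are the "Déjà Vu" file name that shows up later in this thread; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they arrive from the outside world (here: a UTF-8 file name).
my $bytes = "D\xc3\xa9j\xc3\xa0 Vu";

# 1. Decode at the input boundary: bytes -> text (characters).
my $text = decode('UTF-8', $bytes);

# 2. Work with text: uc, length, substr and regexes now see characters.
my $upper = uc $text;

# 3. Encode at the output boundary: text -> bytes, before printing
#    or passing the string to open(), -e, unlink() and friends.
my $out = encode('UTF-8', $upper);

print length($bytes), " bytes, ", length($text), " characters\n";
```

Running it prints `9 bytes, 7 characters`: the same name is 9 bytes long on disk but only 7 characters long as text.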
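However, if file names and input data use the same encoding, everything should work fine even without decoding, because you are comparing byte strings with byte strings. The exceptions are character-level string operations such as length, substr and regex matches, which then operate on bytes instead of characters. So your problem suggests that some of your data or file names use an encoding other than the system default, UTF-8.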
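To illustrate which operations break on undecoded UTF-8 byte strings, here is a small sketch using the same "Déjà Vu" bytes (variable names are hypothetical). The byte string and the text string behave the same for equality and file system calls, but substr and regexes count bytes in one case and characters in the other:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "D\xc3\xa9j\xc3\xa0 Vu";   # UTF-8 bytes, as readdir returns them
my $text  = decode('UTF-8', $bytes);  # the same name as characters

# substr counts bytes in $bytes, characters in $text.
my $b_prefix = substr($bytes, 0, 2);  # "D\xc3": cuts "é" in half
my $t_prefix = substr($text,  0, 2);  # "Dé"

# "." matches one byte in $bytes, one character in $text.
my $b_match = $bytes =~ /^D.j/ ? 1 : 0;  # 0: "." only eats the byte \xc3
my $t_match = $text  =~ /^D.j/ ? 1 : 0;  # 1: "." matches "é"

printf "bytes: length %d, match %d; text: length %d, match %d\n",
    length($bytes), $b_match, length($text), $t_match;
```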
There's a lot more to say about this, and I already said much of it here. See also perluniintro, perlunicode, and the excellent Encode module (it's a core module).
Here are the answers.
Ad 1:
SV = PV(0x8154064) at 0x8153594
REFCNT = 1
FLAGS = (PADBUSY,PADTMP,POK,READONLY,pPOK)
PV = 0x81757e8 "mp3_v4.csv"\0
CUR = 10
LEN = 12
Ad 2:
SV = PV(0x8153ae8) at 0x81535b8
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x817dad8 "D\303\251j\303\240 Vu"\0
CUR = 9
LEN = 12
SV = PV(0x8153ae8) at 0x81535b8
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x817db88 ".."\0
CUR = 2
LEN = 4
SV = PV(0x8153ae8) at 0x81535b8
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK)
PV = 0x817dad8 "."\0
CUR = 1
LEN = 4
Ad 1:
...
PV = 0x81757e8 "mp3_v4.csv"\0
I think the file name from the CSV file, the one with "Déjà", would be more interesting :) That's what ikegami meant.
Ad 2:
...
PV = 0x817dad8 "D\303\251j\303\240 Vu"\0
This confirms that the file name is stored on disk in UTF-8: \303\251 is the UTF-8 encoding of "é" and \303\240 that of "à".
So the file name in the CSV file is presumably not in UTF-8.
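If that is the case, one way to reconcile the two is to convert the CSV value into the UTF-8 bytes the file system uses. The sketch below is hypothetical: `csv_name_to_fs_bytes` is an illustrative helper, and the Latin-1 fallback is an assumption, since the thread does not say which encoding the CSV actually uses. A strict UTF-8 decode is tried first; note that some Latin-1 byte sequences happen to be valid UTF-8 and would be misdetected, so if you know the CSV's real encoding, decode with that directly.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK);

# Hypothetical helper: turn a file name taken from the CSV file into the
# UTF-8 bytes the file system uses. ASSUMPTION: the CSV is either UTF-8
# or Latin-1 (ISO-8859-1); change the fallback to the real encoding.
sub csv_name_to_fs_bytes {
    my ($raw) = @_;
    my $copy = $raw;    # decode() with a CHECK flag may consume its input
    my $text = eval { decode('UTF-8', $copy, FB_CROAK) };
    $text = decode('ISO-8859-1', $raw) unless defined $text;
    return encode('UTF-8', $text);
}

my $from_disk = "D\xc3\xa9j\xc3\xa0 Vu";   # as in the Dump output above
my $latin1    = "D\xe9j\xe0 Vu";           # the same name in Latin-1
my $utf8      = "D\xc3\xa9j\xc3\xa0 Vu";   # the same name in UTF-8

my @results = map { csv_name_to_fs_bytes($_) eq $from_disk ? 'match' : 'no match' }
    ($latin1, $utf8);
print "$_\n" for @results;
```

Both inputs end up as the same UTF-8 bytes that readdir reports, so both lines print `match`.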
See utf8 filenames. I ran into a similar problem a while ago, and graff gave me this code to fix it:
# Decode file names read from disk as UTF-8, so that names containing
# non-ASCII characters (such as "Déjà Vu") become Perl text strings.
use strict;
use warnings;
use Encode qw(decode);

my $path = '.';    # directory to scan; set as needed
opendir my $dh, $path or die "Error opening '$path': $!";
my @files = grep { !/^\.\.?$/ } readdir $dh;    # skip . and ..
closedir $dh;
# Decode "dir/name" as a whole, so $path is expected as undecoded bytes too.
@files = map { decode( 'utf8', "$path/" . $_ ) } sort @files;
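Since the decoded names are text strings, they need to be encoded back to bytes before they are used in file system calls again, as described at the top of the thread. A self-contained sketch of that round trip, assuming a Linux-style file system that stores names as raw UTF-8 bytes; the scratch directory from File::Temp and the file in it exist only for this demonstration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempdir);

# Scratch directory holding one file whose name is "Déjà Vu" in UTF-8.
my $dir = tempdir(CLEANUP => 1);
my $name_bytes = "D\xc3\xa9j\xc3\xa0 Vu";
open my $fh, '>', "$dir/$name_bytes" or die "create: $!";
close $fh;

# List the directory and decode the names, as in the code above.
opendir my $dh, $dir or die "opendir: $!";
my @names = map { decode('UTF-8', $_) } grep { !/^\.\.?$/ } readdir $dh;
closedir $dh;

# Encode each text name back to bytes before touching the file system.
my @found;
for my $name (@names) {
    my $fs_name = encode('UTF-8', $name);
    push @found, $name if -e "$dir/$fs_name";
}
print scalar(@found), " of ", scalar(@names), " decoded names found on disk\n";
```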