Re: Read text file - Encoding problem?

Hi,

just a hint to help yourself. Try to show the filenames in a hex representation. Then you can compare what you have in the file with what you get when reading the directory. Reduce the problem to a simple example:

opendir my $dh, '.' or die "ERROR: Couldn't open: $!";
my @entries = readdir($dh);
closedir $dh;
foreach my $entry (@entries) {
    print "$entry\n";
    print gethex($entry), "\n";
}

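# Return the character codes of a string as hex values, each followed by "-".
# Note: %x does not zero-pad, so a code like 0x0d is printed as "d".
sub gethex {
    my $v = shift;
    return join '', map { sprintf("%x-", ord) } split //, $v;
}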

In the same way, open your csv file and print the entries you read from it in hex.
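
A minimal sketch of that second step, assuming the entries sit one per line in the file. The helper name hexdump_lines is made up for illustration, and the file name is taken from the command line. The file is opened with the :raw layer so nothing is translated on the way in.

```perl
use strict;
use warnings;

# Same idea as gethex above, repeated so the snippet runs on its own.
sub gethex {
    my $v = shift;
    return join '', map { sprintf("%x-", ord) } split //, $v;
}

# Hypothetical helper: return one hex string per line of the file.
# The lines are deliberately NOT chomped, so line endings stay visible.
sub hexdump_lines {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "ERROR: Couldn't open $path: $!";
    my @hex = map { gethex($_) } <$fh>;
    close $fh;
    return @hex;
}

# Usage: perl hexlines.pl yourfile.csv   (script name is a placeholder)
if (@ARGV) {
    print "$_\n" for hexdump_lines($ARGV[0]);
}
```

Anything that shows up after the last visible character of an entry (for example a trailing d- or a-) is a line-ending or other invisible character.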

Another hint: I can't see any explicit decoding when you use read_file from File::Slurp. What do you get there? Are you sure that the csv file was created as UTF-8?
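
For comparison, a sketch of what explicit decoding can look like. It uses Perl's core open with an encoding layer rather than File::Slurp, and it assumes the file really is UTF-8. The helper name read_utf8_lines is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file and decode it from UTF-8 into Perl
# character strings, one element per line (line endings are kept).
sub read_utf8_lines {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or die "ERROR: Couldn't open $path: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Usage: perl readutf8.pl yourfile.csv   (script name is a placeholder)
if (@ARGV) {
    my @lines = read_utf8_lines($ARGV[0]);
    printf "%d lines read\n", scalar @lines;
}
```

After decoding, a character such as "ä" is one character with code e4 instead of the two bytes c3 a4, which is easy to check with gethex.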

McA


Re^2: Read text file - Encoding problem? by better (Acolyte) on Mar 17, 2013 at 10:53 UTC
Hi McA,

Thanks for that script. It seems that it is not an encoding problem. I checked both text files that my script reads through a filehandle. There is one regularly occurring difference: each line of the "bad" text file, which was parsed from the csv and does not work, has a -d- at its end, while the lines of the "good" text file, which works with my script, do not. E.g.:

    I C 7700 -> 49-20-43-20-37-37-30-30- #good
    I C 7700 -> 49-20-43-20-37-37-30-30-d #bad

and what I get reading the directory:

    I C 7700.jpg -> 49-20-43-20-37-37-30-30-2e-4a-50-47

What does that mean? What does the "d" stand for?

better
Re^3: Read text file - Encoding problem? by better (Acolyte) on Mar 17, 2013 at 12:18 UTC
While finding out how to remove this "d", which is invisibly attached to the end of each line, I added chop $entry to your script. chomp wouldn't do! The gethex function shows that the "d" is removed without losing the last letter (or number). Later I will continue working on the question of how to read the "bad" text file into my main script without the invisible "d" and use these shortened strings there as regexes.

better
Re^4: Read text file - Encoding problem? by poj (Abbot) on Mar 17, 2013 at 12:50 UTC
The 'd' is 0d, the hex code for a carriage return. This node has the details: Why chomp() is not considering carriage-return

poj
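
As a sketch of one way to deal with it (not taken from the linked node): chomp only removes whatever $/ holds, normally "\n", so a line from a file with Windows line endings keeps its "\r". A substitution can strip both. The helper name strip_eol is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: remove any trailing run of carriage returns and
# line feeds, so "\r\n", a bare "\n" and a leftover "\r" are all handled.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    return $line;
}

my $clean = strip_eol("I C 7700\r\n");
printf "%d characters\n", length $clean;   # prints "8 characters"
```

Unlike chop, which removes the last character no matter what it is, this leaves a line untouched when it has no line ending, so the last letter or number is never lost.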
Re^5: Read text file - Encoding problem? by better (Acolyte) on Mar 17, 2013 at 15:14 UTC