in reply to Read text file - Encoding problem?

Hi,

just a hint to help yourself. Try to show the filenames in a hex representation. Then you can compare what you have in the file with what you get when reading the directory. Reduce the problem to a simple example:

opendir my $dh, '.' or die "ERROR: Couldn't open: $!";
my @entries = readdir($dh);
closedir $dh;

foreach my $entry (@entries) {
    print "$entry\n";
    print gethex($entry), "\n";
}

sub gethex {
    my $v = shift;
    return join '', map { sprintf("%x-", ord) } split //, $v;
}
In the same way, open your csv file and dump the entries you read from it.
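
The same idea applied to the lines of a file could look roughly like this. It is only a sketch: `hexdump_str` and `dump_lines` are hypothetical names, and the file is read in raw mode so that no layer changes the bytes before they are dumped. Unlike `gethex` above, it zero-pads each code so every character takes two digits.

```perl
use strict;
use warnings;

# Hex representation of a string: one two-digit code per character.
sub hexdump_str {
    my ($v) = @_;
    return join '-', map { sprintf('%02x', ord) } split //, $v;
}

# Hypothetical helper: return the hex dump of every line of a file.
sub dump_lines {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "ERROR: Couldn't open $path: $!";
    my @out;
    while (my $line = <$fh>) {
        $line =~ s/\n\z//;    # remove only the newline, keep everything else
        push @out, hexdump_str($line);
    }
    close $fh;
    return @out;
}
```

Printing `dump_lines($file)` next to the dump of the directory entries makes any extra character at the end of a line visible.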

Another hint: I can't see any explicit decoding when you use read_file from File::Slurp. What do you get there? Are you sure that the csv file was created using UTF-8?
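
For comparison, explicit decoding with a plain core `open` instead of File::Slurp might look like the sketch below. `read_lines_utf8` is a hypothetical name, and it assumes the file really is UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file and decode it as UTF-8 explicitly.
sub read_lines_utf8 {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or die "ERROR: Couldn't open $path: $!";
    my @lines = <$fh>;    # each line is now a string of decoded characters
    close $fh;
    chomp @lines;
    return @lines;
}
```

With the encoding layer in place, a character such as "ä" arrives as one character instead of two bytes, which is easy to check with the hex dump.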

McA

Re^2: Read text file - Encoding problem?
by better (Acolyte) on Mar 17, 2013 at 10:53 UTC

    Hi McA,

    Thanks for that script. It seems that it is not an encoding problem. I checked both text files that my script reads through a filehandle. There is one difference that occurs regularly: each line of the "bad" text file, which was parsed from the csv and does not work, has a -d- at its end, while the lines of the "good" text file, which works with my script, do not:

    e.g.:  I C 7700 -> 49-20-43-20-37-37-30-30-     #good

           I C 7700 -> 49-20-43-20-37-37-30-30-d    #bad

    and what I get reading the directory:

      I C 7700.jpg -> 49-20-43-20-37-37-30-30-2e-4a-50-47

    What does that mean? What does the "d" stand for?

    better

      While finding out how to remove this "d", which is invisibly attached to the end of each line, I added this to your script:

      chop $entry

      chomp wouldn't do!

      The gethex function shows that the "d" is removed without losing the last letter (or digit).
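
A trailing "d" in that dump is what `%x` prints for character code 0x0d, the carriage return. So the "bad" file probably has Windows-style CRLF line endings, although the thread does not confirm how the file was written. `chomp` removes only the newline, which would explain why it did not help, and `chop` removes whatever the last character is, so it would also eat a real character on a line that has no carriage return. A minimal sketch that removes only line-ending characters (`strip_eol` is a hypothetical name):

```perl
use strict;
use warnings;

# Hypothetical helper: strip a trailing LF, CR, or CRLF, and nothing else.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n?\z//;
    return $line;
}
```

Calling `strip_eol($entry)` in place of `chop $entry` should leave lines from the "good" file untouched.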

      Later I will continue working on the question of how to read the "bad" text file into my main script without the invisible "d" and use the shortened strings there as regexes.
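
One possible way to use such a cleaned string in a regex is sketched below. It assumes the goal is to find directory entries that consist of the name plus a file extension. `matching_entries` is a hypothetical name, and `\Q...\E` keeps characters like "." or "(" in the name from acting as regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper: return the entries that consist of $name followed
# by a dot and an extension, e.g. "I C 7700" matches "I C 7700.jpg".
sub matching_entries {
    my ($name, @entries) = @_;
    $name =~ s/\r?\n?\z//;    # drop any leftover line ending first
    return grep { /\A\Q$name\E\.[^.]+\z/ } @entries;
}
```

The anchors `\A` and `\z` make sure that "I C 7700" does not also match a longer name such as "I C 77000.jpg".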

      better