in reply to Re: utf8 "\xD0" does not map to Unicode at /path/comparebin.pl line line_number, <STDIN> line line_number
in thread utf8 "\xD0" does not map to Unicode at /path/comparebin.pl line line_number, <STDIN> line line_number

The folders and files were recovered with TestDisk on Linux from an accidentally deleted NTFS partition.

When I use -e to test a corrupt file name that was passed to the Perl program, it says the file doesn't exist. However, if I read the directory from within Perl, those files are listed fine without any character problems, and if I test the names listed by Perl, -e confirms that they exist.

So, if I understand correctly, when I treat the path string, piped from the find process to my program, as a byte stream, it should test correctly for existence with -e.

I've been trying to implement a routine that will recover from such corruption and find the file correctly when passed from stdin. I want to keep the ability to pipe the names from the external source.

Re^3: utf8 "\xD0" does not map to Unicode at /path/comparebin.pl line line_number, <STDIN> line line_number
by ikegami (Patriarch) on Nov 21, 2014 at 17:23 UTC

    So, if I understand correctly, when I treat the path string, piped from the find process to my program, as a byte stream, it should test correctly for existence with -e.

    And it does, if you stop trying to transform the input from UTF-8 (which it isn't) into Unicode code points.

    I've been trying to implement a routine that will recover from such corruption

    Much easier to remove the erroneous conversion attempt that's corrupting it.
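
A minimal sketch of what removing the conversion could look like, assuming the names arrive one per line and the file system accepts arbitrary bytes in names (as ext4 does). The helper name existing_paths is hypothetical, and an in-memory handle stands in for the pipe from find so the example runs on its own:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read newline-separated paths from a handle as raw
# bytes and return the ones that exist on disk.
sub existing_paths {
    my ($fh) = @_;
    binmode($fh);    # no :utf8 or :encoding layer, so the bytes pass through untouched
    my @found;
    while (my $path = <$fh>) {
        chomp $path;
        push @found, $path if -e $path;
    }
    return @found;
}

# Demo: create a file whose name contains a lone 0xD0 byte (not valid UTF-8).
my $dir  = tempdir(CLEANUP => 1);
my $name = "$dir/bad\xD0name.txt";
open(my $out, '>', $name) or die "create failed: $!";
close($out);

# Simulate the piped list of names with an in-memory handle.
my $list = "$name\n$dir/missing.txt\n";
open(my $in, '<', \$list) or die "open failed: $!";
my @found = existing_paths($in);
print scalar(@found), " of 2 paths exist\n";
```

In the real script the call would presumably be existing_paths(\*STDIN), with no `use open`, -C switch, or decoding layer applied to STDIN beforehand.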

      The files were recovered from an NTFS file system to ext4 on Ubuntu.
      I thought that ext4 and Ubuntu use UTF-8 by default, but I will try setting binmode on STDIN to raw to see if it helps.

        A Unix file name is any sequence of bytes that doesn't contain 0x00 or 0x2F ("/" in ASCII). In this case, the name is a fragment of a UTF-8 string, and that fragment is not valid UTF-8 on its own.

        Your GUI and terminal operate using UTF-8, but that doesn't mean you can't create a file name that's not UTF-8.
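
To illustrate the point, here is a small sketch that checks whether a file name's bytes happen to be valid UTF-8, decoding only a copy and leaving the original bytes alone for file operations. The helper name try_decode_utf8 and the sample names are made up for the example:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the decoded text if the bytes are valid UTF-8,
# or undef if they are not. The caller's byte string is never modified.
sub try_decode_utf8 {
    my ($bytes) = @_;
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $text;    # undef when decode croaked
}

my $good = "caf\xC3\xA9.txt";    # valid UTF-8: C3 A9 encodes U+00E9
my $bad  = "bad\xD0name.txt";    # 0xD0 starts a 2-byte sequence, but no continuation byte follows

for my $name ($good, $bad) {
    my $status = defined try_decode_utf8($name) ? "valid UTF-8" : "not valid UTF-8";
    printf "%d bytes: %s\n", length($name), $status;
}
```

With FB_CROAK, the decode of the second name dies with the same kind of "does not map to Unicode" complaint as in the thread title; the eval turns that into an undef return. Either way, the byte string is what should be passed to -e and open.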