in reply to Re^2: utf8 "\xD0" does not map to Unicode at /path/comparebin.pl line line_number, <STDIN> line line_number
in thread utf8 "\xD0" does not map to Unicode at /path/comparebin.pl line line_number, <STDIN> line line_number

So, if I understand correctly, when I treat the path string, piped from the find process to my program, as a byte stream, it should test correctly for existence with -e.

And it does, if you stop trying to decode the input from UTF-8 (which it isn't) into Unicode code points.

I've been trying to implement a routine that will recover from such corruption

Much easier to remove the erroneous conversion attempt that's corrupting it.
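
A minimal sketch of what "removing the conversion" could look like, assuming the paths arrive one per line on a handle such as STDIN. The helper name check_paths is hypothetical and is not taken from comparebin.pl; the point is only that no :utf8 or :encoding layer is applied, so the bytes handed to -e are exactly the bytes find printed.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): read newline-separated
# paths from a handle as raw bytes and report which of them exist.
sub check_paths {
    my ($in) = @_;
    binmode($in);    # raw bytes: no decoding step, so nothing can fail to map
    my @results;
    while (my $path = <$in>) {
        chomp $path;
        # -e receives the same byte string the file system stores
        push @results, [ $path, ((-e $path) ? 1 : 0) ];
    }
    return @results;
}
```

It might be called as `check_paths(\*STDIN)` in a script fed by `find ... | perl script.pl`. If the paths are printed again, STDOUT should be left as a byte handle too, so they come out unchanged.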

Re^4: utf8 "\xD0" does not map to Unicode at /path/comparebin.pl line line_number, <STDIN> line line_number
by igoryonya (Pilgrim) on Nov 22, 2014 at 00:19 UTC
    The files were recovered from an NTFS file system to ext4 on Ubuntu.
    I thought that ext4 and Ubuntu use UTF-8 by default, but I will try setting binmode STDIN to raw to see if it helps.

      Unix file names are any sequence of bytes that doesn't contain 00 or 2F ("/" in ASCII). In this case, the name is a fragment of a UTF-8 string, and that fragment is not valid UTF-8 on its own.

      Your GUI and terminal operate using UTF-8, but that doesn't mean you can't create a file name that's not UTF-8.
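
      To illustrate the point, here is a small sketch using the core Encode module. The helper name is_valid_utf8 and the two sample names are made up for the example; the second name simulates a multi-byte character that was cut off, leaving a lone \xD0 lead byte like the one in the error message.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 1 if the byte string is well-formed UTF-8,
# 0 otherwise. LEAVE_SRC keeps decode() from modifying its argument.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    return $ok ? 1 : 0;
}

# Made-up sample names, given as raw bytes.
my $complete  = "photo\xD0\x96.jpg";   # \xD0\x96 is the full encoding of U+0416
my $truncated = "photo\xD0.jpg";       # lead byte only, continuation byte lost

printf "%s\n", is_valid_utf8($complete)  ? "complete:  valid UTF-8" : "complete:  not UTF-8";
printf "%s\n", is_valid_utf8($truncated) ? "truncated: valid UTF-8" : "truncated: not UTF-8";
```

      Both byte strings are acceptable Unix file names, since neither contains 00 or 2F. Only the first one can be decoded, which is why a decoding layer on STDIN fails on the second while a byte-level -e test does not.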