What do you know about the process(es) creating these file names? That's likely to be the source of the problem.

Assuming you are using a utf8-based terminal window, the question mark that you see in the terminal at the end of the file name is a symptom of a malformed character in a utf8 string (such as a start byte like \xD0 or \xD1 that is not followed by a valid continuation byte).
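To confirm that a given name really is malformed, rather than just displayed badly, you can test the raw bytes directly. This is only a minimal sketch: it assumes the core Encode module, the sub name `is_valid_utf8` is made up for illustration, and the two sample names are invented, not taken from your directory.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Returns 1 if $bytes is well-formed UTF-8, 0 otherwise.
# FB_CROAK makes decode() die on the first malformed sequence;
# LEAVE_SRC keeps decode() from modifying its input.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# Hypothetical sample names: the first is valid Cyrillic text in UTF-8,
# the second ends in a lone start byte \xD0 with no continuation byte.
for my $name ("\xD0\xBF\xD1\x80\xD0\xB8.txt", "file\xD0") {
    # Print the name as hex so the terminal cannot hide the bad byte.
    printf "%s: %s\n", unpack('H*', $name),
        is_valid_utf8($name) ? 'valid' : 'malformed';
}
```

Printing the names as hex avoids the terminal's own substitution of a question mark, so you can see exactly which byte is the stray one.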

The file system doesn't really care how (or whether) the byte sequence used for a file name is interpreted under this or that character encoding. A few bytes in the ASCII range can't be used in a file name (e.g. null or slash on unix/linux), but apart from that, any byte sequence is as good as any other, whether or not it makes sense in any given character encoding.

You should be able to rename the affected files - perl is especially handy for doing this: either you can infer the intended character(s), or you can simply replace bad bytes with something valid that yields a unique file name in the given directory. In order to rename the file, you have to treat the existing (bad) file name as a raw byte sequence, not as utf8 characters.
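Here is one way the "replace bad bytes with something valid" option might look. Treat it as a sketch under assumptions, not a finished tool: the sub names (`is_valid_utf8`, `repair_name`, `fix_names`), the choice of `_` as the replacement, and the `.1`, `.2` suffix used to keep names unique are all my own inventions. It never decodes what readdir returns, so the old name is handed back to rename as the same raw bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# True if the raw bytes form well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# Decode leniently (malformed bytes become U+FFFD), replace each run of
# U+FFFD with '_', and re-encode so the result is valid UTF-8 bytes.
sub repair_name {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);
    $chars =~ s/\x{FFFD}+/_/g;
    return encode('UTF-8', $chars);
}

# Rename every malformed name in $dir (or only report, if $dry_run is true).
# Returns a list of [old_bytes, new_bytes] pairs.
sub fix_names {
    my ($dir, $dry_run) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;  # raw bytes
    closedir $dh;

    my %taken = map { $_ => 1 } @names;
    my @done;
    for my $old (grep { !is_valid_utf8($_) } @names) {
        my $new = repair_name($old);
        # Append a counter until the name is unused in this directory.
        my ($n, $candidate) = (0, $new);
        $candidate = $new . '.' . ++$n while $taken{$candidate};
        $taken{$candidate} = 1;
        unless ($dry_run) {
            rename("$dir/$old", "$dir/$candidate")
                or die "Cannot rename in $dir: $!";
        }
        push @done, [$old, $candidate];
    }
    return @done;
}

# Dry run over the directory named on the command line, if any.
if (@ARGV) {
    printf "%s -> %s\n", unpack('H*', $_->[0]), $_->[1]
        for fix_names($ARGV[0], 1);
}
```

Run it with a directory argument to see what would be renamed; the old names are shown as hex. Once the output looks right, change the final `1` to `0` to do the renames for real. If you can infer the intended characters instead, only `repair_name` needs to change.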

(You might consider going to ASCII-only characters for file names - e.g. using a suitable Cyrillic-to-Latin transliteration - to avoid the problems that tend to come up with multi-byte characters in file names.)


In reply to Re: utf8 "\xD0" does not map to Unicode at /path/comparebin.pl line line_number, <STDIN> line line_number by graff
in thread utf8 "\xD0" does not map to Unicode at /path/comparebin.pl line line_number, <STDIN> line line_number by igoryonya
