You haven't given us enough information. Do you have files with non-ASCII file names (in /usr/local/bin/)? If so, are you sure about what character encoding is being used for those file names?

I'm guessing you do have non-ASCII file names, they are utf8-encoded, and you probably don't have these lines near the top of the script:

    binmode STDOUT, ":utf8";
    binmode STDERR, ":utf8";
and/or maybe you don't have this:
    use Encode;
which would let you do something like this:
    opendir( my $dir, "/usr/local/bin" )
        or die "Can't read /usr/local/bin: $!\n";
    # test with defined() so a file named "0" doesn't end the loop early
    while ( defined( my $raw = readdir( $dir ) ) ) {
        my $fname = decode( "utf8", $raw );
        print $fname, "\n";
    }
    closedir( $dir );
That snippet, when used with the other lines above, will show you the file names found in your /usr/local/bin/. If you'd rather use the output of the "find" utility, it might go like this:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode;

    binmode STDOUT, ":utf8";
    binmode STDERR, ":utf8";

    # "-|" reads from the command's output; ":utf8" decodes it as it is read
    open( my $find, "-|:utf8", "find /usr/local/bin -type f" )
        or die "Can't run find: $!\n";
    while ( <$find> ) {
        print;
    }
    close( $find );
Note that the first example (using opendir/readdir) prints just the names of files in that one directory, and the second example (with "find") prints the absolute path names for all files in that directory and in all its subdirectories. (Update: and notice that "\n" has to be added in the first, but is already included in the file name string in the second.)

(Also, if all your file names are plain ASCII, the above scripts still work, because ASCII is a subset of utf8.)

Now, if some of your file names have non-ASCII characters, and use some character encoding other than utf8 (e.g. koi8-r or iso-8859-5 or cp1251 or whatever), you have to figure out what that encoding is, and use it in place of "utf8" when you call decode(), or as an ":encoding(...)" layer in place of ":utf8" when you call open( ..., "-|:encoding(...)", "find ...").
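As a minimal sketch of swapping in a different encoding: the snippet below assumes, purely for illustration, that the names are koi8-r, and uses a hard-coded byte string in place of what readdir() would return. The variable names and the sample bytes are my own, not something from your script.

```perl
use strict;
use warnings;
use Encode qw(decode);

binmode STDOUT, ":utf8";

# Assumption for illustration: the file names are koi8-r encoded.
my $encoding = "koi8-r";

# Stand-in for a raw name as returned by readdir():
# the four Cyrillic letters U+0431 U+0438 U+0431 U+0441 in koi8-r.
my $raw = "\xC2\xC9\xC2\xD3";

# Same call as with "utf8", only the encoding name changes.
my $fname = decode( $encoding, $raw );
print $fname, "\n";
```

The same name would go into the open() layer for the "find" version, written as ":encoding(koi8-r)".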

If some of your file names have been corrupted (e.g. they were utf8-encoded but somehow got "renamed" with a bad byte sequence), you'll need to fix that.

(Update: I believe it is possible that a single directory can contain some file names that use one encoding, and other file names that use a different encoding. You might want to look closely at the man page for Encode, especially the part about catching errors ("FB_CROAK"), and you may also want to look at Encode::Guess.)
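Here is one way the FB_CROAK idea might look for a directory with mixed encodings. This is only a sketch: decode_name() is a hypothetical helper of mine, cp1251 as the fallback is an assumption, and the sample byte strings stand in for readdir() results.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: decode one file name, trying strict UTF-8 first
# and falling back to an assumed legacy encoding if that fails.
sub decode_name {
    my ( $bytes, $fallback ) = @_;

    # FB_CROAK makes decode() die on a bad byte sequence;
    # LEAVE_SRC keeps $bytes unmodified so it can be decoded again.
    my $text = eval { decode( "UTF-8", $bytes, FB_CROAK | LEAVE_SRC ) };
    return $text if defined $text;

    # Not valid UTF-8: treat the bytes as the fallback encoding.
    return decode( $fallback, $bytes );
}

binmode STDOUT, ":utf8";

# Sample names: plain ASCII, valid UTF-8, and cp1251 bytes (invalid as UTF-8).
for my $raw ( "plain.txt", "\xD0\xB1\xD0\xB8", "\xE1\xE8" ) {
    print decode_name( $raw, "cp1251" ), "\n";
}
```

Note that a name which happens to be valid UTF-8 is always taken as UTF-8 here, so this can only separate UTF-8 from one other encoding, not tell two single-byte encodings apart.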


In reply to Re: utf8 "\xB7" does not map to Unicode at /usr/local/bin/бибс/об&#137; line 112. by graff
in thread utf8 "\xB7" does not map to Unicode at /usr/local/bin/бибс/об&#137; line 112. by nikolay
