rastakouair has asked for the wisdom of the Perl Monks concerning the following question:

Hey,

I use the glob function to list the files in a directory. I also have an array containing some file names in UTF-8 encoding.

I want to find the files that are in both lists. But when a name contains accented characters (é, à, ù...), the equality test fails even though the names look identical. I think this is because the two scalars use different encodings.
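To see why `eq` fails, here is a minimal sketch. It assumes the file system stores names as UTF-8 bytes (which glob returns undecoded) while the array holds decoded character strings. Both variable names and values are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical values: the same name "café.txt" in two different forms.
my $from_glob = "caf\xc3\xa9.txt";   # raw UTF-8 bytes, as glob hands them back
my $from_list = "caf\x{e9}.txt";     # a decoded Perl character string

# eq compares characters: the byte string has two bytes (\xc3 \xa9)
# where the character string has the single character U+00E9.
my $raw = ($from_glob eq $from_list) ? 'same' : 'different';

# Decoding the bytes first turns both sides into character strings.
my $decoded = (decode('UTF-8', $from_glob) eq $from_list) ? 'same' : 'different';

print "raw: $raw, decoded: $decoded\n";   # raw: different, decoded: same
```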

That's why I would like to find out the encoding of the scalar returned by the glob function, which I think depends on the system Perl is running on, so that I can encode that scalar to UTF-8 and compare the two.
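A minimal sketch of the comparison itself: decode the names from disk, normalize both sides, and intersect them with a hash. `common_files` is a hypothetical helper. The encoding of the names on disk is a guess that the caller supplies, and the wanted list is assumed to already hold decoded character strings. NFC normalization is an extra assumption, included because some systems store accents in decomposed form:

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper: return the entries of @$on_disk (raw byte strings,
# e.g. from glob or readdir) whose names also appear in @$wanted
# (decoded character strings). $disk_encoding is a guess, not detected.
sub common_files {
    my ($disk_encoding, $on_disk, $wanted) = @_;

    # NFC() makes "e" + combining acute and the precomposed "é" compare equal.
    my %want;
    $want{ NFC($_) } = 1 for @$wanted;

    return grep { $want{ NFC(decode($disk_encoding, $_)) } } @$on_disk;
}

my @both = common_files('UTF-8', [ glob('*') ], [ "caf\x{e9}.txt" ]);
print "$_\n" for @both;
```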

Thanks in advance for your help, and excuse my poor English...

Re: find scalar encoding
by moritz (Cardinal) on Sep 09, 2008 at 12:03 UTC
    This is bad news for you: you can't reliably determine the encoding of a scalar. A scalar holding the bytes that glob returns carries no record of which encoding produced them.

    On Windows you could try to query the current codepage somehow and hope that the file name is in that codepage (I don't think that's guaranteed, though). On Unix systems you could try the current locale, but again, you can't blindly assume that it reflects the encoding of the file name.
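    On Unix, a minimal sketch of the locale idea could use the core I18N::Langinfo module. The helper name `locale_encoding` is made up, and the fallback to UTF-8 is an assumption; as said above, nothing forces file names to follow the locale:

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: ask the current locale for its character set
# (e.g. "UTF-8") and fall back to UTF-8 if Encode doesn't know the name.
sub locale_encoding {
    my $codeset = langinfo(CODESET);
    return find_encoding($codeset) || find_encoding('UTF-8');
}

my $enc = locale_encoding();
printf "locale says %s\n", $enc->name;

# Decode the raw names returned by glob into character strings.
# This is only correct if the names really use the locale's encoding.
my @names = map { $enc->decode($_) } glob('*');
```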

    If you can narrow the candidates down to a few encodings, you can use Encode::Guess to determine which one is most likely, but it's only a heuristic and usually doesn't work well on strings as short as file names.
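    A minimal sketch of that approach. `guess_decode` is a hypothetical helper. Adding Latin-1 as the only extra suspect and preferring UTF-8 when the guess fails are both assumptions made for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode);
use Encode::Guess;   # exports guess_encoding()

# Hypothetical helper: decode one raw file name, letting Encode::Guess
# choose between its default suspects (ascii, utf8) and Latin-1.
# guess_encoding returns an encoding object on success, or an error
# message string when no suspect fits or several fit equally well.
sub guess_decode {
    my ($bytes) = @_;
    my $decoder = guess_encoding($bytes, 'latin1');
    return $decoder->decode($bytes) if ref $decoder;

    # Assumption: when the guess is inconclusive, prefer UTF-8.
    return decode('UTF-8', $bytes);
}

print guess_decode("caf\xe9") eq "caf\x{e9}" ? "ok\n" : "mismatch\n";
```

    Note that a UTF-8 name such as "caf\xc3\xa9" is also valid Latin-1, so Encode::Guess cannot tell the two apart and reports an ambiguous guess; the fallback then decides. That is exactly the weakness of the heuristic on short strings.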

      On Windows you can try your luck with Win32::Locale or, more likely to help, Win32::Codepage to obtain the information you're looking for.

      However, if you want to write *portable* Perl code, it is probably better not to use file globbing at all. See the "System Interaction" section of perlport.
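      Following that advice, a minimal sketch of listing a directory with opendir/readdir instead of glob (the helper name `list_dir` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: list a directory's entries with opendir/readdir,
# skipping "." and "..". The names are still raw bytes, so they need
# decoding before being compared with character strings.
sub list_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return @names;
}

print "$_\n" for list_dir('.');
```

      Unlike glob('*') on Unix, readdir also returns hidden entries whose names start with a dot. It doesn't settle the encoding question either; it only avoids the portability problems of globbing.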

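        Perl on Windows doesn't support Unicode file names directly (its built-in file functions go through the ANSI code-page APIs, so names outside the current code page can't be represented faithfully).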
      Thanks, everybody, for your quick answers!
      In the end I used Encode::Guess and it works for me... I don't really know whether it will work on other computers...
      I also replaced the glob function with opendir.