in reply to strange MAC characters can't be handled by Shell or Perl
I hit this in data a lot, but rarely in file names. File names should follow the same model as other data for the most part though. When you say "can't be handled" do you mean you absolutely can't access the files, or that you're trying to provide names of those files and the files are not being found?
For example, try this code against one of those directories:

    my $dir = 'your directory path';
    local *DIR;
    opendir(DIR, $dir) or die "Failed to open $dir: $!\n";
    my @files = readdir(DIR);   # raw names, including '.' and '..'
    closedir(DIR);
    print join("\n", @files), "\n";

If that works and you see the file names, then they are accessible. Your dilemma is how to put the 'strange' characters into the file names in your code so you can access them. The way I usually do this is to find the hex value of the character and embed it in the name; e.g.:

    my $file_name = "pdf_file_s\xa2\xb3.pdf";

It's usually not as hard to find the hex values as you think. You can either dump the directory listing above and look at it in dumped (hex) form to find the values, or you can find the character set of either the source machine or your UNIX machine and see what hex values the "glyphs" you're seeing have. This is what confuses most people: most machines have character sets that use ASCII in the 7-bit range. In the 8-bit range things change. On one machine a hex a2 may display as an ô, on another as a ©, etc. The value is the same everywhere; it's just the host system's interpretation of that value that differs. There are some good character set tables available at this site
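If you'd rather not eyeball a hex dump, something like the following might help. It is only a sketch: it lists a directory and rewrites every byte outside printable ASCII as a \xNN escape, so the hex values can be read straight off the output. The helper name escape_name and the choice of '.' as the directory are my own placeholders, not anything from your setup.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $name in which every byte outside
# the printable ASCII range (0x20-0x7e) is shown as a \xNN escape.
sub escape_name {
    my ($name) = @_;
    (my $escaped = $name) =~ s/([^\x20-\x7e])/sprintf('\\x%02x', ord($1))/ge;
    return $escaped;
}

my $dir = '.';   # assumption: replace with the directory you want to inspect

opendir(my $dh, $dir) or die "Failed to open $dir: $!\n";
for my $name (sort readdir($dh)) {
    next if $name eq '.' or $name eq '..';   # skip the directory entries themselves
    print escape_name($name), "\n";
}
closedir($dh);
```

A name stored with the bytes a2 and b3 in it would print as pdf_file_s\xa2\xb3.pdf, which is the same form used in the double-quoted string above. Note that it does not escape characters such as $, @ or a literal backslash, so treat the output as a way to read the values rather than as ready-made Perl source.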
The big problem comes when users from different machines dump data to one database. I once had to clean up data that had 8-bit characters from both Windows and Mac systems. It was not fun, since the meaning of some 8-bit values varied depending on the original source machine, and that information was lost once the data was written.