Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Sometimes MAC users will create PDF files with strange characters in the filename, which cannot be handled by either the Unix shell or Perl (by which I mean 'Read', 'Write', 'Copy', etc). The only thing that can be done is to move the file to another place. I also tried to use Perl8 with better Unicode support, but could not get it to work. Does anyone here have any ideas about this? -Thanks,
  • Comment on strange MAC characters can't be handled by Shell or Perl

Re: strange MAC characters can't be handled by Shell or Perl
by steves (Curate) on Feb 06, 2003 at 03:02 UTC

    I hit this in data a lot, but rarely in file names. File names should follow the same model as other data for the most part though. When you say "can't be handled" do you mean you absolutely can't access the files, or that you're trying to provide names of those files and the files are not being found?

    For example, try this code against one of those directories:
    my $dir = 'your directory path';
    opendir(my $dh, $dir) or die "Failed to open $dir: $!\n";
    my @files = readdir($dh);    # raw names, including '.' and '..'
    closedir($dh);
    print join("\n", @files), "\n";
    If that works and you see the file names, then they are accessible. Your dilemma is how to put the 'strange' characters into the file names so you can access them. The way I usually do this is to find the hex value of the character and embed it in the name; e.g.:
    my $file_name = "pdf_file_s\xa2\xb3.pdf";
    It's usually not as hard to find the hex values as you think. You can either dump the directory listing above and look at it in dumped (hex) form to find the values, or you can find the character set of either the source machine or your UNIX machine and see what hex values the "glyphs" you're seeing have.

    This is what confuses most people: most machines have character sets that use ASCII in the 7-bit range. In the 8-bit range things change. On one machine a hex a2 may display as an ô, on another as a ©, etc. The value is the same everywhere -- it's just the host system's interpretation of that value that differs. There are some good character set tables available at this site.
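
    One way to get the directory listing in hex form is sketched below. This is only an illustration, not tested against real Mac-created names: the helper name escape_name and the default of listing the current directory are my own assumptions. It prints each name with every byte outside a small safe set (ASCII letters, digits, dot, underscore, space, hyphen) written as a \xNN escape, so the result can be pasted into a double-quoted Perl string like the one above.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of a raw file name in which every
# byte outside a small safe ASCII set is replaced by a \xNN escape.
sub escape_name {
    my ($name) = @_;
    (my $escaped = $name) =~ s/([^A-Za-z0-9._ -])/sprintf('\\x%02x', ord($1))/ge;
    return $escaped;
}

# Assumption: list the current directory unless a path is given.
my $dir = @ARGV ? $ARGV[0] : '.';
opendir(my $dh, $dir) or die "Failed to open $dir: $!\n";
for my $name (sort readdir($dh)) {
    print escape_name($name), "\n";
}
closedir($dh);
```

    Because punctuation such as $, @, quotes and backslashes is escaped as well, the printed form should be safe to interpolate as-is.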

    The big problem comes when users from different machines dump data to one database. I once had to clean up data that had 8-bit characters from both Windows and MAC systems. Not fun since the meaning of some 8-bit values varied depending on the original source machine, which was lost once the data was written.
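
    To make the "same value, different meaning" point concrete, here is a small sketch using the Encode module that ships with Perl 5.8 and later. The helper name glyph_name and the choice of byte 0x8e are mine, picked only for illustration. It decodes the same byte once as MacRoman and once as Windows cp1252 and prints the Unicode name of the character each system would show.

```perl
use strict;
use warnings;
use Encode qw(decode);
use charnames ();

# Hypothetical helper: decode one raw byte under the named character set
# and return the Unicode name of the resulting character.
sub glyph_name {
    my ($byte, $charset) = @_;
    my $char = decode($charset, $byte);
    return charnames::viacode(ord $char);
}

# The same 8-bit value, interpreted by two different source systems.
my $byte = "\x8e";
for my $charset ('MacRoman', 'cp1252') {
    printf "0x%02x as %-8s = %s\n", ord($byte), $charset, glyph_name($byte, $charset);
}
```

    Under MacRoman the byte comes out as LATIN SMALL LETTER E WITH ACUTE, under cp1252 as LATIN CAPITAL LETTER Z WITH CARON, which is why the source machine has to be known before the data can be cleaned up.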

Re: strange MAC characters can't be handled by Shell or Perl
by adrianh (Chancellor) on Feb 06, 2003 at 09:58 UTC
    Sometimes MAC users will create PDF files with strange characters in the filename, which cannot be handled by either the Unix shell or Perl (by which I mean 'Read', 'Write', 'Copy', etc).

    Can you define "not handle" here? I move stuff between Macs and Unix boxes a lot and don't have problems (with the exception of path separators "/" vs ":").

    What are you trying to do? What errors are you getting?

Re: strange MAC characters can't be handled by Shell or Perl
by mowgli (Friar) on Feb 06, 2003 at 09:27 UTC

    Perl 8? Woo-hoo! Where can I get that? ;)

    --
    mowgli

Re: strange MAC characters can't be handled by Shell or Perl
by Cody Pendant (Prior) on Feb 07, 2003 at 05:43 UTC
    Not actually a contribution, but I'd encourage people not to write "MAC", if you mean "Apple Macintosh Computer".

    It's not an acronym*, so "MAC" or "M.A.C." are incorrect and risk confusion with another technology which is correctly referred to as "MAC", Media Access Control.

    Just use "Mac".

    * Though people do tell me Macintosh really stands for "Most Applications Crash. If Not, The OS Hangs"...
    --

    “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.”
    M-J D