in reply to Handling windoze filenames with odd charactters

I would be very surprised if stat or rename has problems with files that contain odd characters. I think it is much more likely that you have a bug in your script, so the filename you are passing to stat is not what you think.

Have you checked that the problem files actually exist? It is possible that the person who prepared the spreadsheet typed the filenames in by hand and made mistakes. It is also possible that MS Excel's autocorrect feature changed some characters, for example by turning a plain hyphen (ASCII 0x2D: -) into an em dash (Unicode U+2014: —).
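One way to check for that kind of silent character swap is to list every character in a name that falls outside printable ASCII. The following is only a rough sketch: describe_odd_chars and the two sample names are made up for illustration, and it assumes the name has already been decoded into a Perl character string. On raw bytes it would report each byte of the encoded character instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): report every character
# outside printable ASCII, with its character offset and code point.
sub describe_odd_chars {
    my ($name) = @_;
    my @found;
    while ($name =~ /([^\x20-\x7E])/g) {
        # pos() is the offset just past the one-character match
        push @found, sprintf('offset %d: U+%04X', pos($name) - 1, ord $1);
    }
    return @found;
}

# Made-up sample names: one with a plain hyphen, one with an em dash.
my @samples = (
    [ 'typed by hand'      => "report-2011.pdf" ],
    [ 'pasted from Excel'  => "report\x{2014}2011.pdf" ],
);

for my $sample (@samples) {
    my ($label, $name) = @$sample;
    my @odd = describe_odd_chars($name);
    printf "%s: %s\n", $label, (@odd ? join(', ', @odd) : 'plain ASCII only');
}
```

Running something like this over the names from the spreadsheet, alongside a simple -e test on each one, should show fairly quickly whether the missing files are the ones with unexpected characters in them.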

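Also, when you write the substitution: $pdfn =~ s/^\"|\"$//g; I presume that you are looking to remove quotes from the start and the end of the string. That substitution does do that. Alternation has the lowest precedence in a regexp, so the engine reads the pattern as (^\") or (\"$): each anchor stays attached to its own branch, and quotes in the middle of $pdfn are left alone. Round brackets are not needed, and neither are the backslashes, since a double quote is not special in a regexp. Two things to watch: $ also matches just before a trailing newline, so chomp the line first, and a form such as s/^\"?(.*)\"?$/$1/g does not work, because the greedy .* swallows the closing quote before \"? gets a chance to match it. If you want a single pattern with a capture, use a non-greedy match: $pdfn =~ s/^\"?(.*?)\"?$/$1/s;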
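A quick way to settle how a quote-stripping substitution behaves is to run the candidates side by side. This is a small sketch only: the three sub names and the sample filenames are invented here. It compares the alternation form from the original script, the greedy capture form, and a non-greedy capture form.

```perl
use strict;
use warnings;

# Alternation form from the original script: matches (^") or ("$).
sub strip_quotes_alternation {
    my ($s) = @_;
    $s =~ s/^\"|\"$//g;
    return $s;
}

# Greedy capture form: (.*) takes everything up to the end of the string,
# including the closing quote, so the optional \"? matches nothing.
sub strip_quotes_greedy {
    my ($s) = @_;
    $s =~ s/^\"?(.*)\"?$/$1/g;
    return $s;
}

# Non-greedy capture form: (.*?) stops as soon as \"?$ can match.
sub strip_quotes_lazy {
    my ($s) = @_;
    $s =~ s/^\"?(.*?)\"?$/$1/s;
    return $s;
}

# Invented sample values, one with quotes inside the name as well.
for my $in ('"my "odd" file.pdf"', 'plain.pdf') {
    printf "%s\n  alternation: %s\n  greedy:      %s\n  lazy:        %s\n",
        $in,
        strip_quotes_alternation($in),
        strip_quotes_greedy($in),
        strip_quotes_lazy($in);
}
```

The alternation and non-greedy forms remove only the outer pair of quotes and keep any quotes inside the name. The greedy form leaves the trailing quote in place.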

Re^2: Handling windoze filenames with odd charactters
by Anonymous Monk on Feb 21, 2011 at 02:33 UTC

      OK. Colour me surprised.

      When I wrote that I would be surprised if stat or rename had problems with files that contain odd characters, I was actually thinking of characters in the ASCII character set, not Unicode. However, I am surprised and disappointed that Perl cannot transparently handle Unicode in filenames.

      Perl has for many years transparently handled Unicode in string variables. There are of course many pitfalls in constructing those strings from data external to the script, but in this case that should not be the programmer's problem. Perl's readdir should just make the appropriate Windows system calls to get the Unicode filename, and store that filename, complete with any Unicode, in an internal string.

      The programmer should then be able to read and write files with those names without worrying whether they contain Unicode or not. Obviously, if the programmer is transforming filenames then they need to be careful, but in many cases that is not an issue. It is far more common to open and read a file than it is to rename one.

      I think that it is a mistake in 2011 for Perl to deliberately use the old Win9x-era ANSI API, which returns filenames in the system code page rather than in Unicode, for backwards compatibility reasons, when the last Win9x OS was retired many years ago.