PerlMonks  

Illegal characters in windows filenames?

by HamNRye (Monk)
on Feb 10, 2004 at 19:26 UTC

HamNRye has asked for the wisdom of the Perl Monks concerning the following question:

C:/Documents and Settings/jmaggard/Desktop/New Folder/ Pg. 4_ 2 x 13? & 3 x 14? is not a directory or file.

We are dragging files created on a mac and giving them an extension, and fixing the file names to be compatible with our workflows and CD burning.

We received a CD with some files on it that were named as shown above. The "?" does not show up in Windows as part of the file name, but when you view the file in DOS, the "?" characters are there. In hex, the code is 3F; I'm assuming that's an inch mark '"' on the Mac.

The problem is that these files fail an if (-f $_) file test. How can I reference these files if looking them up by name fails?

Here is the code that processes these files (up to failure):

    sub PopulateTree {
        my $path    = shift;
        my $hashref = shift;

        wlog("Looking in $path...");
        $pathcount++;    #Added to get rid of empty dirs

        my $fixedpath = TrimPath($path);
        unless (-d $fixedpath) {
            mkdir $fixedpath, 0777;
            push @badpaths, $path;
            wlog( "Bad path found. Marked for deletion." );
        }

        my $dir = new IO::Dir;
        my @dirContents;
        unless ($dir->open($path)) {
            warn("Couldn't open directory $path: $!\n");
            return undef;
        };
        unless (@dirContents = $dir->read()) {
            warn("Couldn't read directory $path: $!\n");
            return undef;
        };

        # now look at each item and decide what to do with it
        if (($#dirContents == 1) && ($path ne $inDir)) {
            push @badpaths, $path;
            push @badpaths, $fixedpath;
            wlog( "Empty path found. Marked for deletion." );
        }

        ITEM: foreach my $item (@dirContents) {
            if (($item eq '.') || ($item eq '..')) {
                next ITEM;
            };
            if (-d $path.$item) {
                # it's a directory, create an item entry for it
                $hashref->{$item} = { Type => 'Dir', Contents => {} };
                # get its contents
                unless (PopulateTree($path.$item.'/', $hashref->{$item}->{Contents})) {
                    warn("PopulateTree failed");
                    return undef;
                };
            }
            elsif (-f _) {
                # create an item entry for it
                $hashref->{$item} = { Type => 'File',};
            }
            else {
                wlog("$path$item is not a directory or file.");
            };
        };
        return 1;
    };

Output:

    Looking in C:/Documents and Settings/jmaggard/Desktop/New Folder/...
    C:/Documents and Settings/jmaggard/Desktop/New Folder/ Pg. 1_ 3 Col. x 18? is not a directory or file.
    C:/Documents and Settings/jmaggard/Desktop/New Folder/ Pg. 2_ 3 Col. x 14? is not a directory or file.
    C:/Documents and Settings/jmaggard/Desktop/New Folder/ Pg. 3_ 2 Col. x 13? & 18? is not a directory or file.
    C:/Documents and Settings/jmaggard/Desktop/New Folder/ Pg. 4_ 2 x 13? & 3 x 14? is not a directory or file.

The filename in hex:

    20 50 67 2E 20 34 5F 20 32 20 78 20 31 33 3F 20 26 20 33 20 78 20 31 34 3F
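A dump like the one above can also be produced from inside Perl, which helps confirm whether the name Perl receives from the directory read really contains a literal 0x3F. This is only a minimal sketch: `hex_bytes` is a hypothetical helper name, not part of the script above, and it assumes the name is a plain byte string.

```perl
use strict;
use warnings;

# Hypothetical helper: return the characters of a name as
# space-separated, two-digit uppercase hex values.
sub hex_bytes {
    my ($name) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $name;
}

# Dump the last name from the log output above.
print hex_bytes(' Pg. 4_ 2 x 13? & 3 x 14?'), "\n";
```

Calling it on each `$item` inside the ITEM loop would show exactly which bytes Perl is handing to the file tests.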

A screenshot of the file in explorer: http://nothing4sale.org/graphics/funk_folder_snap.jpg

Thanks for your help!
Das Ham

Edit by tye, make link of URL

Replies are listed 'Best First'.
Re: Illegal characters in windows filenames??
by ysth (Canon) on Feb 10, 2004 at 19:55 UTC
    I don't know a lot about it, but it's my understanding that Windows has a lot of functions that come in two versions: one that takes paths with 1-byte characters and another that takes paths with 2-byte characters. Using a program that calls the latter, you can make a file whose name isn't usable by a program that uses the former. Apparently the wide characters get replaced by a "?" when returned.

    What you need to do is translate the name you get into a short name, or read through the directory getting short names instead of long ones. I'm sure there must be a Win32:: function to do this, or you could experiment with parsing the output of system 'dir /X "long name with ? wildcard"'
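The `dir /X` idea might be sketched roughly as follows. Everything specific here is an assumption: the sketch presumes a listing in which each entry line carries a date, a time, an optional AM/PM marker, a size or `<DIR>`, the 8.3 short name, and then the long name; the sample line and its short name are invented for illustration; and `parse_dir_x_line` is a hypothetical helper name. Real output varies with locale and Windows version, so treat this as a starting point for experimenting rather than a finished parser.

```perl
use strict;
use warnings;

# Hypothetical helper: split one entry line of `dir /X` output into
# (short_name, long_name). Returns an empty list for header and summary
# lines and for entries that have no separate short name.
# Note: leading spaces in the long name are not preserved.
sub parse_dir_x_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^ \S+ \s+ \S+                         # date and time
        (?: \s+ [AP]M )?                      # optional AM/PM marker
        \s+ (?: <DIR> | [\d,.]+ )             # <DIR> or file size
        \s+ ( [^\s.]{1,8} (?: \.[^\s.]{1,3} )? )   # 8.3 short name
        \s+ ( \S .*? ) \s* $                  # long name as displayed
    }x;
    return ($1, $2);
}

# Invented sample line, shaped like an English-locale listing.
my $sample = '02/10/2004  07:26 PM            48,128 PG4_2X~1     Pg. 4_ 2 x 13? & 3 x 14?';
my ($short, $long) = parse_dir_x_line($sample);
print "short: $short\nlong:  $long\n";
```

On Windows the lines would come from running `dir /X` on the directory through backticks, and the short name could then be tried in the `-d` and `-f` tests in place of the long one.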

Re: Illegal characters in windows filenames?
by jonadab (Parson) on Feb 10, 2004 at 20:36 UTC

    First, you find out who put non-word characters in the filenames. Then, you hit this evil person repeatedly with a large blunt object until he agrees to change them all. HTH.HAND.


    $;=sub{$/};@;=map{my($a,$b)=($_,$;);$;=sub{$a.$b->()}} split//,".rekcah lreP rehtona tsuJ";$\=$ ;->();print$/
Re: Illegal characters in windows filenames?
by jagb (Initiate) on Feb 10, 2004 at 20:10 UTC
    Have you tried enclosing the filename in quotes for your file test? Your O/S uses question marks as single-character wildcards. You don't get a filename match because the wildcard matches exactly ONE character and doesn't try with "nothing" in that spot (as opposed to a NULL). I believe enclosing the name in quotes might do the trick in the short term.

      One of the first things I did was try enclosing the filename in quotes. It's interesting you mention the "?" being used for globbing. When I enclose the file name in quotes in the shell, tools like xcopy work fine, but Perl cannot see the files. So does the perl filetest glob?? I'm thinking not.

      I'm wondering if the DOS shell passes that along expecting globbing to take care of the unsupported character. If that's the case, I can try replacing the "?" in the file names with literal 0x3F's. Hmm, I'll be giving it a try.
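On the question of whether Perl's file tests glob: they do not, and a small experiment in a scratch directory shows the difference. This is a sketch under stated assumptions; the file name `page 13` is made up, and the demonstration only shows how `-f` and `bsd_glob` treat a "?". It does not show that globbing can reach the problem files on Windows.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Create a scratch directory holding one file whose name ends in a digit.
my $dir  = tempdir(CLEANUP => 1);
my $real = "$dir/page 13";
open my $fh, '>', $real or die "Cannot create $real: $!";
close $fh;

# File tests take the string literally: this looks for a name ending in '?'.
my $literal_hit = (-f "$dir/page 1?") ? 1 : 0;

# bsd_glob expands the wildcard and, unlike plain glob, does not split
# the pattern on the space in the name.
my @glob_hits = bsd_glob("$dir/page 1?");

print "file test on literal '?': $literal_hit\n";
print "glob matches: ", scalar(@glob_hits), "\n";
```

So a name containing "?" that is passed straight to `-f` or `-d` is looked up character for character, with no wildcard matching.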

Re: Illegal characters in windows filenames?
by graff (Chancellor) on Feb 11, 2004 at 02:39 UTC
    If you have (or can get and use) a windows port of the unix "find", it might be interesting to see what happens when you run that tool on the Mac directories (save its output to a file), and then compare that to the output you'd get by just doing the equivalent thing in perl:
    ... my @dircontents = $dir->read(); print join "\n", @dircontents, ''; ...
    I'm wondering what version of Perl you're using, and whether it might be having a problem with file names containing characters with byte values greater than 127. The "0x3F" character is just a question mark, no matter where you are, and I could imagine some ill-conceived "DWIM-ery" going on that might be based on not expecting characters above 127 in file names, or putting a "?" in place of such characters because there's nothing specified in your code about how to interpret them as characters (e.g. what to map them to in utf8). I'm not well acquainted with these issues on a Windows box, but I have seen, e.g., MacArabic characters used in file names (from an Arabic newspaper CD-ROM). Scary stuff. If your Perl release has a "perlunicode" man page, you might want to read that...
Re: Illegal characters in windows filenames?
by Jenda (Abbot) on Feb 10, 2004 at 21:10 UTC

    It would be best to fix the filenames on the Mac. At least make sure they do not contain any of the following characters: / \ : * ? " < > |

    Jenda
    Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
       -- Rick Osborne

    Edit by castaway: Closed small tag in signature
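The list of forbidden characters in the reply above can be turned into a small clean-up routine for whichever side does the renaming. This is a minimal sketch: `sanitize_name` is a hypothetical helper name, and replacing each forbidden character with an underscore is an arbitrary choice, not something the thread prescribes.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each character from the forbidden list
# ( / \ : * ? " < > | ) with an underscore and return the cleaned name.
sub sanitize_name {
    my ($name) = @_;
    (my $clean = $name) =~ tr{/\\:*?"<>|}{_};
    return $clean;
}

print sanitize_name('Pg. 4_ 2 x 13? & 3 x 14?'), "\n";
```

Two different source names can collapse to the same cleaned name, so a real script would want to check for an existing target before renaming.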

      Not possible.... Well, possibleish, we already have our designers fixing them manually....

      Quick explanation: I work for a newspaper. As ads and artwork are sent to us for publication, we get a lot on CD or ZIP. Well, a year's worth of CDs with 4 MB each on them for 2-300 ads a week is a lot of physical storage room, and makes things hard to find.

      Our designers are supposed to drop the files they need on this Icon, which makes copies out to our Linux NAS server, which arranges them into volumes for burning. The designers are not supposed to have to "think" about the process.... The script can tell a .qxd from a .indd, etc... And of course, we can't just go tell our clients to fix their CD's or we won't run their ads.

      Right now, the script pops up a Tk alert telling the designers to go fix it manually, but we crave elegance. I guess you could say the designers are the violent psychopaths that know where I live.

      On a completely unrelated note.... Thanks for the module repository, and other website goodies of yours I have made use of.

        What do you mean by "this Icon"? The designers use macs, right? So they drop the files on the icon of some program/script that then runs on their Mac and copies the file somewhere, right? Or is this Icon just a networked disk/share and it's the MacOS that copies the files to somewhere where they are found and processed by some background process running on the Linux?

        On the completely unrelated note ... thanks :-)

        Jenda
        Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
           -- Rick Osborne

        Edit by castaway: Closed small tag in signature

Re: Illegal characters in windows filenames?
by Anonymous Monk on Feb 11, 2004 at 04:14 UTC
Re: Illegal characters in windows filenames?
by fraktalisman (Hermit) on Feb 11, 2004 at 15:52 UTC
    Besides the Win32::.*? functions there is also Win32API::File, which looks quite confusing to me. It says it uses 'low level' functions, so it may also offer a way to get those short file names (like "PROGRA~1" for "Program Files"), which should be useful for your task.

      Thanks for the pointers monks... I'll report back here with any findings.

      Win32::File is for getting file attributes.... Compressed, hidden, read-only, etc... http://www.xav.com/perl/site/lib/Win32/File.html

      The other thought is just to use system commands to handle the globbing. Basically, if both file and directory tests fail, we do a system("mv $path$item $path$fixed_item_name") where I've made the fixed item name just not include the "?". Then I can at least find and manipulate the file.
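If the system-command route is tried, the paths need quoting, because they contain spaces and an "&" that the shell would otherwise interpret. The following is only a sketch: `quoted_move_command` is a hypothetical helper name, it assumes an `mv` command is available on the machine, and whether that tool actually resolves the "?" in the source name is exactly the open question from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build a shell command that moves an item to the
# same name with every '?' removed, double-quoting both paths so that
# spaces and '&' reach the command intact.
sub quoted_move_command {
    my ($path, $item) = @_;
    (my $fixed = $item) =~ tr/?//d;    # drop every '?' from the new name
    return qq{mv "$path$item" "$path$fixed"};
}

my $cmd = quoted_move_command(
    'C:/Documents and Settings/jmaggard/Desktop/New Folder/',
    ' Pg. 4_ 2 x 13? & 3 x 14?',
);
print "$cmd\n";
```

The resulting string could be passed to `system`, checking the return value before assuming the file now exists under the fixed name.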
