http://qs1969.pair.com?node_id=67760

mkmcconn has asked for the wisdom of the Perl Monks concerning the following question:

My work permits me to use Linux and FreeBSD most of the day, which I'm happy for just because it's fun. But, there are many things more easily done in Windows - like, working in Windows. So, this is not a Windows-bashing post, even though it may provide some amusement to Microsoft haters.

I am embarrassed by what I'm about to show you, and proud of it at the same time. It's a workaround (in the bad sense) for one of the silliest day-to-day problems I face: Reassociate the "DOS" filename with the Long File Name, in Windows. It's a silly problem, but I want to tell you how I approached it and invite your better solutions.

Some background:
A government vendor (read: "would-try-to-solve-this-problem-if-not-for-under-budgeting") provides files for us, zipped into self-extracting archives. The images are indexed similarly to this:

#Year Doc_Num Image_File
2001  20233   E:\TEMP\IMAGES\2001_020233.tif
2001  20234   E:\TEMP\IMAGES\2001_020234.tif
# etc...
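For anyone following along, here is a minimal sketch of pulling the three fields out of one index line. It assumes the fields are whitespace-separated, as in the sample above, and that the image path contains no spaces. The helper name parse_index_line and the hash keys are mine, not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original program): split one index
# line into year, document number, full path and bare file name.
# Returns nothing for comments, blank lines, or lines with too few fields.
sub parse_index_line {
    my ($line) = @_;
    return if $line =~ /^\s*(?:#|$)/;            # skip comments and blank lines
    my ($year, $doc_num, $path) = split ' ', $line, 3;
    return unless defined $path;
    $path =~ s/\s+\z//;                          # drop trailing newline/whitespace
    my ($name) = $path =~ /([^\\\/]+)\z/;        # text after the last slash or backslash
    return unless defined $name;
    return { year => $year, doc_num => $doc_num, path => $path, name => $name };
}
```

Called in scalar context, it returns a hash reference for a data line and undef for the "#Year ..." header line.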

All the names of the files in this index correspond to the files in the zipped archive. However, the vendor's software, at some point between production and delivery, does not support Long File Names. Thus, the files in the archive display only an 8.3 name - the LFN has been disassociated from the file. The vendor has also shared this program with other agencies. Consequently, we see this problem with increasing frequency.

The task is, reassociate the 8.3 filename with the LFN listed in the index, so that the index can be used and we can avoid renaming the files by hand. Here is part of the code that does that:

use strict;
use Win32;
use File::Basename qw(fileparse basename);
use CGI qw(pretty);
$|++;

my $out = new CGI;
my ( $directory, $tempdir, $index_file);
# ... snip ...

sub make{
    mkdir $tempdir;
    open FILE, "< $index_file" or die $out->p("$index_file: $!");
    open OUT, "> $index_file.result" or warn $out->p("$!\n");
    print $out->start_p(),
          $out->br("\t",
              $out->a({-href=> "file://$index_file.result"}, "$index_file.result"),
              "OPENED\n");
    my $incr = 0 ;
    while (my $long = <FILE>){
        # Keep only the file name from the index line; skip lines without one
        $long =~ s/^.*(\b\w+_\w+\.\w+)\s*/$1/ or next;
        $long = "$tempdir/$long";
        # Create an empty file under the long name so the system assigns a short name
        open NEWFILE, "> $long";
        close NEWFILE;
        my $short = Win32::GetShortPathName($long);
        # Strip the directory from both names (this also drops the underscore from $long)
        $long =~ s/^.*(\b\w+)_(\w+\.\w+)/$1$2/;
        $short =~ s/^.*(\b\w+~\w+\.\w+)/$1/;
        print OUT $incr++,", $short, $long\n";
    }
    print $out->end_p();
    close OUT;
    close FILE;
}

Yes, if you are still reading: what this sub does is create a directory full of zero-byte files named according to the index file. The program then reads the filenames to get the Short Path supplied by the system. Another sub destroys these temporary files and the produced log (OUT) when the work is completed.
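The log written to OUT has lines of the form "N, SHORT, LONG". As a rough sketch only, this is one way such a log could be turned into a lookup table and applied to a directory of extracted files. The helper names build_rename_map and rename_from_map are hypothetical, and the sketch assumes that neither name contains commas or spaces and that the extracted files carry exactly the short names recorded in the log.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "N, SHORT, LONG" lines into a hash that maps
# the upper-cased short name to the long name.
sub build_rename_map {
    my (@lines) = @_;
    my %long_for;
    for my $line (@lines) {
        my ($n, $short, $long) =
            $line =~ /^\s*(\d+)\s*,\s*([^,\s]+)\s*,\s*([^,\s]+)\s*$/
            or next;                      # ignore anything that is not a record
        $long_for{ uc $short } = $long;
    }
    return \%long_for;
}

# Hypothetical helper: rename every plain file in $dir that has an entry
# in the map. Returns the number of files actually renamed.
sub rename_from_map {
    my ($dir, $long_for) = @_;
    opendir my $dh, $dir or die "$dir: $!";
    my @files = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    my $count = 0;
    for my $file (@files) {
        my $long = $long_for->{ uc $file } or next;   # no entry: leave it alone
        if (rename "$dir/$file", "$dir/$long") {
            $count++;
        }
        else {
            warn "$file: $!\n";
        }
    }
    return $count;
}
```

Keying the table on the upper-cased short name is a guess at what is convenient on a case-insensitive file system. Files with no entry in the log are left untouched.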

Why such a roundabout route? Well, as all you 12th level mages know, and as I've only recently found out: Microsoft has no fewer than three completely different LFN => 8.3 conversion algorithms for their three most common operating systems.
Under Windows 98:

2001_020674.tif => 2001_0~1.TIF
...
2001_020677.tif => 2001_0~4.TIF
2001_020678.tif => 2001_0~5.TIF


Under Windows NT:
2001_020674.tif => 2001_0~1.TIF
...
2001_020677.tif => 2001_0~4.TIF
2001_020678.tif => 204EFD~1.TIF


Under Windows 2000:
2001_020674.tif => 2001_0~1.TIF
...
2001_020677.tif => 2001_0~4.TIF
2001_020678.tif => 208483~1.TIF

So, you see, the 8.3 name is created using a different algorithm, depending on the system.
Windows 98 may have names like 20~36009.TIF, but NT and 2000 create names with only one character to the right of the tilde.

All the systems will attempt to name the first file 2001_0~1.TIF. Spotting a conflict with that name, they will all manufacture a non-conflicting name in the same pattern, up to the fourth conflicting name. Then the dissimilarity in the algorithms appears, and from then on the name conflicts are resolved with completely different results. I would not be at all surprised if there are as many results as there are different versions of Windows, but I don't know this.
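The part that all three systems agree on can be sketched in a few lines. This only covers the pattern visible in the listings above (first six characters of the name, a tilde, a digit from 1 to 4, and the extension, all upper-cased). It assumes every name has exactly one dot and contains only characters that are legal in an 8.3 name, and that every file needs an alias. It makes no attempt at what happens from the fifth conflict on, since that is exactly the part I don't know. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the alias shared by all three systems for the
# first four conflicting names. Returns undef for $n outside 1..4,
# where the systems diverge, and for names without exactly one dot.
sub short_name_candidate {
    my ($long, $n) = @_;
    return undef unless $n >= 1 && $n <= 4;
    my ($base, $ext) = $long =~ /^([^.]+)\.([^.]+)$/ or return undef;
    return uc( substr($base, 0, 6) . "~$n." . substr($ext, 0, 3) );
}

# Hypothetical helper: hand out aliases in the order the names arrive,
# counting conflicts per six-character prefix. The result depends on
# that order, just as it does on the real systems.
sub assign_in_order {
    my (@longs) = @_;
    my (%seen, %short_for);
    for my $long (@longs) {
        my $prefix = uc substr($long, 0, 6);
        my $n = ++$seen{$prefix};
        $short_for{$long} = short_name_candidate($long, $n);   # undef past the fourth
    }
    return \%short_for;
}
```

Feeding the same names to assign_in_order in a different order gives a different mapping, which is why the index order matters so much for this workaround.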

So, a question/challenge to close, since I chose to post this in Seekers of Perl Wisdom:
If you have some simple way to resolve this problem, then I would be most happy to read it.

Especially, if you know what the algorithm is that handles this for each version of Windows, perhaps you can point it out to me and I can try to translate it into Perl (it may be in windows.h; I am a poor reader of C and C++, and I don't have that header file to try).

I hope you find this at least amusing, or even educational, as I have.
mkmcconn

Replies are listed 'Best First'.
(jcwren) Re: Windows LFN to 8.3 trivia
by jcwren (Prior) on Mar 28, 2001 at 19:50 UTC

    I see that you are using Win32::GetShortPathName. There is a corresponding Win32::GetLongPathName: you pass it a short path name and it returns the long path name into a buffer. The documentation states that it's present on all platforms, 95/98/NT/W2K.

    Unless I misunderstand the goal (which could be entirely the case), could you not loop through the unpacked files in a directory, convert the name using GetLongPathName, then rename them to the resulting LFN?

    I haven't tried this, but was basing this on my browsing of the MSDN.

    --Chris

      That would be a "chicken and egg" problem. You could do what you suggest if you already had the files named with long names. That is, the files are extracted with names like "2001_0~4.TIF" so GetLongPathName() will return "2001_0~4.TIF" (that is the only name that the file system has been given for that file).

      I worry that even the original solution is not enough as the encoding depends on the order in which the files were created (that is, it is not uniquely determined from the set of file names)! Unless the files are listed in "creation order", you can't recover the long names from just the short names!

              - tye (but my friends call me "Tye")

        You are correct about the way that GetLongPathName() will work.

        My first attempt was to sort files by the time stamp, and rename them in that order from the index. But, apparently the files were created in various places and then moved into the same directory at some point. The index order is the creation order. This happy coincidence is the reason I've called this 'a workaround in the bad sense'.

        By the way, I think that the "budget constraints" excuse for not fixing this means, "I don't have time or staff to look into your problem", not "we can't afford different software".
        mkmcconn

Re: Windows LFN to 8.3 trivia
by boo_radley (Parson) on Mar 28, 2001 at 11:04 UTC
    It doesn't address your question directly, but I think it's a decent effort: when the real data is ready for zipping, make them run LFNBAK and include the resulting LFNBAK.DAT file in the archive.
    I wish I could find the appropriate options for it, but I don't have access to the install CD now.
    Oh, or have your vendor get software made in the last 6 years. But you did mention budgetary problems.
Re: Windows LFN to 8.3 trivia
by hannibal (Scribe) on Mar 28, 2001 at 18:25 UTC
    Yeah, these guys must be using something out of date to zip this stuff up with. Like someone else said, point them at Infozip, or Winzip if they like (which is based on Infozip anyway). It's been a while since any zip program didn't support LFNs, I think. Or make sure they aren't checking the "store filenames in 8.3 format" box in Winzip, if that's what they're using...