
My work lets me use Linux and FreeBSD most of the day, which I'm glad of simply because it's fun. But many things are more easily done in Windows, like working in Windows. So this is not a Windows-bashing post, even though it may provide some amusement to Microsoft haters.

I am embarrassed by what I'm about to show you, and proud of it at the same time. It's a workaround (in the bad sense) for one of the silliest day-to-day problems I face: reassociating the "DOS" filename with the Long File Name in Windows. I want to tell you how I approached it and invite your better solutions.

Some background:
A government (read: "would-try-to-solve-this-problem-if-not-for-under-budgeting") vendor provides files for us, zipped into self-extracting archives. The images are indexed similarly to this:

#Year   Doc_Num   Image_File
2001   20233     E:\TEMP\IMAGES\2001_020233.tif
2001   20234     E:\TEMP\IMAGES\2001_020234.tif

# etc...
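For readers who want to work with the index directly, here is a minimal sketch of splitting one index line into its three columns. It assumes the columns are whitespace-separated exactly as in the sample above; the sub name parse_index_line is made up for this illustration and is not part of the program shown later.

```perl
use strict;
use warnings;

# Hypothetical helper: split one index line into (year, doc_num, path).
# Returns an empty list for comment lines, blank lines and short lines.
sub parse_index_line {
    my ($line) = @_;
    return if $line =~ /^\s*(?:#|$)/;            # skip comments and blanks
    my ($year, $doc_num, $path) = split ' ', $line, 3;
    return unless defined $path;
    $path =~ s/\s+\z//;                          # trim trailing whitespace
    return ($year, $doc_num, $path);
}

# Take the file name from a Windows-style path without relying on the
# local platform's path rules.
sub windows_basename {
    my ($path) = @_;
    my ($name) = $path =~ m{([^\\/]+)\z};
    return $name;
}

my @fields = parse_index_line("2001   20233     E:\\TEMP\\IMAGES\\2001_020233.tif\n");
print join('|', @fields), "\n" if @fields;
print windows_basename($fields[2]), "\n" if @fields;
```

The third column is split with a limit of 3 so that a path containing spaces would stay in one piece.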

All the names of the files in this index correspond to the files in the zipped archive. However, the vendor's software, at some point between production and delivery, does not support Long File Names. Thus, the files in the archive display only an 8.3 name - the LFN has been disassociated from the file. The vendor has also shared this program with other agencies. Consequently, we see this problem with increasing frequency.

The task is to reassociate each 8.3 filename with the LFN listed in the index, so that the index can be used and we can avoid renaming the files by hand. Here is part of the code that does that:

use strict;
use Win32;
use File::Basename qw(fileparse basename);
use CGI qw(pretty);

$|++;

my $out = new CGI;
my ( $directory,
     $tempdir,
     $index_file);


# ... snip ...

sub make{
     mkdir $tempdir;
     
     open FILE, "< $index_file" or die $out->p("$index_file: $!");
     open OUT, "> $index_file.result" or warn $out->p("$!\n");
     
     print  $out->start_p(),
            $out->br("\t",
                $out->a({-href=> "file://$index_file.result"}, "$index_file.result"),
                "OPENED\n");
     my $incr = 0 ;
     
     while (my $long = <FILE>){
          
          $long =~ s/^.*(\b\w+_\w+\.\w+)\s*/$1/ or next;
          $long = "$tempdir/$long";

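          # Create a zero-byte placeholder so the system assigns it a short name.
          open NEWFILE, "> $long" or die $out->p("$long: $!");
          close NEWFILE;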

          my $short = Win32::GetShortPathName($long);

          $long =~ s/^.*(\b\w+)_(\w+\.\w+)/$1$2/;
          $short =~ s/^.*(\b\w+~\w+\.\w+)/$1/;

          print OUT $incr++,", $short, $long\n";

     }
     print $out->end_p();
     close OUT;
     close FILE;
}

Yes, if you are still reading: what this sub does is create a directory full of zero-byte files named according to the index file. The program then asks the system for the Short Path it assigned to each of those files. Another sub destroys these temporary files and the log that was produced (OUT) when the work is completed.
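The cleanup sub is not shown in this post. Purely as an illustration, a cleanup step along those lines might look like the sketch below. The sub name cleanup and its two arguments are assumptions, and it only deletes empty files so that a wrong directory argument does less damage.

```perl
use strict;
use warnings;

# Hypothetical cleanup: remove the zero-byte placeholder files, then the
# temporary directory, then the result log (if one was given).
sub cleanup {
    my ($tempdir, $result_file) = @_;

    opendir my $dh, $tempdir or die "$tempdir: $!";
    for my $entry (readdir $dh) {
        my $path = "$tempdir/$entry";
        next unless -f $path && -z _;        # plain, empty files only
        unlink $path or warn "$path: $!";
    }
    closedir $dh;

    # rmdir fails (and warns) if anything non-empty was left behind.
    rmdir $tempdir or warn "$tempdir: $!";

    if (defined $result_file && -e $result_file) {
        unlink $result_file or warn "$result_file: $!";
    }
    return;
}
```

It would be called as cleanup($tempdir, "$index_file.result") once the short names have been collected.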

Why such a roundabout route? Well, as all you 12th level mages know, and as I've only recently found out, Microsoft has no fewer than three completely different LFN => 8.3 conversion algorithms for their three most common operating systems.
Under Windows 98:

2001_020674.tif => 2001_0~1.TIF   
 ...
2001_020677.tif => 2001_0~4.TIF 
2001_020678.tif => 2001_0~5.TIF

Under Windows NT:

2001_020674.tif => 2001_0~1.TIF   
...
2001_020677.tif => 2001_0~4.TIF 
2001_020678.tif => 204EFD~1.TIF

Under Windows 2000:

2001_020674.tif => 2001_0~1.TIF   
...
2001_020677.tif => 2001_0~4.TIF 
2001_020678.tif => 208483~1.TIF

So, you see, the 8.3 name is created using a different algorithm, depending on the system.
Windows 98 may have names like 20~36009.TIF, but NT and 2000 create names with only one character to the right of the tilde.

All the systems will attempt to name the first file 2001_0~1.TIF. Spotting a conflict with that name, they will all manufacture a non-conflicting name in the same pattern, up to the fourth conflicting name. Then the dissimilarity in the algorithms appears, and from then on the name conflicts are resolved with completely different results. I would not be at all surprised if there are as many results as there are different versions of Windows, but I don't know this.
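The part that all three systems agree on can be sketched in a few lines. This is only a guess modelled on the listings above, not the real algorithm: it assumes names without spaces or characters that are illegal in 8.3 names, and it gives up (returns undef) from the fifth collision on, where the systems diverge. The sub names tilde_name and predict_short_names are invented for the example.

```perl
use strict;
use warnings;

# First six characters of the base name, "~N", and the first three
# characters of the extension, all upper-cased.
sub tilde_name {
    my ($long, $n) = @_;
    my ($base, $ext) = $long =~ /^(.*)\.([^.]*)\z/ ? ($1, $2) : ($long, '');
    my $short = uc(substr($base, 0, 6)) . "~$n";
    $short .= '.' . uc(substr($ext, 0, 3)) if length $ext;
    return $short;
}

# Walk the long names in creation order and number the collisions.
# Returns a list of [long, short] pairs; short is undef where the
# result depends on the Windows version.
sub predict_short_names {
    my @long = @_;
    my (%seen, @out);
    for my $long (@long) {
        my $key = tilde_name($long, 0);      # same prefix and extension
        my $n   = ++$seen{$key};
        push @out, [ $long, $n <= 4 ? tilde_name($long, $n) : undef ];
    }
    return @out;
}

for my $pair (predict_short_names(map { "2001_02067$_.tif" } 4 .. 8)) {
    my ($long, $short) = @$pair;
    print "$long => ", defined $short ? $short : '(system dependent)', "\n";
}
```

Because the result depends on the order in which the files were created, a prediction like this is only useful if the index lists the files in the same order the vendor's system wrote them.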

So, a question/challenge to close, since I chose to post this in Seekers of Perl Wisdom:
If you have some simple way to resolve this problem, then I would be most happy to read it.

In particular, if you know what the algorithm is that handles this for each version of Windows, perhaps you can point it out to me and I can try to translate it into Perl (it may be in windows.h; I am a poor reader of C and C++, and I don't have that header file to try).

I hope you find this at least amusing, or even educational, as I have.
mkmcconn

In reply to Windows LFN to 8.3 trivia by mkmcconn
