resistance has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I am working with a Windows file system from Linux and trying to replace 1000s of "find" operations with a quick lookup in a hash, %filesystem, which I build only once at the start of my script.

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $mountpoint = '/media/sdb2';
my %filesystem;
$filesystem{$mountpoint} = traverse($mountpoint);

sub traverse{
    my %hash;
    my ($folder) = $_[0];
    opendir DIR,$folder;
    my @content = grep {!/^\.{1,2}$/} readdir DIR;
    closedir DIR;
    foreach my $entry(sort @content){
        if(-f $folder."/".$entry){
            $hash{lc $entry} = $entry;
        }elsif( -d $folder."/". $entry){
            $hash{$entry}= traverse($folder."/".$entry);
        }else{
            print $folder."/".$entry." ignoring\n";
        };
    };
    return %hash;
};

print Dumper %filesystem;


1. The code still doesn't work; I am working on it right now, but you can see what I mean: I am trying to represent every folder as a subkey of the hash.
2. Because file naming is case-insensitive in Windows and case-sensitive in Linux, I set the key to lower case and leave the value unchanged.
3. I found the Tie::FileSystem module on CPAN, but it does much more than I need: it reads the content of each file into memory, whereas I need the file structure only.
4. OK, say I have built the %filesystem hash and now want to find the file /media/sdb2/folder1/DIRECTORY/file.TXT. The query can return two values: the (case-sensitive) path to the file, or undef if the file does not exist. How do I get it from my hash if the case of the data in the hash differs from the case of the query?
Example:
I launch my script and read the file system, populating the %filesystem hash, so I have $filesystem{'/media/sdb2'}{'Folder1'}{'Directory'}{'File.txt'}.
Then I read some value from the Windows registry; this value is folder1/directory/file.txt (the casing is incorrect, which is OK in Windows but not OK on Linux).
Is it possible to get the right file location with "exists" from the %filesystem hash? Example:
$hash{aaa}{bbb}{ccc}="ccC";
if(exists $hash{lc aAa}{lc bBB}{lc CCC}){
    print "exists: $hash{lc aAa}{lc bBB}{lc CCC}\n";
}else{
    print "not exists\n";
};
This example returns "exists: ccC", but how do I get the right case of the "bbb" folder? I want to retrieve it without reading the filesystem again.
Thanks

Replies are listed 'Best First'.
Re: treating the filesystem structure as hash of hashes
by massa (Hermit) on Jul 13, 2008 at 12:54 UTC
    My suggestions:
    1. File::Find will probably be faster/better at traversing the filesystem; and
    2. just save the full pathname in the hash values instead of the filename only
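The two suggestions above can be combined into one flat index. What follows is only a minimal sketch, not code from this thread: the helper names build_index and lookup are hypothetical, and it assumes the root is given without a trailing slash and that no two entries differ only by case (which should hold for a Windows file system).

```perl
use strict;
use warnings;
use File::Find;

# Build a flat index: lowercased path relative to $root => real full path.
# build_index() is a hypothetical name; $root must not end in a slash.
sub build_index {
    my ($root) = @_;
    my %index;
    find(
        {
            no_chdir => 1,    # keep $File::Find::name usable without chdir
            wanted   => sub {
                my $full = $File::Find::name;
                return if $full eq $root;    # skip the mount point itself
                my $rel = substr $full, length($root) + 1;
                $index{ lc($rel) } = $full;
            },
        },
        $root
    );
    return \%index;
}

# Return the real, case-correct path for a query in any casing, or undef.
sub lookup {
    my ($index, $query) = @_;
    $query =~ tr{\\}{/};    # accept Windows-style separators too
    return $index->{ lc($query) };
}
```

With this, something like lookup(build_index('/media/sdb2'), 'folder1/directory/file.txt') would return the path as spelled on disk, or undef if nothing matches. The stale-index risk raised in the reply below applies to this sketch as well.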
    []s, HTH, Massa
      Also, unless you are the *only* one using the file system, you run the risk of having an out of date directory structure. Imagine what will happen in your program if someone else (network share?) deletes a directory or file after you have built your hash.
Re: treating the filesystem structure as hash of hashes
by BrowserUk (Patriarch) on Jul 13, 2008 at 18:27 UTC

    You shouldn't bother either building a hash structure or using File::Find.

    If you have a bunch of paths like /media/sdb2/folder1/directory/file.txt, it'll be far quicker to just interrogate the file system directly:

    my $path = '/media/sdb2/folder1/directory/file.txt';
    print $path, -e( $path ) ? ' exists' : " doesn't exist";

    To build the hash structure will take one filesystem access for every path element in the tree, including all those that you don't look up later. Likewise using File::Find.
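Note that a plain -e test is case-sensitive on a case-sensitive mount, so it only succeeds when the query already has the on-disk spelling. As a rough sketch of how direct interrogation could still recover the real casing, the hypothetical helper resolve_ci below (an extension, not part of this reply) tries each path component as given and only reads the parent directory when the exact spelling is missing:

```perl
use strict;
use warnings;

# resolve_ci() is a hypothetical helper name.
# Returns the on-disk path for $query under $root, or undef if not found.
sub resolve_ci {
    my ($root, $query) = @_;
    my $path = $root;
    for my $part (grep { length } split m{[/\\]}, $query) {
        if (-e "$path/$part") {    # exact spelling exists: no readdir needed
            $path = "$path/$part";
            next;
        }
        # Otherwise scan the parent once for a case-insensitive match.
        opendir(my $dh, $path) or return;
        my ($match) = grep { lc($_) eq lc($part) } readdir $dh;
        closedir $dh;
        return unless defined $match;
        $path = "$path/$match";
    }
    return $path;
}
```

This touches only the directories along each queried path, which fits the argument above when few of the files in the tree are ever looked up.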


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: treating the filesystem structure as hash of hashes
by jethro (Monsignor) on Jul 13, 2008 at 12:41 UTC
    Use lc to store the keys to the hashes in lowercase. So that you have $filesystem{'/media/sdb2'}{'folder1'}{'directory'}{'file.txt'}.

    Simply change the loop to

    foreach my $entry (@content){
        $entry = lc $entry;

    I threw out the 'sort'; it has no use when you put things into a hash.
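Lowercased keys alone lose the original spelling of the folders, which is what the question asks to recover. One possible way around that, shown here only as a sketch, is to store the real name next to each lowercased key. The names traverse, resolve, and the name/kids fields are assumptions made for illustration:

```perl
use strict;
use warnings;

# Each level is a hash: lc(entry name) => { name => real name, kids => {...} }.
# File nodes have no 'kids' key. Anything that is neither file nor dir is skipped.
sub traverse {
    my ($dir) = @_;
    my %kids;
    opendir(my $dh, $dir) or do { warn "cannot open $dir: $!\n"; return {} };
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    for my $entry (@entries) {
        my $path = "$dir/$entry";
        if (-d $path) {
            $kids{ lc($entry) } = { name => $entry, kids => traverse($path) };
        }
        elsif (-f $path) {
            $kids{ lc($entry) } = { name => $entry };
        }
    }
    return \%kids;
}

# Walk the tree with a query in any casing; return the real path or undef.
sub resolve {
    my ($root, $tree, $query) = @_;
    my @real  = ($root);
    my $level = $tree;
    for my $part (grep { length } split m{[/\\]}, $query) {
        my $node = $level && $level->{ lc($part) } or return;
        push @real, $node->{name};    # collect the on-disk spelling
        $level = $node->{kids};       # undef for files, so deeper parts fail
    }
    return join '/', @real;
}
```

After my $tree = traverse('/media/sdb2'), a call such as resolve('/media/sdb2', $tree, 'folder1/directory/file.txt') would return the path with the casing found on disk, without touching the filesystem again.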

    Your data structure could be much simpler by using the complete path with filename as key. So that $filesystem{'/media/sdb2'}{'folder1'}{'directory'}{'file.txt'} would be stored in something like $filesystem{'folder1\\directory\\file.txt'} instead.

    EDIT: a few edits because I misread the problem

Re: treating the filesystem structure as hash of hashes
by pc88mxer (Vicar) on Jul 13, 2008 at 13:51 UTC
    Try mounting your file system with the option posix=0. From the mount(8) man page:
    posix=[0|1]
        If enabled (posix=1), the file system distinguishes between upper and lower case. The 8.3 alias names are presented as hard links instead of being suppressed.
    Also see the check option under "Mount options for FAT".