Preceptor has asked for the wisdom of the Perl Monks concerning the following question:

I have a list of Unix directories that I need to perform an operation on (in this case, a du -sk).

The problem is that this list was generated from a Windows view of the relevant filesystem (it's a NAS), so the case of the names in the list can't be trusted (e.g. I have /fs008/Localusers rather than /fs008/LocalUsers).

My question is: is there a simple way to 'convert' a filename that has the right sequence of characters, but not quite the right case, into the 'correct' filename?

I can't do anything as simple as converting it all to upper or lower case, because for legacy reasons our Unix filesystems do contain a mix of upper- and lower-case directories.

I'm fairly confident that there won't be any 'duplicates' (e.g. "thisfile" and "ThisFile" in the same directory), since these filesystems are exported via both CIFS and NFS.

The only way I can immediately think of involves generating all possible permutations of case, and testing for the existence of the file.

(Oh, and some of the filenames contain spaces, too. But that's less of a problem.)
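For what it's worth, the case permutations don't have to be generated by hand: a glob pattern in which every letter is a character class (L becomes [Ll]) makes glob read each directory and test the entries itself. The sketch below assumes absolute Unix paths; nocase_pattern and real_paths are made-up helper names, and it uses File::Glob's bsd_glob because the built-in glob splits its pattern on whitespace, which would break names containing spaces.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: turn a path into a glob pattern that matches it
# regardless of case. Letters become character classes ("L" -> "[Ll]"),
# glob metacharacters are backslash-escaped, and everything else
# (including "/", spaces and "&") is kept as-is.
sub nocase_pattern {
    my ($path) = @_;
    my $pattern = '';
    for my $c ( split //, $path ) {
        if ( lc $c ne uc $c ) {               # a letter with two cases
            $pattern .= '[' . uc($c) . lc($c) . ']';
        }
        elsif ( $c =~ /[\\\[\]{}*?~]/ ) {     # glob metacharacter
            $pattern .= "\\$c";
        }
        else {
            $pattern .= $c;
        }
    }
    return $pattern;
}

# Hypothetical helper: return the real on-disk path(s) that match $path
# apart from case. A pattern with no letters has no magic and is returned
# unchanged by bsd_glob, so the -e check drops it if it doesn't exist.
sub real_paths {
    my ($path) = @_;
    return grep { -e } bsd_glob( nocase_pattern($path) );
}

print "$_\n" for real_paths('/fs008/Localusers');
```

If /fs008/LocalUsers exists, the last line should print it. glob still reads every directory along the path, but once per component rather than once per case permutation.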

Re: case insensitive filename matching
by cog (Parson) on Feb 10, 2005 at 12:35 UTC
    Get the list of all files in the filesystem, get the list of the files you want, and use grep:

    grep { is_in($_, @mylist) } @files_in_the_filesystem

    That'll give you the list of files you want (hopefully).

    Of course, you'll have to code the is_in subroutine, which will probably contain a bit of code like this:

    $newfile =~ /^$oldfile$/i

    I hope that sets you in the right direction. Honk if you need more help :-)

      $newfile =~ /^$oldfile$/i

      As much as I like regexps, I would be much more likely to write that as:

      if ( lc $newfile eq lc $oldfile )

      In my eyes, that's the canonical case-insensitive equality comparison.

      --
      edan

        Yes, you're right :-)

        (what was I thinking? :-) )
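Putting cog's grep and edan's comparison together, a minimal sketch might look like this; is_in, its ( $candidate, @wanted ) argument order, and the sample paths are all hypothetical. The last few lines show why the interpolated regex is riskier: the wanted name is treated as a pattern, so a name such as "c++ notes" fails to match unless it is quoted with \Q...\E.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $candidate equals any entry in @wanted,
# ignoring case (edan's lc/eq comparison).
sub is_in {
    my ( $candidate, @wanted ) = @_;
    my $lc = lc $candidate;
    for my $name (@wanted) {
        return 1 if lc $name eq $lc;
    }
    return 0;
}

# Made-up sample data
my @mylist = ( '/fs008/Localusers', '/fs008/My Docs & Stuff' );
my @files_in_the_filesystem =
    ( '/fs008/LocalUsers', '/fs008/Other', '/fs008/my docs & stuff' );

my @hits = grep { is_in( $_, @mylist ) } @files_in_the_filesystem;
print "$_\n" for @hits;

# An unquoted name is a regex: "c++" means "one or more c" here.
my ( $oldfile, $newfile ) = ( 'c++ notes', 'C++ Notes' );
print "plain regex:  ", ( $newfile =~ /^$oldfile$/i     ? "match" : "no match" ), "\n";
print "quoted regex: ", ( $newfile =~ /^\Q$oldfile\E$/i ? "match" : "no match" ), "\n";
print "lc eq:        ", ( lc $newfile eq lc $oldfile    ? "match" : "no match" ), "\n";
```

Run as is, it prints /fs008/LocalUsers and /fs008/my docs & stuff, then "no match" for the plain regex and "match" for the other two. With a long wanted list, a hash keyed on lc of each name turns each test into a single lookup instead of a loop.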

Re: case insensitive filename matching
by Fletch (Bishop) on Feb 10, 2005 at 13:03 UTC
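    use strict;
    use warnings;
    use File::Find qw( find );

    my $root_dir = '/fs008';    # example value: top of the tree to search

    # one wanted path per line; chomp the newlines or nothing will match
    chomp( my @desired_files = `cat list_o_dirs` );

    # lookup hash keyed on the lower-cased path
    my %desired_files;
    @desired_files{ map lc, @desired_files } = (1) x @desired_files;

    # find() takes the callback first, then the starting directories
    my @hits;
    find( sub {
        push @hits, $File::Find::name
            if exists $desired_files{ lc $File::Find::name };
    }, $root_dir );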
Re: case insensitive filename matching
by holli (Abbot) on Feb 10, 2005 at 13:49 UTC
    This may not be as short as the former, but I hope it's clearer. The code builds a tree from the filesystem, which is then used to look up the real directory names.

    Assuming this fs-structure:
    test ->trEE ->/A ->/b
    this code:
    use strict;
    use warnings;

    # hash for file-system data
    my %tree;

    # build a data structure that represents the directory tree
    # for everything below "/test"
    build_tree( "/test", \%tree );

    # get the real path of "TREE/A/B", relative to /test
    my $p = getRealPath( "TREE/A/B", \%tree );
    print "$p\n" if defined $p;

    sub build_tree {
        my ( $dir, $ref ) = @_;
        opendir my $dirh, $dir or die "Can't open $dir: $!";
        # a lexical $entry, because $_ is global and the recursion would clobber it
        while ( defined( my $entry = readdir $dirh ) ) {
            next if $entry =~ /^\./;         # skip ., .. and hidden entries
            next unless -d "$dir/$entry";    # only directories go in the tree
            # key on the lower-cased name, remember the real one
            my $node = { name => $entry, subdirs => {} };
            $ref->{subdirs}{ lc $entry } = $node;
            build_tree( "$dir/$entry", $node );
        }
        closedir $dirh;
    }

    sub getRealPath {
        my @path = split m{/}, shift;
        my $ref  = shift;
        my $path = "";
        for my $part (@path) {
            # return undef if this component doesn't exist in any case
            my $node = $ref->{subdirs}{ lc $part } or return;
            $path .= "/" . $node->{name};
            $ref = $node;
        }
        return $path;
    }
    prints:
    test/trEE/A/B
    holli, /regexed monk/
      Thanks, that worked beautifully.

      Of course, now I've got to figure out how to deal with people putting horrible things like ampersands into their filenames, but that I'm sure I can deal with.
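One way to avoid both the up-front walk of the whole tree and any worries about ampersands is to resolve each path one component at a time in pure Perl, reading only the directories along the way. This is a sketch, not holli's code: real_path is a hypothetical name, it assumes absolute paths, and it takes the first case-insensitive match in each directory (safe here given the no-duplicates guarantee).

```perl
use strict;
use warnings;

# Hypothetical helper: resolve an absolute path whose case may be wrong by
# reading each parent directory and picking the entry whose lower-cased
# name matches. Only the directories along the path are read, and no
# shell is involved. Returns undef if any component can't be found.
sub real_path {
    my ($path) = @_;
    my $real = '';    # built up as "/Comp1/Comp2/..."
    for my $part ( grep { length } split m{/}, $path ) {
        my $dir = length $real ? $real : '/';
        opendir my $dh, $dir or return;
        my ($match) = grep { lc $_ eq lc $part } readdir $dh;
        closedir $dh;
        return unless defined $match;    # no entry with that name, in any case
        $real .= "/$match";
    }
    return length $real ? $real : '/';
}

my $fixed = real_path('/fs008/Localusers');
print defined $fixed ? "$fixed\n" : "not found\n";
```

Because no shell is involved, a name like "Tom & Jerry" needs no quoting. To run du -sk on the result without going through the shell either, the list form of system does that, e.g. system('du', '-sk', $fixed).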

Re: case insensitive filename matching
by RazorbladeBidet (Friar) on Feb 10, 2005 at 13:13 UTC
    Besides "finding" the list again or trying all permutations, you may want to simply

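    while (<FILE>) {
        chomp;    # a trailing newline would end the shell command early
        # ls -1: bare names; grep -i -x -F: whole-name, case-insensitive, fixed-string match
        # \Q...\E backslash-escapes spaces and shell metacharacters such as &
        my $real_filename = `ls -1 | grep -ixF -- \Q$_\E`;
    }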


    or something similar. However, it may be better to group files in the same directory together (if they aren't already) to reduce the number of external commands you have to run.
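The grouping idea can also be done without ls at all: group the wanted names by parent directory, read each directory once, and look each name up in a hash keyed on the lower-cased entry. A sketch, assuming the parent directories are already spelled correctly so that only the final component needs fixing; resolve_by_directory is a hypothetical name.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: fix the case of the last component of each path,
# reading each parent directory only once. Assumes the parent
# directories themselves are already spelled correctly.
sub resolve_by_directory {
    my @paths = @_;
    my ( %by_dir, %resolved );

    # group the wanted names by the directory they live in
    for my $path (@paths) {
        ( my $trimmed = $path ) =~ s{/+\z}{};    # ignore trailing slashes
        my ( $name, $dir ) = fileparse($trimmed);    # $dir keeps its trailing "/"
        push @{ $by_dir{$dir} }, [ $path, $name ];
    }

    for my $dir ( keys %by_dir ) {
        opendir my $dh, $dir or next;                 # unreadable: leave unresolved
        my %real = map { lc($_) => $_ } readdir $dh;  # one readdir per directory
        closedir $dh;
        for my $entry ( @{ $by_dir{$dir} } ) {
            my ( $path, $name ) = @$entry;
            $resolved{$path} = $dir . $real{ lc $name } if exists $real{ lc $name };
        }
    }
    return \%resolved;    # original path => corrected path
}
```

It returns a hash reference mapping each original path to its corrected form; paths with no match are simply left out.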