venrii has asked for the wisdom of the Perl Monks concerning the following question:

I've inherited some code...
A hash is read in with the following key format:
$hash{'/path/to/dir1'}{dir} = number of directories in dir1 (4 for example)
$hash{'/path/to/dir1/sub1'}{dir} = number of directories in sub1 (3 for example)
$hash{'/path/to/dir2'}{dir} = number of directories in dir2 (12 for example)

My customer now wants $hash{'/path/to/dir1'}{dir} to contain the total number of directories underneath it (7 using the example numbers), not just the number directly in /path/to/dir1. Data::Dumper was used to store the results of previous runs that gathered all this information. Does anyone have suggestions on how best to solve this? I'm fairly comfortable with Perl for my basic admin scripting needs, but this reeks of something like recursion. Any help would be greatly appreciated.
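Since earlier runs were saved with Data::Dumper, any fix will probably start by reloading one. A minimal sketch of that round trip, assuming the dump was written with a name so that it reads as `$stats = { ... };` (the name `$stats` and the sample data are hypothetical):

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical per-directory counts shaped like the hash described above.
my %hash = (
    '/path/to/dir1'      => { dir => 4 },
    '/path/to/dir1/sub1' => { dir => 3 },
    '/path/to/dir2'      => { dir => 12 },
);

# Write it out the way a previous run might have; naming the dump 'stats'
# makes the text read "$stats = { ... };". The name is an assumption.
$Data::Dumper::Sortkeys = 1;
my $text = Data::Dumper->Dump( [ \%hash ], ['stats'] );

# Read it back: a string eval sees the lexical $stats declared here.
my $stats;
eval $text;
die "could not load dump: $@" if $@;

printf "%-20s %d\n", $_, $stats->{$_}{dir} for sort keys %$stats;
```

For a dump saved to a file, `do $file` evaluates the file and returns the value of its last statement; with a single `$stats = {...};` in it, that value is the hash reference.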

Update

Here is the code that generates the current output...
    use strict;
    use File::Find;
    use Fcntl ':mode';

    my $cur_dir;
    my %stats;
    my ($files, $dirs, $links, $tfiles, $tdirs, $tlinks);
    my ($mode, $user, $group);
    my $depth = split('/', $ARGV[0]) + $ARGV[1];

    find ( \&user_files, $ARGV[0]);

    print "Files: $tfiles\n";
    print "Dirs: $tdirs\n";
    print "Links: $tlinks\n";

    printf "%-40s %-10s %-10s %-10s\n", 'Directory', 'Dirs', 'Files', 'Links';
    print '-' x 70, "\n";
    for ( sort keys %stats ) {
        $stats{$_}{dirs}-- if ( $_ eq $ARGV[0] );
        my $tdirs  = int($stats{$_}{tdirs});
        my $tfiles = int($stats{$_}{tfiles});
        my $tlinks = int($stats{$_}{tlinks});
        my $dirs   = int($stats{$_}{dirs});
        my $files  = int($stats{$_}{files});
        my $links  = int($stats{$_}{links});
        printf "%-40s %-10s %-10s %-10s\n", $_, "$tdirs/$dirs", "$tfiles/$files", "$tlinks/$links";
    }
    exit(0);

    #------------------------------ Sub-routines ----------------------------------
    sub user_files {
        if ( $File::Find::name =~ /\/\.snapshot/ ) {
            $File::Find::prune = 1;
            return;
        }
        ($mode, $user, $group) = (stat($File::Find::name))[2,4,5];
        $cur_dir = '';
        my $count = 0;
        for my $d ( split('/', $File::Find::dir) ) {
            $count++;
            next if ( $d eq '' );
            $cur_dir .= "/$d";
            last if ( $count == $depth );
        }
        if ( -f $File::Find::name ) {
            $tfiles++;
            $stats{$cur_dir}{tfiles}++;
            $stats{$cur_dir}{files}++ if ( $mode & S_IXOTH );
        }
        elsif ( -d $File::Find::name ) {
            $tdirs++;
            $stats{$cur_dir}{tdirs}++;
            $stats{$cur_dir}{dirs}++ if ( $mode & S_IXOTH );
        }
        elsif ( -l $File::Find::name ) {
            $tlinks++;
            $stats{$cur_dir}{tlinks}++;
            $stats{$cur_dir}{links}++ if ( $mode & S_IXOTH );
        }
    }

Again, not my code and it may make more sense to start from scratch, but the goal is to change the output from:

    Directory               Dirs    Files   Links
    /path/to/dir1           10/8    12/10   0/0
    /path/to/dir1/sub1      7/7     5/3     0/0
    /path/to/dir1/sub1/a1   20/20   50/30   0/0

to this format:

    Directory               Dirs    Files   Links
    /path/to/dir1           37/35   67/43   0/0
    /path/to/dir1/sub1      27/27   55/33   0/0
    /path/to/dir1/sub1/a1   20/20   50/30   0/0
I hope that makes more sense. Thanks for the comments so far; they are giving me ideas on how to approach the new requirements differently.
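One way to get the second table from the inherited script is to leave the File::Find pass alone and post-process %stats after find() returns, before the report loop. A minimal sketch, assuming the six counter names used in the script; `rollup_stats` is a hypothetical helper, and the sample data is the "current output" table above:

```perl
use strict;
use warnings;

# The six counters the inherited script keeps per directory (assumed).
my @COUNTERS = qw(tdirs dirs tfiles files tlinks links);

# Hypothetical helper: make every directory's counters cumulative by adding
# each directory's own counts to every ancestor that has an entry in %$stats.
sub rollup_stats {
    my ($stats) = @_;
    my %own;    # snapshot of the per-directory counts, so nothing is added twice
    for my $dir ( keys %$stats ) {
        for my $c (@COUNTERS) {
            $stats->{$dir}{$c} //= 0;    # missing counters count as zero
            $own{$dir}{$c} = $stats->{$dir}{$c};
        }
    }
    for my $dir ( keys %own ) {
        my $ancestor = $dir;
        # Strip one "/component" per step: .../sub1/a1 -> .../sub1 -> .../dir1 ...
        while ( $ancestor =~ s{/[^/]+$}{} ) {
            next unless exists $stats->{$ancestor};
            $stats->{$ancestor}{$_} += $own{$dir}{$_} for @COUNTERS;
        }
    }
    return $stats;
}

# Usage with the per-directory numbers from the "current output" table.
my %stats = (
    '/path/to/dir1'         => { tdirs => 10, dirs => 8,  tfiles => 12, files => 10 },
    '/path/to/dir1/sub1'    => { tdirs => 7,  dirs => 7,  tfiles => 5,  files => 3 },
    '/path/to/dir1/sub1/a1' => { tdirs => 20, dirs => 20, tfiles => 50, files => 30 },
);
rollup_stats( \%stats );
for my $d ( sort keys %stats ) {
    printf "%-24s %d/%d %d/%d %d/%d\n", $d, @{ $stats{$d} }{@COUNTERS};
}
```

This prints 37/35 67/43 0/0 for /path/to/dir1, 27/27 55/33 0/0 for sub1 and 20/20 50/30 0/0 for a1. Because each directory credits all of its ancestors from a snapshot, the result does not depend on processing order, and a missing intermediate key does not stop the counts from reaching higher ancestors.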

Re: Hash problem (possible recursion)
by ikegami (Patriarch) on Jun 14, 2006 at 21:41 UTC
    I think the following will do the trick:
    foreach my $dir (sort { length($b) <=> length($a) } keys %hash) {
        next unless $dir =~ m{^(.*)/[^/]*$};
        my $parent = $1;
        next unless exists $hash{$parent};
        $hash{$parent}{dir} += $hash{$dir}{dir};
    }

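    The sort is essential: a descendant's path is always longer than its ancestor's, so processing the longest paths first guarantees that each child's total is complete before it is added to its parent, and the changes propagate up towards the root.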
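    To see why the order matters, here is a sketch that runs the same loop both ways over a three-level chain, using the "Dirs" totals from the question (10, 7, 20). `rollup` is a hypothetical wrapper around the loop, taking the comparator as a parameter:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop above: copy the counts, then fold each
# path's {dir} into its parent, visiting paths in the order $by dictates.
sub rollup {
    my ( $counts, $by ) = @_;
    my %hash = map { $_ => { dir => $counts->{$_} } } keys %$counts;
    foreach my $dir ( sort { $by->( $a, $b ) } keys %hash ) {
        next unless $dir =~ m{^(.*)/[^/]*$};    # split off the parent path
        my $parent = $1;
        next unless exists $hash{$parent};
        $hash{$parent}{dir} += $hash{$dir}{dir};
    }
    return \%hash;
}

# Three nested directories; the numbers are the "Dirs" totals from the question.
my %counts = (
    '/path/to/dir1'         => 10,
    '/path/to/dir1/sub1'    => 7,
    '/path/to/dir1/sub1/a1' => 20,
);
my $deep_first    = rollup( \%counts, sub { length( $_[1] ) <=> length( $_[0] ) } );
my $shallow_first = rollup( \%counts, sub { length( $_[0] ) <=> length( $_[1] ) } );
print "deepest first:    $deep_first->{'/path/to/dir1'}{dir}\n";
print "shallowest first: $shallow_first->{'/path/to/dir1'}{dir}\n";
```

    Deepest-first gives 37 for /path/to/dir1, matching the desired output. Shallowest-first gives 17, because sub1 (still 7) is added to dir1 before a1's 20 has been folded into sub1.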

Re: Hash problem (possible recursion)
by graff (Chancellor) on Jun 15, 2006 at 03:38 UTC
    My customer now wants to see ... the total number of directories ..., not just ...

    Whenever I see that kind of feature shift, I worry. Is it the case that the customer really does not want the original information anymore, and only wants the modified information? Maybe the new spec was meant to supplement the existing data, instead of replacing it -- i.e. something like:

    $hash{'/path/to/dir1'}{dir} = number of directories directly in dir1
    $hash{'/path/to/dir1'}{sum} = total number of subdirectories at any depth under dir1
    ...

    The point is that the original information might still be useful; hopefully, the customer can be clear about this.
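    If the customer does want both numbers, ikegami's loop adapts easily to this layout. A minimal sketch, assuming the field names from the notation above; `add_sums` is a hypothetical helper that leaves {dir} untouched and accumulates totals in {sum}:

```perl
use strict;
use warnings;

# Hypothetical helper: keep {dir} as-is and fill {sum} with the total number
# of subdirectories at any depth, propagating the deepest paths first.
sub add_sums {
    my ($hash) = @_;
    $hash->{$_}{sum} = $hash->{$_}{dir} for keys %$hash;    # start from own count
    for my $dir ( sort { length($b) <=> length($a) } keys %$hash ) {
        next unless $dir =~ m{^(.*)/[^/]*$};
        my $parent = $1;
        next unless exists $hash->{$parent};
        $hash->{$parent}{sum} += $hash->{$dir}{sum};
    }
    return $hash;
}

# Usage with the example numbers from the question.
my %hash = (
    '/path/to/dir1'      => { dir => 4 },
    '/path/to/dir1/sub1' => { dir => 3 },
    '/path/to/dir2'      => { dir => 12 },
);
add_sums( \%hash );
printf "%6d %8d %s\n", $hash{$_}{dir}, $hash{$_}{sum}, $_ for sort keys %hash;
```

    /path/to/dir1 keeps dir = 4 and gets sum = 7; sub1 and dir2 have no children in the data, so their sums equal their own counts.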

    When you say you "read in" the hash, it would help to know what you are reading from (a list file? output from "find /path -type d"? File::Find?). Rather than loading the hash one way and then retooling it into another, you should be able to load it the way you want in one pass: use a technique similar to what ikegami proposed above, but apply it while the hash is being loaded.

    UPDATE: In case it helps, here's a sample that reads the output of a "find ... -type d -print0" command and tabulates both your original stats (number of directories immediately contained in each directory) and the new stats (total number of subdirectories subsumed under each directory):

    #!/usr/bin/perl
    use strict;

    die "Usage: $0 [path ...]\n" if ( @ARGV and not -d $ARGV[0] );
    push @ARGV, "." if @ARGV == 0;

    my %hash;
    $/ = "\0";
    open( FIND, "-|", "find", @ARGV, qw/-type d -print0/ )
        or die "can't run find @ARGV: $!\n";
    while ( <FIND> ) {
        chomp;
        # $hash{$_}{dir} = 0;
        next unless ( s{/[^/]+$}{} );
        $hash{$_}{dir}++;
        $hash{$_}{sum}++;
        while ( s{/[^/]+$}{} ) {
            $hash{$_}{sum}++;
        }
    }

    printf "%6s %8s %s\n", qw/dirs subdirs path/;
    for ( sort keys %hash ) {
        printf "%6d %8d %s\n", $hash{$_}{dir}, $hash{$_}{sum}, $_;
    }
    Note the commented-out line after "chomp": if you uncomment it, the output will list all directories, including the "terminals" (those with no further subdirectories, which show zeros for "dir" and "sum"); with it commented out, the listing contains only the "non-terminal" directories.

    (updated code to remove a redundant use of m{})
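    For comparison, the same one-pass idea can be sketched in pure Perl with File::Find instead of an external find(1). This is an assumption-laden variant, not graff's script: `tally_dirs` is a hypothetical helper that takes a single start directory (not "/"), skips symlinks the way `find -type d` does without -L, and credits ancestors only up to the start point (given an absolute path, the find-based version also credits the start point's own parents):

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: tally {dir} (immediate subdirectories) and {sum}
# (subdirectories at any depth) for one start directory, in a single pass.
sub tally_dirs {
    my ($root) = @_;
    $root =~ s{/+\z}{};    # work without a trailing slash from here on
    die "start directory must not be '/' or empty\n" if $root eq '';
    my %hash;
    find(
        {
            no_chdir => 1,    # don't chdir, so tests on $File::Find::name work
            wanted   => sub {
                my $path = $File::Find::name;
                # Skip the start point itself, symlinks, and non-directories.
                return if $path eq $root || -l $path || !-d $path;
                $path =~ s{/[^/]+$}{};    # now the immediate parent
                $hash{$path}{dir}++;
                # Credit the parent and each ancestor up to the start point.
                while ( length($path) >= length($root) ) {
                    $hash{$path}{sum}++;
                    last unless $path =~ s{/[^/]+$}{};
                }
            },
        },
        $root
    );
    return \%hash;
}

# Usage on a throwaway tree: root/a/b, root/a/c, root/d
my $root = tempdir( CLEANUP => 1 );
make_path( map { "$root/$_" } qw(a/b a/c d) );
my $h = tally_dirs($root);
printf "%6s %8s %s\n", qw/dirs subdirs path/;
printf "%6d %8d %s\n", $h->{$_}{dir}, $h->{$_}{sum}, $_ for sort keys %$h;
```

    On that tree the start directory reports 2 immediate and 4 total subdirectories, "a" reports 2 and 2, and the terminal directories b, c and d get no entry, as in the find-based script with the "chomp" line left commented out.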

Re: Hash problem (possible recursion)
by GrandFather (Saint) on Jun 14, 2006 at 21:58 UTC

    Without a sample of your code and a small sample of typical data, it's hard to give a detailed answer that would be useful to you. However, Directory tree explorer with stats reporting performs the task you are trying to solve and may help.

