Greetings fellow monks,

Over the course of several years, I've collected a great series of programs, code, graphics, MP3s, and other such things that I save off onto CDs from time to time. My "extremely eloquent" naming convention for these is CD1, CD2, and so forth. Naturally, when I'm searching for a specific file I saved off a year or two ago, the quest becomes quite a process.

To reduce the time it takes to locate what I'm searching for, I wanted to make/use a "directory+depth" catalog or indexing script. Essentially, it would look at every directory and file in a volume or directory, descending through the complete depth, and spit out a report/index of the files with size and quantity values for each directory. I'd print off these reports or save them as a group on the latest CD.

I've been searching around for some module or previously posted code chunk to do this, but I've been unsuccessful. So here's my first attempt at a solution. It's certainly messy, and does need some help with its size and structure, but here it is:

#!/usr/bin/perl

my $root_dir    = '/var/home/gryphon';
my $file_icon   = '- ';
my $dir_icon    = '# ';
my $vol_icon    = '* ';
my $indent_icon = ' ';
my $output_file = 'list-of-stuff.txt';

use strict;
use File::Find;
use File::stat;

my (%files, %dirs);
find(\&learn_files, $root_dir);

open(OUT, "> $output_file");
foreach (sort keys %dirs) {
    my @indent = split(/\//, substr($_, length($root_dir) + 1));
    print OUT $indent_icon x ($#indent + 1);
    if ($#indent > -1) {
        print OUT $dir_icon, $indent[$#indent];
    } else {
        print OUT $vol_icon, $_;
    }
    print OUT ' (', fix_bytes($dirs{$_}{size} + 0);
    print OUT ', ', comma($dirs{$_}{files} + 0), ' files';
    print OUT ', ', comma($dirs{$_}{subdirs} + 0), ' folders)', "\n";
    foreach my $file (sort keys %{$files{$_}}) {
        print OUT $indent_icon x ($#indent + 2);
        print OUT $file_icon, $file, ' (', fix_bytes($files{$_}{$file}), ")\n";
    }
}
close(OUT);

sub learn_files {
    if (-d) {
        if ($_ ne '.') {
            $dirs{$File::Find::dir}{subdirs}++;
            add_up($File::Find::dir);
        }
    } else {
        $dirs{$File::Find::dir}{files}++;
        my $file_info = stat($File::Find::name);
        $dirs{$File::Find::dir}{size} += $file_info->size;
        $files{$File::Find::dir}{$_} = $file_info->size;
        add_up($File::Find::dir, $file_info->size)
            if ($File::Find::dir ne $root_dir);
    }
}

sub add_up {
    my $dir = substr($_[0], length($root_dir) + 1);
    my $curr_dir = $root_dir;
    foreach (split(/\//, $dir)) {
        if ($_[1] eq '') {
            $dirs{$curr_dir}{subdirs}++;
        } else {
            $dirs{$curr_dir}{files}++;
            $dirs{$curr_dir}{size} += $_[1];
        }
        $curr_dir .= "/$_";
    }
}

sub fix_bytes {
    return comma(int($_[0] / 10737418.24) / 100) . ' GB' if ($_[0] > 1073741824);
    return comma(int($_[0] / 10485.76) / 100) . ' MB' if ($_[0] > 1048576);
    return comma(int($_[0] / 10.24) / 100) . ' KB' if ($_[0] > 1024);
    return comma($_[0]) . ' bytes';
}

sub comma {
    my $text = reverse $_[0];
    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
    return scalar reverse $text;
}

This seems to work properly in both Linux and Win32 environments. I'm going to start jazzing up the output so it'll spit out a nice HTML doc with icons and the like. But the above version is the meat of the matter.

Anyone see any glaring problems with it? Anything in there that's not efficient? While the code seems to work fine in general, it seems to take forever to complete on larger directory trees. Any suggestions?

-gryphon
code('Perl') || die;

Replies are listed 'Best First'.
Re: Complete Directory+Depth Listing
by RhetTbull (Curate) on Jul 11, 2001 at 22:50 UTC
    I tried your script and it seems to produce nice output; however, it's possible to break it. In sub learn_files:
        my $file_info = stat($File::Find::name);
        $dirs{$File::Find::dir}{size} += $file_info->size;
    you don't check the return status of stat. If $File::Find::name is a broken link, stat may fail (it does on my system) and you get the following error:
    Can't call method "size" on an undefined value at ./files.pl line 50.
    As a general rule of thumb, you should always check the return value of things that might fail (which includes almost any call related to the file system). It might also be possible to break this with a race condition where $File::Find::name was deleted prior to the stat.
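
The check described above could look something like the following. This is only a minimal sketch: `safe_size` is a hypothetical helper name, not something from the original script, and warning-and-skipping is just one assumed policy for unreadable entries. It relies on File::stat's `stat` returning undef when the call fails.

```perl
use strict;
use warnings;
use File::stat;

# Hypothetical helper (not part of the original script): returns the size
# of $path in bytes, or undef if the path cannot be stat'ed, for example
# a broken symlink or a file deleted between find() and stat().
sub safe_size {
    my ($path) = @_;
    my $info = stat($path);    # File::stat's stat() returns undef on failure
    unless ($info) {
        warn "Skipping $path: $!\n";
        return;
    }
    return $info->size;
}
```

In learn_files this would presumably be called before any counters are touched, along the lines of `my $size = safe_size($File::Find::name); return unless defined $size;`, so that a skipped entry is not counted as a file.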
Re: Complete Directory+Depth Listing
by gryphon (Abbot) on Jul 12, 2001 at 01:02 UTC

    I've played around with the generation of this thing so as to allow it to output to an HTML file a directory listing that's visually similar to Window$ Explorer's TreeView OCX. The page generated requires seven icon graphics for correct display. $file_icon, $dir_icon, and $vol_icon are fairly straightforward.

    The @indents are the icons that represent the lines, crosses, and spaces that make up the directory outline structure. $indents[0] is a blank space, $indents[1] is a single vertical bar, $indents[2] is the single bar that bends to the right and ends, and $indents[3] is the tri-branched vertical-horizontal bar. (I'd be happy to give these graphics to anyone who wants them.)

    #!/usr/bin/perl

    my $root_dir    = 'D:/';
    my $output_file = 'cd-contents.html';

    my $file_icon = '<IMG src="icons/tree_file.gif" width=20 height=16 align=top> ';
    my $dir_icon  = '<IMG src="icons/tree_folder_closed.gif" width=20 height=16 align=top> ';
    my $vol_icon  = '<IMG src="icons/tree_folder_open.gif" width=20 height=16 align=top> ';

    my @indents = (
        '<IMG src="icons/tree_blank.gif" width=20 height=16 align=top>',
        '<IMG src="icons/tree_line_1.gif" width=20 height=16 align=top>',
        '<IMG src="icons/tree_line_2.gif" width=20 height=16 align=top>',
        '<IMG src="icons/tree_line_3.gif" width=20 height=16 align=top>'
    );

    my $html_head = <<ENDOFHTML;
    <HTML><HEAD><TITLE>Folders & Files: $root_dir</TITLE>
    <STYLE type="text/css"><!--
    BODY { font-family: MS Sans Serif; font-size: 8pt; }
    H2 { font-family: Arial; font-size: 18pt; }
    --></STYLE>
    <SCRIPT language="JavaScript"><!--
    window.defaultStatus=document.title;
    // --></SCRIPT>
    </HEAD><BODY bgcolor="#f0f0f0">
    <H2>Folders & Files: $root_dir</H2>
    ENDOFHTML

    my $html_foot = <<ENDOFHTML;
    </BODY></HTML>
    ENDOFHTML

    ######################################################################

    use strict;
    use File::Find;
    use File::stat;

    my (%files, %dirs);
    find(\&learn_files, $root_dir);

    my @data_output;
    foreach (sort keys %dirs) {
        my @indent = split(/\//, substr($_, length($root_dir) + 1));
        push @data_output, [$#indent + 1, $vol_icon, $_,
            '(' . fix_bytes($dirs{$_}{size} + 0) .
            ', ' . comma($dirs{$_}{files} + 0) . ' files' .
            ', ' . comma($dirs{$_}{subdirs} + 0) . ' folders)'];
        if ($#indent > -1) {
            $data_output[$#data_output][1] = $dir_icon;
            $data_output[$#data_output][2] = $indent[$#indent];
        }
        foreach my $file (sort keys %{$files{$_}}) {
            push @data_output, [$#indent + 2, $file_icon, $file,
                '(' . fix_bytes($files{$_}{$file}) . ')'];
        }
    }

    my @previous_lines;
    for (my $x = $#data_output; $x > 0; $x--) {
        for (my $y = 0; $y < $data_output[$x][0] - 1; $y++) {
            if ($previous_lines[$y]) {
                $data_output[$x][4] .= $indents[1];
            } else {
                $data_output[$x][4] .= $indents[0];
            }
        }
        if ($data_output[$x][0] == $data_output[$x+1][0]) {
            $data_output[$x][4] .= $indents[3];
        } elsif ($data_output[$x][0] < $data_output[$x+1][0]) {
            if ($previous_lines[$data_output[$x][0] - 1]) {
                $data_output[$x][4] .= $indents[3];
            } else {
                $data_output[$x][4] .= $indents[2];
            }
            $#previous_lines = $data_output[$x][0] - 1;
        } else {
            $data_output[$x][4] .= $indents[2];
        }
        $previous_lines[$data_output[$x][0] - 1] = 1;
    }

    open(OUT, "> $output_file");
    print OUT $html_head, "\n";
    foreach (@data_output) {
        print OUT $_->[4], $_->[1], $_->[2], ' ', $_->[3], "<BR>\n";
    }
    print OUT $html_foot, "\n";
    close(OUT);

    ######################################################################

    sub learn_files {
        if (-d) {
            if ($_ ne '.') {
                $dirs{$File::Find::dir}{subdirs}++;
                add_up($File::Find::dir);
            }
        } else {
            $dirs{$File::Find::dir}{files}++;
            my $file_info = stat($File::Find::name);
            $dirs{$File::Find::dir}{size} += $file_info->size;
            $files{$File::Find::dir}{$_} = $file_info->size;
            add_up($File::Find::dir, $file_info->size)
                if ($File::Find::dir ne $root_dir);
        }
    }

    sub add_up {
        my $dir = substr($_[0], length($root_dir) + 1);
        my $curr_dir = $root_dir;
        foreach (split(/\//, $dir)) {
            if ($_[1] eq '') {
                $dirs{$curr_dir}{subdirs}++;
            } else {
                $dirs{$curr_dir}{files}++;
                $dirs{$curr_dir}{size} += $_[1];
            }
            $curr_dir .= "/$_";
        }
    }

    sub fix_bytes {
        return comma(int($_[0] / 10737418.24) / 100) . ' GB' if ($_[0] > 1073741824);
        return comma(int($_[0] / 10485.76) / 100) . ' MB' if ($_[0] > 1048576);
        return comma(int($_[0] / 10.24) / 100) . ' KB' if ($_[0] > 1024);
        return comma($_[0]) . ' bytes';
    }

    sub comma {
        my $text = reverse $_[0];
        $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
        return scalar reverse $text;
    }

    Obviously, there are some serious problems here with speed. I don't like having to iterate through @data_output essentially three times for correct output, but this was the only way I could find to build the correct outlining structure. Thoughts? Suggestions?

    -gryphon
    code('Perl') || die;
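
On the question of building the outline without a separate backwards pass over @data_output, one possible direction is to recurse over a nested structure and hand each level the connector columns it inherits from its ancestors. The sketch below is not the script above and makes several assumptions: the tree is an in-memory hash of hashes rather than the %dirs/%files pair, the four connector strings are plain-text stand-ins for the four @indents icons, and the name `outline` is made up for illustration. It also sorts files and folders together, whereas the script above lists a folder's files before its subfolders.

```perl
use strict;
use warnings;

# Plain-text stand-ins for the four icons (assumed for illustration).
my $BLANK = '   ';    # role of $indents[0]: empty column
my $BAR   = '|  ';    # role of $indents[1]: vertical bar
my $LAST  = '`- ';    # role of $indents[2]: bar that bends right and ends
my $TEE   = '+- ';    # role of $indents[3]: tri-branched bar

# $tree is a hash ref: a hash-ref value is a folder, anything else a file.
# $prefix holds the connector columns inherited from the ancestors.
sub outline {
    my ($tree, $prefix) = @_;
    my @lines;
    my @names = sort keys %$tree;
    for my $i (0 .. $#names) {
        my $name    = $names[$i];
        my $is_last = ($i == $#names);
        push @lines, $prefix . ($is_last ? $LAST : $TEE) . $name;
        if (ref($tree->{$name}) eq 'HASH') {
            # A last entry leaves a blank column below it; any other entry
            # leaves a vertical bar running down to its next sibling.
            push @lines,
                outline($tree->{$name}, $prefix . ($is_last ? $BLANK : $BAR));
        }
    }
    return @lines;
}

my %demo = (
    docs    => { 'a.txt' => 10, 'b.txt' => 20 },
    'z.mp3' => 3000,
);
print "$_\n" for outline(\%demo, '');
```

Because each entry knows whether it is the last of its siblings at the moment it is emitted, the connector can be chosen on the spot and no look-ahead at the following row is needed.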