in reply to Comparing depth-indent tree to full-paths
The code below builds on Grandfather's and jdporter's conversion of the tree to full path names, but is a bit more robust about the number of spaces used for indenting and about trailing whitespace. It also includes directories as well as files when comparing the two lists (I'm assuming that was your intent, based on the example you gave).
Caveat 1: the hashes used to mark items as found or extra are stored in memory. I haven't done much work with very large hashes, so I don't know whether you would have problems with a 12,000-entry hash. If so, you might want to tie the hash to a random-access file. I noticed that Tie::File::Hash can do this, but I've never used it.
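If memory did turn out to be a problem, one way to follow the suggestion in Caveat 1 is to tie the hash to an on-disk DBM file. The sketch below uses SDBM_File, which is bundled with Perl, rather than Tie::File::Hash, simply because it is the one I can vouch for. The temporary file location and the 0/1 marker values are assumptions made for illustration. A DBM file stores plain strings, so undef cannot serve as the "not found yet" marker; the sketch stores 0 instead.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT
use SDBM_File;                   # bundled with Perl
use File::Temp qw(tempdir);

# hypothetical location for the on-disk hash
my $sDir    = tempdir(CLEANUP => 1);
my $sDbFile = "$sDir/found";

my %hFound;
tie(%hFound, 'SDBM_File', $sDbFile, O_RDWR|O_CREAT, 0666)
    or die "Cannot tie $sDbFile: $!";

# 0 = not seen yet, 1 = seen (a DBM file cannot hold undef)
$hFound{$_} = 0 for qw(base1/dir1 base1/dir1/a.png base100/foo.pm);

# mark a path as found, just as with an in-memory hash
$hFound{'base1/dir1'} = 1 if exists $hFound{'base1/dir1'};

# anything still 0 was never seen in the tree
my @aMissing = sort grep { !$hFound{$_} } keys %hFound;
print "Missing:\n\t", join("\n\t", @aMissing), "\n";

untie %hFound;
```

The rest of the script would stay the same apart from the "missing" test, which would check for a false value rather than for undef.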
Caveat 2: if the depth-indented tree uses a mix of tabs and spaces (as often happens when people type in such lists), then my $iThisIndent = length $sIndent; should be replaced by something like $sIndent =~ s/\t/$sTab/g; my $iThisIndent = length $sIndent; where $sTab is the space equivalent of a tab. Note the /g: without it only the first tab in the indent is replaced.
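As a rough sketch of Caveat 2, the indent width can also be measured with the core Text::Tabs module, which expands each tab to the next tab stop instead of to a fixed number of spaces. The helper name indent_width and the tab width of 4 are assumptions for illustration; pick whatever width matches how the list was typed.

```perl
use strict;
use warnings;
use Text::Tabs;                  # core module; provides expand()

# assumed tab width
$Text::Tabs::tabstop = 4;

# return the indent width of a line in columns, with tabs expanded
sub indent_width {
    my ($sLine) = @_;
    my ($sIndent) = $sLine =~ /^([ \t]*)/;
    return length(expand($sIndent));
}

print indent_width("\tdir1/"), "\n";        # 4
print indent_width("  \tdir1/"), "\n";      # 4 (the tab only fills up to the next stop)
print indent_width("\t  file1.txt"), "\n";  # 6
```

A plain s/\t/$sTab/g would count the second line as 6 columns rather than 4, so the two approaches only agree when tabs come before any spaces in the indent.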
Here's the code:
use strict;
use warnings;

#dummy initialization of list of files
#presumably this would be loaded from a file in real life
#the main thing here is to make sure that the hash is initialized
#so that all values are undefined.

my @aListOfFiles = qw(base1/dir1 base1/dir1/a.png base100/foo.pm);
my %hFound = map { $_ => undef } @aListOfFiles;

my @aIndents;       #indent width of each currently open level
my @aPathSegments;  #path segment at each currently open level
my %hExtras;

while (my $sLine = <DATA>) {

  #same as grandfather's regex, but
  #(a) strips insignificant trailing whitespace
  #(b) regex for final / is non-capturing
  #    - based on example given by OP, dirs need to be in the
  #      comparison list

  my ($sIndent, $sPathSegment)
    = $sLine =~ m!^(\s*)([^/]*[^\s/])(?:/|\\)?\s*$!;
  next unless defined $sPathSegment and length $sPathSegment;

  #uses jdporter's array idea, but not finicky about number of
  #spaces - only cares whether line is more or less indented
  #than the lines above it

  my $iThisIndent = length $sIndent;

  #close every level indented at least as far as this line, so
  #that a dedent of several levels at once is handled correctly
  pop @aIndents while @aIndents and $aIndents[-1] >= $iThisIndent;
  push @aIndents, $iThisIndent;
  my $iDepth = $#aIndents;

  #drop stale segments left over from deeper levels
  $#aPathSegments = $iDepth;
  $aPathSegments[$iDepth] = $sPathSegment;
  my $sFullPath = join('/', @aPathSegments);

  if (exists $hFound{$sFullPath}) {
    #mark path as found
    $hFound{$sFullPath} = 1;
  } else {
    #mark path as extra
    $hExtras{$sFullPath} = 1;
  }
}

print "Missing:\n\t"
  . join("\n\t", grep { ! defined($hFound{$_}) } keys %hFound)
  . "\n";
print "Extras:\n\t" . join("\n\t", keys %hExtras) . "\n";

__DATA__
base1/
  dir1/
    file1.txt
    a.png
  dir2/
    f.txt
base2/
  dir1/
    file1.txt
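The comment at the top of the code says the list of files would presumably be loaded from a file in real life. A minimal sketch of that step follows; the sub name load_expected_paths is hypothetical, and the in-memory string stands in for a real file. It assumes one path per line and strips trailing slashes so that "dir1/" and "dir1" compare equal.

```perl
use strict;
use warnings;

# read one path per line from a filehandle and return a hash
# whose values are all undef ("not found yet")
sub load_expected_paths {
    my ($fh) = @_;
    my %hFound;
    while (my $sPath = <$fh>) {
        $sPath =~ s/^\s+|\s+$//g;    # trim, including the newline
        $sPath =~ s{[/\\]+$}{};      # drop any trailing slash
        next unless length $sPath;   # skip blank lines
        $hFound{$sPath} = undef;
    }
    return %hFound;
}

# in-memory stand-in for a real file
my $sList = "base1/dir1/\nbase1/dir1/a.png\n\nbase100/foo.pm  \n";
open(my $fh, '<', \$sList) or die "Cannot open in-memory list: $!";
my %hFound = load_expected_paths($fh);
close $fh;

print join("\n", sort keys %hFound), "\n";
```

With a real file, the open line would take the file name instead of the reference to $sList; everything else stays the same.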
Best, beth
Update: added second caveat re tabs.
Update: revised intro to clarify what was being added to previous nodes.
Update: put code in readmore tag