Re: Comparing depth-indent tree to full-paths

If I understood you correctly, you want to compare the two inputs, not just convert the indent tree to full paths. The code below does the comparison in a single pass over the tree and reports both missing and extra paths. File names may contain internal spaces, but not leading or trailing spaces.

It builds on GrandFather's and jdporter's conversion of the tree to full path names, but is a bit more robust about the number of spaces used for indenting and about trailing whitespace. It also includes directories as well as files when comparing the two lists (I'm assuming that was your intent, based on the example you gave).

Caveat 1: the hashes used to mark items as found or extra are stored in memory. I haven't done much work with very large hashes, so I don't know if you would have problems with a 12,000-entry hash. If so, you might want to tie the hash to a random access file. I noticed that Tie::File::Hash can do this, but I've never used it.
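If memory ever did become a problem, one disk-backed option that ships with every Perl is SDBM_File (this swaps in a core module in place of the one named above). A minimal sketch, with a temporary directory standing in for wherever you would really keep the file. One catch: a DBM file stores only strings, so the "undef means not found yet" trick used in the code below does not survive the round trip; this sketch marks entries with 0 and 1 instead.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

# Assumption: a throwaway directory; a real script would use a fixed path
my $sDir = tempdir(CLEANUP => 1);

tie(my %hFound, 'SDBM_File', "$sDir/found", O_RDWR|O_CREAT, 0666)
  or die "Cannot tie $sDir/found: $!";

# DBM values are strings, so use 0 for "not seen yet" rather than undef
my @aListOfFiles = qw(base1/dir1 base1/dir1/a.png base100/foo.pm);
$hFound{$_} = 0 for @aListOfFiles;

# Marking works just as it does with an in-memory hash
for my $sFullPath (qw(base1/dir1 base1/dir1/a.png base2)) {
  $hFound{$sFullPath} = 1 if exists $hFound{$sFullPath};
}

# Anything still 0 was never seen in the tree
my @aMissing = sort grep { ! $hFound{$_} } keys %hFound;
print "Missing: @aMissing\n";

untie %hFound;
```

SDBM also limits each key/value pair to about 1 KB, which is plenty for paths of ordinary length.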

Caveat 2: if the depth-indented tree uses a mix of tabs and spaces (as often happens when people type such lists by hand), then my $iThisIndent = length $sIndent; should be replaced by something like $sIndent =~ s/\t/$sTab/g; my $iThisIndent = length $sIndent; where $sTab is the run of spaces equivalent to one tab (e.g. my $sTab = ' ' x 4;). The /g modifier matters: without it only the first tab in the indent is converted.
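The substitution in Caveat 2 counts every tab as a fixed run of spaces. That matches what an editor shows only when the tabs come first in the indent; a tab typed after two spaces usually just moves to the next tab stop. A minimal sketch using the core Text::Tabs module, which expands tabs to real tab stops, assuming the tree was typed with 4-column tab stops; indent_width and indent_width_simple are hypothetical helper names.

```perl
use strict;
use warnings;
use Text::Tabs;   # core module: exports expand() and $tabstop

$tabstop = 4;     # assumption: the tree was typed with 4-column tab stops

# Indent width with each tab advancing to the next tab stop
sub indent_width {
  my ($sIndent) = @_;
  my ($sExpanded) = expand($sIndent);   # list context: one string in, one out
  return length $sExpanded;
}

# Caveat 2 approach: every tab counts as a fixed number of spaces
sub indent_width_simple {
  my ($sIndent) = @_;
  my $sTab = ' ' x $tabstop;
  (my $sCopy = $sIndent) =~ s/\t/$sTab/g;
  return length $sCopy;
}

# Show both widths; spaces are printed as '.' and tabs as '\t'
for my $sIndent ("\t", "\t\t", "  \t") {
  my $sShown = join '', map { $_ eq "\t" ? '\t' : '.' } split //, $sIndent;
  printf "%-6s tab stops: %d  fixed: %d\n",
    $sShown, indent_width($sIndent), indent_width_simple($sIndent);
}
```

The two agree for "\t" and "\t\t", but for two spaces followed by a tab the tab-stop width is 4 while the fixed substitution gives 6, which could make a line look one level deeper than intended.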

Here's the code:

use strict;
use warnings;

#dummy initialization of list of files
#presumably this would be loaded from a file in real life
#the main thing here is to make sure that the hash is initialized
#so that all values are undefined.
my @aListOfFiles = qw(base1/dir1 base1/dir1/a.png base100/foo.pm);
my %hFound = map { $_ => undef } @aListOfFiles;

my @aIndents;       #indent width of each entry in @aPathSegments
my @aPathSegments;  #segments of the current path, outermost first
my %hExtras;


while (my $sLine = <DATA>) {

  #same as GrandFather's regex, but
  #(a) strips insignificant trailing whitespace
  #(b) regex for final / is non-capturing
  #    - based on example given by OP, dirs need to be in the
  #      comparison list

  my ($sIndent, $sPathSegment)
    = $sLine =~ m!^(\s*)([^/]*[^\s/])(?:/|\\)?\s*$!;
  next unless defined $sPathSegment and length $sPathSegment;

  #uses jdporter's array idea, but not finicky about number of
  #spaces - only cares how this line's indent compares with the
  #indents of the lines above it. Every segment indented at least
  #as deeply as this line is popped first, so a dedent of several
  #levels works and leftovers from a previous branch are discarded.
  my $iThisIndent = length $sIndent;
  while (@aIndents and $aIndents[-1] >= $iThisIndent) {
    pop @aIndents;
    pop @aPathSegments;
  }
  push @aIndents, $iThisIndent;
  push @aPathSegments, $sPathSegment;

  my $sFullPath = join('/', @aPathSegments);
  if (exists $hFound{$sFullPath}) {
    #mark path as found
    $hFound{$sFullPath} = 1;
  } else {
    #mark path as extra
    $hExtras{$sFullPath} = 1;
  }
}

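#sort so the report comes out in a predictable order
my @aMissing = sort grep { ! defined($hFound{$_}) } keys %hFound;
my @aExtras  = sort keys %hExtras;
print "Missing:\n\t" . join("\n\t", @aMissing) . "\n";
print "Extras:\n\t"  . join("\n\t", @aExtras)  . "\n";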


__DATA__
base1/
 dir1/
    file1.txt
    a.png
 dir2/
 f.txt
base2/
 dir1/
  file1.txt
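
A sketch of the indent-to-path step pulled out into a sub, so it can be fed a small hand-made tree and checked (tree_to_paths is a hypothetical name, and the test tree is made up). It keeps a stack of ancestor indents rather than a single depth counter, so it also copes with a line that closes several levels at once, something the sample data above never does.

```perl
use strict;
use warnings;

# Hypothetical helper: turn depth-indented lines into full paths.
# Each line's indent is compared with its ancestors' indents (kept on
# a stack), so any indent width works and a dedent may skip levels.
sub tree_to_paths {
  my (@aIndents, @aSegments, @aPaths);
  for my $sLine (@_) {
    my ($sIndent, $sSegment)
      = $sLine =~ m!^(\s*)([^/]*[^\s/])(?:/|\\)?\s*$!;
    next unless defined $sSegment;
    my $iIndent = length $sIndent;
    # drop siblings and deeper entries left over from earlier lines
    while (@aIndents and $aIndents[-1] >= $iIndent) {
      pop @aIndents;
      pop @aSegments;
    }
    push @aIndents, $iIndent;
    push @aSegments, $sSegment;
    push @aPaths, join('/', @aSegments);
  }
  return @aPaths;
}

my @aPaths = tree_to_paths(
  "base1/\n",
  "   dir1/\n",
  "      deep/\n",
  "         x.txt\n",
  "base2/\n",          # closes three levels in one step
  "  y.txt\n",
);
print "$_\n" for @aPaths;
```

With the stack, base2 comes out as a top-level path and y.txt lands under it, instead of inheriting segments from the base1 branch.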

Best, beth

Update: added second caveat re tabs.

Update: revised intro to clarify what was being added to previous nodes.

Update: put code in readmore tag
