If I understood you correctly, you want to compare the two inputs, not just convert the indent-tree to full paths. This produces a list of missing and extra paths in a single pass. File names may contain internal spaces, but not leading or trailing spaces.

It builds on Grandfather's and jdporter's conversions of the tree to full path names, but is a bit more robust about the number of spaces used for indenting and the handling of trailing white space. It also includes directories as well as files when comparing the two lists (I'm assuming that was your intent, based on the example you gave).

Caveat 1: the hashes used to mark items as found or extra are stored in memory. I haven't done much work with very large hashes, so I don't know if you would have problems with a 12,000-entry hash. If so, you might want to tie the hash to a random-access file. I noticed that Tie::File::Hash can do this, but I've never used it.
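For what it's worth, here is a minimal sketch of tying the "found" hash to a disk file. It swaps in SDBM_File (bundled with Perl) rather than the module mentioned above, which I haven't tried. The temp directory and file name are assumptions for illustration. A DBM file can't store undef, so this sketch marks paths with 0 and 1 instead of undefined and 1.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

# assumption: a throwaway directory; real code would pick a fixed location
my $sDir = tempdir(CLEANUP => 1);

# the hash contents now live in "$sDir/found.pag" and "$sDir/found.dir"
my %hFound;
tie(%hFound, 'SDBM_File', "$sDir/found", O_RDWR | O_CREAT, 0666)
    or die "Cannot tie hash: $!";

# 0 = not seen yet, 1 = found in the tree
$hFound{$_} = 0 for qw(base1/dir1 base1/dir1/a.png base100/foo.pm);

# mark a path as found, exactly as with an in-memory hash
$hFound{'base1/dir1'} = 1 if exists $hFound{'base1/dir1'};

my @aMissing = sort grep { !$hFound{$_} } keys %hFound;
untie %hFound;

print "Missing:\n\t" . join("\n\t", @aMissing) . "\n";
```

The rest of the comparison code would not need to change, apart from testing for a false value rather than an undefined one.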

Caveat 2: if the depth-indented tree uses a mix of tabs and spaces (as often happens when people type in such lists), then my $iThisIndent = length $sIndent; should be replaced by something like $sIndent =~ s/\t/$sTab/g; my $iThisIndent = length $sIndent; where $sTab is the space equivalent of a tab. Note the /g: without it only the first tab in the indent would be expanded.
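A minimal sketch of that substitution, assuming one tab is worth four spaces. The helper name indent_width and the width of four are only for illustration. It counts every tab as the same fixed width rather than aligning to tab stops.

```perl
use strict;
use warnings;

my $sTab = ' ' x 4;   # assumed space equivalent of one tab

# hypothetical helper: width of a line's leading whitespace, with
# every tab counted as length($sTab) spaces
sub indent_width {
    my ($sLine) = @_;
    my ($sIndent) = $sLine =~ m!^([ \t]*)!;
    $sIndent =~ s/\t/$sTab/g;
    return length $sIndent;
}

print indent_width("\tdir1/"), "\n";        # one tab
print indent_width("    dir1/"), "\n";      # four spaces
print indent_width("  \tfile1.txt"), "\n";  # two spaces, then a tab
```

With these settings a tab and four spaces compare as the same indent, so the two styles can be mixed within one tree.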

Here's the code:

use strict;
use warnings;

# Dummy initialization of the list of files; presumably this would be
# loaded from a file in real life. The main thing here is to make sure
# that the hash is initialized so that all values are undefined.
my @aListOfFiles = qw(base1/dir1 base1/dir1/a.png base100/foo.pm);
my %hFound = map { $_ => undef } @aListOfFiles;

my @aIndents;       # indent width of each currently open path segment
my @aPathSegments;  # the open path segments themselves
my %hExtras;

while (my $sLine = <DATA>) {

    # same as Grandfather's regex, but
    # (a) strips insignificant trailing whitespace
    # (b) regex for final / is non-capturing
    #   - based on example given by OP, dirs need to be in the
    #     comparison list
    my ($sIndent, $sPathSegment)
        = $sLine =~ m!^(\s*)([^/]*[^\s/])(?:/|\\)?\s*$!;
    next unless defined $sPathSegment and length $sPathSegment;

    # uses jdporter's array idea, but not finicky about number of
    # spaces - only cares whether a line is indented more or less
    # than the segments still open above it
    my $iThisIndent = length $sIndent;

    # close every segment indented at least as far as this line, so
    # that siblings and multi-level dedents leave no stale segments
    while (@aIndents and $aIndents[-1] >= $iThisIndent) {
        pop @aIndents;
        pop @aPathSegments;
    }
    push @aIndents, $iThisIndent;
    push @aPathSegments, $sPathSegment;

    my $sFullPath = join('/', @aPathSegments);
    if (exists $hFound{$sFullPath}) {
        $hFound{$sFullPath} = 1;    # mark path as found
    } else {
        $hExtras{$sFullPath} = 1;   # mark path as extra
    }
}

my @aMissing = grep { !defined $hFound{$_} } sort keys %hFound;
print "Missing:\n\t" . join("\n\t", @aMissing) . "\n";
print "Extras:\n\t" . join("\n\t", sort keys %hExtras) . "\n";

__DATA__
base1/
  dir1/
    file1.txt
    a.png
  dir2/
    f.txt
base2/
  dir1/
    file1.txt

Best, beth

Update: added second caveat re tabs.

Update: revised intro to clarify what was being added to previous nodes.

Update: put code in readmore tag


In reply to Re: Comparing depth-indent tree to full-paths by ELISHEVA
in thread Comparing depth-indent tree to full-paths by Kage
