Kage has asked for the wisdom of the Perl Monks concerning the following question:

So, I have two similar things I wish to compare: A depth-indented file tree, and a list of full paths. Depth-Indent Tree Example:
base/ dir1/ file1.txt a.png dir2/ b.txt q.txt dir3/ a.png f.txt etc.
File Path Example:
base/dir1/file1.txt base/dir1/a.png base/dir1/dir2/b.txt base/dir1/q.txt etc.
Essentially, I want to take the File Path list, and compare it to the Depth-Indent Tree and find what does not exist in the Depth-Indent tree, but does exist in the File Path list. For example:
base/ dir1/ a.txt
compared to
base/ base/dir1/ base/dir1/a.txt base/b.txt
would yeild that base/b.txt does not exist in the depth-indent tree. The one other stipulation is that this must be fast, and capable of handling ~12,000 entries in both the depth-indent tree and the file path list in a timely manner. Any ideas? Thanks for any help!

Replies are listed 'Best First'.
Re: Comparing depth-indent tree to full-paths
by jdporter (Paladin) on Feb 13, 2009 at 03:47 UTC
    my @tree_full_paths; my @path; for ( @tree ) { /^( *)(.+)/ or next; $#path = length $1; $path[-1] = $2; push @tree_full_paths, join '/', @path; }

    Now just compare the two lists using the usual methods discussed in the Perl FAQ.

    Between the mind which plans and the hands which build, there must be a mediator... and this mediator must be the heart.
Re: Comparing depth-indent tree to full-paths
by GrandFather (Saint) on Feb 13, 2009 at 03:50 UTC

    Define 'fast' and/or 'timely manner'. I guarantee the following will complete before lunch time:

    use strict; use warnings; my @dirs; while (my $line = <DATA>) { my ($indent, $entry, $isDir) = $line =~ m!^(\s*)([^/]*)(/|\\)?!; next unless defined $entry and length $entry; my $depth = length ($indent); if ($depth < @dirs) { splice @dirs, $depth, $#dirs; } if ($isDir) { push @dirs, $entry; next; } print join ('/', @dirs, $entry); } __DATA__ base/ dir1/ file1.txt a.png dir2/ b.txt q.txt dir3/ a.png f.txt

    Prints:

    base/dir1/file1.txt base/dir1/a.png base/dir1/dir2/b.txt base/dir1/q.txt base/dir3/a.png base/f.txt

    Update: using jdporter's assign to $#array technique the code reduces to:

    use strict; use warnings; my @dirs; while (my $line = <DATA>) { my ($indent, $entry, $isDir) = $line =~ m!^(\s*)([^/]*)(/|\\)?!; next unless defined $entry and length $entry; $#dirs = length $indent; $dirs[-1] = $entry; next if $isDir; print join '/', @dirs; }

    True laziness is hard work
Re: Comparing depth-indent tree to full-paths
by ELISHEVA (Prior) on Feb 13, 2009 at 07:14 UTC
    If I understood you correctly, you want to compare the two inputs, not just convert the indent-tree to full paths. This provides a list of missing and extra paths in a single pass. File names allow internal spaces, but no leading or trailing spaces.

    It builds on Grandfather and jdporter conversion of tree to full path names, but is a bit more robust about the number of spaces used for indenting and the handling of trailing white space. It also includes directories as well as files when comparing the two lists (I'm assuming that was your intent based on the example you gave).

    Caveat 1: the hashes used to mark items as found or extra are being stored in memory. I haven't done much work with very large hashes, so I don't know if you would have problems with a 12,000 entry hash. If so, you might want to tie the hash to a random access file. I noticed that Tie::File::Hash can do this, but I've never used it.

    Caveat 2: if the depth indented tree is using a mix of tabs and spaces (as often happens when people type in such lists) then my $iThisIndent = length $sIndent; should be replaced by something like $sIndent =~ s/\t/$sTab/; my $iThisIndent = length $sIndent; where $sTab is the space equivalent of a tab.

    Here's the code:

    Best, beth

    Update: added second caveat re tabs.

    Update: revised intro to clarify what was being added to previous nodes.

    Update: put code in readmore tag