Oh wise and benevolent Monks, I beseech thee to help me with this Perl conundrum: I need to retire a 3rd party online document warehouse application, and I'm getting stuck on a specific problem. The data is stored in a hierarchy resembling a Windows file tree, like:

folder
    |--folder2
    |--folder3
    |   |--folderX
    |   |--folderY
    |--folder4

I need to recreate that hierarchy in a UNIX path structure. The application provides tools to extract the hierarchy, but the information they return is very fragmented. The two tools I have are:

Folder info: gives me the folder ID, name, and number of subfolders under each folder, like so:

Folder 4464 - foldername_1.
   count subfolders 0.
Folder 4465 - foldername_2.
   count subfolders 0.
Folder 4466 - foldername_3.
   count subfolders 4.
...

Folder/Folder info: gives me the folder IDs of the subfolders under each folder, like so:

Folder 1298 - foldername_ten.
   subfolder 1299.
   subfolder 1300.
Folder 1299 - foldername_eleven.
    No sub folders.
Folder 1300 - foldername_twelve.
    No sub folders.
Folder 1311 - foldername_thirteen.
   subfolder 1317.
   subfolder 1318.
   subfolder 1958.

Based on this data, I wrote a script that first collects the folder IDs and names. For each folder ID, it then builds a UNIX path by searching for the folder's parent, then that parent's parent, and so on recursively back to the root folder, as shown here:

    # %folders has each folder ID as a key and the name as the value
    # @subfolders is the Folder/Folder data as shown above, line-for-line

    foreach my $k (sort (keys (%folders))) {
        $folderpaths{$k} = &build_path($k,@subfolders);
        print "$k => $folderpaths{$k}\n";
    }

sub build_path($@) {
    my $folderid = shift @_;
    my @dumpff = @_;
    my $path = "$folderid";
    my $parentid = "";
    foreach my $line (@dumpff) {
        if ($line =~ /^Folder (\d{3,5})\s+\-\s+.*\./) {
            $parentid = $1;
        }
        elsif ($line =~ /\s+subfolder\s+$folderid\s+\-\s+.*\./) {
            $path = join('/', &build_path($parentid,@subfolders),$folderid);
        }
    }
    return $path;
}

The above code looked like it was working perfectly, but I saw lots of data was missing. I later discovered that there are a few folders that appear as children under multiple parent folders. My script can identify multiple parents for each node it's looking at, but it can only return 1 match. My question is, how can I account for the multiple paths, and how can I identify those multiple paths so I know to make each duplicate path a symlink to the original when I actually build the UNIX filesystem?
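
One possible way to approach the multiple-parent case, offered as a rough sketch rather than a tested solution: read the Folder/Folder dump once into a hash that maps each child ID to the list of all its parents, treat the first parent seen as the "real" location, and record every additional parent as a place where a symlink is needed. The sample lines, the %parents hash, and the canonical_path helper below are all made up for illustration. The subfolder regex assumes the lines look exactly like the sample above (an ID followed by a period), and the data is assumed to contain no loops.

```perl
use strict;
use warnings;

# Hypothetical sample in the Folder/Folder format shown above; folder 1300
# is deliberately listed under two parents (1298 and 1311).
my @subfolders = (
    'Folder 1298 - foldername_ten.',
    '   subfolder 1299.',
    '   subfolder 1300.',
    'Folder 1299 - foldername_eleven.',
    '    No sub folders.',
    'Folder 1300 - foldername_twelve.',
    '    No sub folders.',
    'Folder 1311 - foldername_thirteen.',
    '   subfolder 1300.',
);

# Single pass: collect names, and map each child ID to ALL of its parents.
my (%folders, %parents);
my $current;
for my $line (@subfolders) {
    if ($line =~ /^Folder\s+(\d+)\s+-\s+(.*)\.\s*$/) {
        $current = $1;
        $folders{$current} = $2;
    }
    elsif (defined $current && $line =~ /^\s+subfolder\s+(\d+)/) {
        push @{ $parents{$1} }, $current;
    }
}

# Canonical path: follow the FIRST recorded parent at every level.
# %seen only exists to stop on unexpected loops in the data.
sub canonical_path {
    my ($id, %seen) = @_;
    die "loop detected at folder $id\n" if $seen{$id};
    my $first = $parents{$id} ? $parents{$id}[0] : undef;
    return $id unless defined $first;    # no parent: a root folder
    return canonical_path($first, %seen, $id => 1) . "/$id";
}

# Every folder gets one real directory; each extra parent gets a symlink.
my (@mkdir, @symlink);
for my $id (sort { $a <=> $b } keys %folders) {
    my (undef, @extra) = @{ $parents{$id} || [] };
    push @mkdir, canonical_path($id);
    push @symlink, [ canonical_path($_) . "/$id", canonical_path($id) ]
        for @extra;
}

print "mkdir   $_\n" for @mkdir;
print "symlink $_->[0] -> $_->[1]\n" for @symlink;
```

Only folders with more than one direct parent need a symlink; anything beneath them is reachable through that link, so their descendants do not need links of their own. The paths use folder IDs, as in build_path above, and are relative to whatever root the tree is built under, so the link targets would need adjusting to match before being created for real.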

In reply to Building a UNIX path from Irritating data by roswell1329
