Oh wise and benevolent Monks, I beseech thee to help me with this Perl conundrum: I need to retire a third-party online document warehouse application, and I'm stuck on one specific problem. The data is stored in a hierarchy resembling a Windows file tree, like:
    folder
    |--folder2
    |--folder3
    |  |--folderX
    |  |--folderY
    |--folder4
I need to recreate that hierarchy in a UNIX path structure. The application provides tools to extract the hierarchy, but the information they give is very segmented. The 2 tools I have are:

Folder info: will give me the folder ID, name, and number of subfolders under it, like so:
    Folder 4464 - foldername_1.
        count subfolders 0.
    Folder 4465 - foldername_2.
        count subfolders 0.
    Folder 4466 - foldername_3.
        count subfolders 4.
    ...
Folder/Folder info: will give me the folder IDs of each subfolder under each folder, like so:
    Folder 1298 - foldername_ten.
        subfolder 1299.
        subfolder 1300.
    Folder 1299 - foldername_eleven.
        No sub folders.
    Folder 1300 - foldername_twelve.
        No sub folders.
    Folder 1311 - foldername_thirteen.
        subfolder 1317.
        subfolder 1318.
        subfolder 1958.
Based on this data, I wrote a script that first collects the folder IDs and names. For each folder ID, it then builds a UNIX path by searching for the folder's parent, then that parent's parent, and so on recursively back to the root folder, as shown here:
    # %folders has each folder ID as a key and the name as the value
    # @subfolders is the Folder/Folder data as shown above, line-for-line
    foreach my $k (sort (keys (%folders))) {
        $folderpaths{$k} = &build_path($k,@subfolders);
        print "$k => $folderpaths{$k}\n";
    }

    sub build_path($@) {
        my $folderid = shift @_;
        my @dumpff = @_;
        my $path = "$folderid";
        my $parentid = "";
        foreach my $line (@dumpff) {
            if ($line =~ /^Folder (\d{3,5})\s+\-\s+.*\./) {
                $parentid = $1;
            }
            elsif ($line =~ /\s+subfolder\s+$folderid\s+\-\s+.*\./) {
                $path = join('/', &build_path($parentid,@subfolders),$folderid);
            }
        }
        return $path;
    }
The above code looked like it was working perfectly, but I then saw that lots of data was missing. I later discovered that a few folders appear as children under multiple parent folders. My script does encounter every parent of the node it's looking at, but it can return only one path, because each later match overwrites the earlier one. My question is: how can I account for the multiple paths, and how can I identify them so I know to make each duplicate path a symlink to the original when I actually build the UNIX filesystem?
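One possible way to approach this, offered only as a rough sketch: parse the Folder/Folder dump once into a map from each folder ID to the list of all its parent IDs, then return every root-to-folder path as a list instead of a single string. The sub names parse_parents and all_paths are made up for illustration, the sample lines are invented to show a folder with two parents, and the regexes assume the "subfolder <id>" layout of the sample output above, so they may need adjusting for the real dump.

```perl
use strict;
use warnings;

# Build a map: folder ID => [ IDs of every folder that lists it as a subfolder ].
# Assumes each "subfolder <id>" line belongs to the most recent "Folder <id> - ..." line.
sub parse_parents {
    my @lines = @_;
    my %parents;
    my $current;    # ID from the most recent "Folder" header line
    for my $line (@lines) {
        if ($line =~ /^\s*Folder\s+(\d+)\s+-\s+/) {
            $current = $1;
        }
        elsif (defined $current && $line =~ /^\s*subfolder\s+(\d+)/) {
            push @{ $parents{$1} }, $current;
        }
    }
    return \%parents;
}

# Return every path from a root down to $id, as a list of "id/id/id" strings.
# %$seen tracks the IDs on the current branch so a cycle in the data
# (an assumption, not something the dump is known to contain) cannot recurse forever.
sub all_paths {
    my ($id, $parents, $seen) = @_;
    $seen ||= {};
    return () if $seen->{$id};
    my @up = sort { $a <=> $b } @{ $parents->{$id} || [] };
    return ($id) unless @up;    # no parent: this folder is a root
    $seen->{$id} = 1;
    my @paths;
    for my $p (@up) {
        push @paths, map { "$_/$id" } all_paths($p, $parents, $seen);
    }
    delete $seen->{$id};
    return @paths;
}

# Invented sample: folder 1299 appears under both 1298 and 1300.
my @subfolders = (
    "Folder 1298 - foldername_ten.",
    "    subfolder 1299.",
    "    subfolder 1300.",
    "Folder 1300 - foldername_twelve.",
    "    subfolder 1299.",
);

my $parents = parse_parents(@subfolders);
my ($original, @duplicates) = all_paths(1299, $parents);
print "create  $original\n";
print "symlink $_ -> $original\n" for @duplicates;
```

With this shape, the first path in the list can be treated as the real directory and every remaining path as a candidate for Perl's built-in symlink function. Which path counts as "the original" is an arbitrary choice here (lowest parent ID first). Folder names from %folders could be substituted for the IDs when the paths are joined.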

In reply to Building a UNIX path from Irritating data by roswell1329
