in reply to Building a UNIX path from Irritating data

I'd build a structure that maps pretty much directly onto the information provided by 'Folder/Folder info'. Along with a little error checking and a recursive sub to generate the output, something like the following should fit the bill:

```perl
use strict;
use warnings;

my %folders;
my $parentId;

while (<DATA>) {
    chomp;
    if (m/^Folder\s+(\d+)\s*-\s*(.*)\.$/) {
        # new folder
        my ($folderId, $folderName) = ($1, $2);

        die "Duplicate entry for $folderId ($folderName) at line $.\n"
            if ++$folders{$folderId}{idCount} > 1;

        # Create the new folder
        $folders{$folderId}{folders} = [];
        $folders{$folderId}{name} = $folderName;
        $parentId = $folderId;
    } elsif (m/^\s+subfolder (\d+)\.$/) {
        # New subfolder
        my $folderId = $1;

        die "No parent folder for $folderId. Data format error at line $.\n"
            if ! defined $parentId;
        warn "Probably bad data. Subfolder $folderId has already been seen\n"
            if exists $folders{$folderId};
        push @{$folders{$parentId}{folders}}, $folderId;
        $folders{$folderId}{hasParent} = 1;
    } else {
        # Bogus line
    }
}

genPath ($_) for sort grep {! $folders{$_}{hasParent}} keys %folders;

sub genPath {
    my ($folderId, $root) = @_;

    $root .= '/';
    die "Missing folder information for $folderId\n"
        if ! exists $folders{$folderId}{name};
    die "Cycle in 'tree' involving $folderId and path $root\n"
        if ++$folders{$folderId}{visits} > 1;
    $root .= $folders{$folderId}{name};
    print "$root\n";
    genPath ($_, $root) for @{$folders{$folderId}{folders}};
}

__DATA__
Folder 1298 - foldername_ten.
    subfolder 1299.
    subfolder 1300.
Folder 1299 - foldername_eleven.
    No sub folders.
Folder 1300 - foldername_twelve.
    No sub folders.
Folder 1311 - foldername_thirteen.
    subfolder 1317.
    subfolder 1318.
Folder 1317 - foldername_twelve.
    No sub folders.
Folder 1318 - foldername_twelve.
    No sub folders.
```

Prints:

```
/foldername_ten
/foldername_ten/foldername_eleven
/foldername_ten/foldername_twelve
/foldername_thirteen
/foldername_thirteen/foldername_twelve
/foldername_thirteen/foldername_twelve
```

True laziness is hard work

Re^2: Building a UNIX path from Irritating data
by roswell1329 (Acolyte) on Nov 26, 2009 at 00:42 UTC
    Hi GrandFather, thanks for the suggestion. I tried your code and it works as you described, but when I tried the same code with a static data file from the application itself (I posted a copy here: subfolders.txt), the list never got past the first "/foldername". I'm looking into that to find out why.

    However, I also don't see how your code handles deeper paths. Some of these folder/subfolder relationships go down 5 or 6 levels, so we could see something like this:

    /foldername_ten/foldername_eleven/foldername_twelve/foldername_thirteen/foldername_fourteen

    It looks like your code only handles the first level of subfolder. Is that correct?

    For an example of how this data could be nested, search for folder ID 3053 in the linked datafile above and tell me if your code could address the structure seen there. Thank you for your assistance!

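      There is no inherent limit to how deeply the folders may be nested: genPath calls itself for each subfolder of the current folder, so the recursion follows the tree down as far as the data goes.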
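
      As a quick check, here is a minimal sketch using a hypothetical five-level chain. The ids 1 to 5, the hard-coded %folders hash and the collectPaths sub are all made up for illustration; collectPaths is just a variant of genPath that collects paths into an array instead of printing them. It reaches the deepest folder with no special handling:

```perl
use strict;
use warnings;

# Hypothetical five-level chain (ids and names made up for illustration):
# each folder lists the next one as its only subfolder.
my %folders = (
    1 => { name => 'foldername_ten',      folders => [2] },
    2 => { name => 'foldername_eleven',   folders => [3] },
    3 => { name => 'foldername_twelve',   folders => [4] },
    4 => { name => 'foldername_thirteen', folders => [5] },
    5 => { name => 'foldername_fourteen', folders => [] },
);

my @paths;

# Hypothetical variant of genPath: build this folder's path, record it,
# then recurse into each subfolder. Nothing here caps the depth.
sub collectPaths {
    my ($folderId, $root) = @_;
    my $path = (defined $root ? $root : '') . '/' . $folders{$folderId}{name};
    push @paths, $path;
    collectPaths($_, $path) for @{$folders{$folderId}{folders}};
}

collectPaths(1);
print "$_\n" for @paths;
```

      The last line printed is /foldername_ten/foldername_eleven/foldername_twelve/foldername_thirteen/foldername_fourteen, the same shape as the example path in the question. The only depth-related behaviour Perl adds is a "Deep recursion" warning once a sub is nested 100 calls deep, far beyond 5 or 6 levels.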

      I tried your data and it seemed to work fine for me, generating over 3000 lines of output. Could you have a line-ending issue? If you are running the script on *nix but the file was generated on Windows, then its line ends (CR/LF) won't match those the script expects (LF): chomp removes only the LF, so the leftover CR stops the '\.$' at the end of each pattern from matching.
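
      A minimal sketch of the difference, using one made-up Windows-style line. It assumes the fix is simply to replace chomp; in the reading loop with s/[\r\n]+\z//; which is one option among several, not the only way:

```perl
use strict;
use warnings;

# Same folder pattern as in the script above.
my $folderRe = qr/^Folder\s+(\d+)\s*-\s*(.*)\.$/;

# Made-up sample line with Windows (CR LF) line ending.
my $windowsLine = "Folder 1299 - foldername_eleven.\r\n";

# chomp removes only "\n" (the default $/), so the "\r" survives and
# the trailing '\.$' in the pattern no longer matches.
my $chomped = $windowsLine;
chomp $chomped;
my $chompMatches = $chomped =~ $folderRe ? 1 : 0;

# Stripping every trailing CR and LF handles both line-ending styles.
my $cleaned = $windowsLine;
$cleaned =~ s/[\r\n]+\z//;
my $cleanMatches = $cleaned =~ $folderRe ? 1 : 0;

print "chomp only:  ", ($chompMatches ? "match" : "no match"), "\n";
print "strip CR/LF: ", ($cleanMatches ? "match" : "no match"), "\n";
```

      The first line reports "no match" and the second "match". Converting the file once with a tool such as dos2unix would work just as well.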

      I notice that some of the names have / characters in them (Folder 7969 for example). That is likely to cause you grief if you use the names as folder names.
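
      If the names do end up as directory names, one option is to sanitise them first. A minimal sketch, where safeName and the example name are hypothetical and '_' is an arbitrary replacement character; in genPath the line $root .= $folders{$folderId}{name}; would become $root .= safeName($folders{$folderId}{name});

```perl
use strict;
use warnings;

# Hypothetical helper: replace every '/' in a folder name with '_' so the
# name stays a single path component. '_' is an arbitrary choice.
sub safeName {
    my ($name) = @_;
    (my $safe = $name) =~ tr{/}{_};
    return $safe;
}

# Made-up example name; the real data has names like this (e.g. Folder 7969).
my $name = 'budget/2009';
my $path = '/foldername_ten/' . safeName($name);
print "$path\n";    # /foldername_ten/budget_2009
```

      Note that this can make two distinct names collide ("budget/2009" and "budget_2009" both map to the same string), so a real solution may need to detect or encode such cases.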

      It looks to me like you really need a database solution to this problem instead of a file-based one.
