
Hi Monks!
I have a utility that takes a list of different paths and creates an instructions file. It works great, but I ran into a scaling problem when trying to add new features, so I'm trying to rethink my strategy. The instructions file is a bunch of commands that can be divided into four sections: the first section creates the directories using "mkdir -p", the second copies areas into the created directories using "cp -r", the third copies files using "cp", and the fourth creates links using "ln -s".
Consider the following example:

# Create dirs
mkdir -p $IN/a/b/c
mkdir -p $IN/a/b/d
mkdir -p $IN/x/y/z

# Copy areas
cp -r $OUT/a/b/c/dep_dir1 $IN/a/b/c
cp -r $OUT/a/b/d/dep_dir2 $IN/a/b/d

# Copy files 
cp $OUT/x/y/z/dep_file $IN/x/y/z

# Create links
ln -s $IN/a/b/c $IN/p

Note that $OUT represents the "outside world" while $IN represents the "inside world". You can think of them as two different directories.

I'm trying to figure out a proper data structure that holds some sort of representation of that filesystem (without $IN/$OUT). The idea is that you could look at this data structure and see which dependency is under which area, which path points to which other path, etc. My thoughts point me to a graph data structure where each vertex is a directory, file, link, or area. A file has no dependencies under it. A link has exactly one edge to another vertex. A directory can point to zero or more vertices. Note that an edge coming from a directory ("contains") is different from an edge coming from a link ("points to"). An area is a special kind of directory vertex, so it should carry a flag: an "area" is a directory that the user provided in a custom list of dirs, and for it the whole area is copied, not just the individual files.
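One way to make the edge rules above concrete is to keep the vertices in a hash keyed by full path and to store the edges as labelled pairs. The sketch below is a minimal illustration under that assumption; %vertex, @edges and check_graph are hypothetical names, not part of any module. It only checks the rules stated above: a file has no outgoing edges, a link has exactly one "points_to" edge, and a directory has only "contains" edges.

```perl
use strict;
use warnings;

# Hypothetical flat encoding of the graph: vertices keyed by full path,
# so two files named "file1" in different directories stay distinct.
my %vertex = (
    '/a/b/c'         => { type => 'dir' },
    '/a/b/c/d'       => { type => 'dir' },
    '/a/b/c/d/file1' => { type => 'file' },
    '/x/y/w/area1'   => { type => 'dir', is_area => 1 },
    '/u/v'           => { type => 'link' },
);

# Two edge kinds: "contains" (dir -> child) and "points_to" (link -> target).
my @edges = (
    [ '/a/b/c',   '/a/b/c/d',       'contains'  ],
    [ '/a/b/c/d', '/a/b/c/d/file1', 'contains'  ],
    [ '/u/v',     '/a/b/c',         'points_to' ],
);

# Return a list of rule violations (empty list means the graph is valid).
sub check_graph {
    my ($vertex, $edges) = @_;
    my (%out, @errors);
    # Group the edge kinds by source vertex.
    push @{ $out{ $_->[0] } }, $_->[2] for @$edges;
    for my $v (sort keys %$vertex) {
        my $type  = $vertex->{$v}{type};
        my @kinds = @{ $out{$v} // [] };
        if ($type eq 'file' && @kinds) {
            push @errors, "$v: a file cannot have outgoing edges";
        }
        elsif ($type eq 'link' && (@kinds != 1 || $kinds[0] ne 'points_to')) {
            push @errors, "$v: a link needs exactly one points_to edge";
        }
        elsif ($type eq 'dir' && grep { $_ ne 'contains' } @kinds) {
            push @errors, "$v: a directory may only have contains edges";
        }
    }
    return @errors;
}

print "$_\n" for check_graph(\%vertex, \@edges);
```

Because the key is the full path, /a/b/c/file1 and /x/y/z/file1 would be two separate vertices even though they share a name.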
Basically, I want to be able to run the following queries on the data structure (to produce the example above):
1. Insert a link, file, or directory into the DS.
2. Get a list of all the directories (for the first section). For example, I could do something like print $fh "mkdir -p $_\n" foreach get_dirs($DS);
3. Get a list of all the areas (for the second section).
4. Get a list of all the files (for the third section).
5. Get a list of all the links (for the fourth section). This part is a bit tricky, because one link can depend on another one, which must therefore be created first.
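For query 5, the ordering problem is a topological sort: visit every link a given link depends on, then emit the link itself. The sketch below assumes the links are stored in a hash of link path => absolute target, and treats link B as depending on link A when B's path or B's target lies at or below A's path. order_links, visit_link and is_under are hypothetical helper names.

```perl
use strict;
use warnings;

# True if $path equals $prefix or lies somewhere below it.
sub is_under {
    my ($path, $prefix) = @_;
    return $path eq $prefix || index($path, "$prefix/") == 0;
}

# Depth-first visit: emit every link this one depends on before itself.
sub visit_link {
    my ($link, $links, $state, $order) = @_;
    my $st = $state->{$link} // '';
    return if $st eq 'done';
    die "symlink cycle involving $link\n" if $st eq 'active';
    $state->{$link} = 'active';
    for my $other (sort keys %$links) {
        next if $other eq $link;
        # Assumed dependency rule: the link's location or its target
        # goes through $other.
        if (is_under($link, $other) || is_under($links->{$link}, $other)) {
            visit_link($other, $links, $state, $order);
        }
    }
    $state->{$link} = 'done';
    push @$order, $link;
}

# Return the link paths in an order that is safe to feed to "ln -s".
sub order_links {
    my ($links) = @_;    # hashref: link path => absolute target
    my (%state, @order);
    visit_link($_, $links, \%state, \@order) for sort keys %$links;
    return @order;
}

my %links = (
    '/m'     => '/u/v/d',    # target goes through the link /u/v
    '/p'     => '/a/b/c',
    '/u/v'   => '/a/b/c',
    '/u/v/w' => '/x/y/z',    # lives inside the link /u/v
);
print "ln -s $links{$_} $_\n" for order_links(\%links);
```

Strictly, ln -s does not need its target to exist, so the target rule only matters if later steps resolve paths through links. The location rule is the one ln -s itself enforces: creating /u/v/w fails unless /u/v already exists.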

As I mentioned, I already have an initial working script (which is very long and complicated). The general idea is to split the paths into four hashes (the keys are the paths and the values are just 1, to make it easy to check whether a path is already in the hash). But this approach is really not scalable. For example, I want to add a new feature where I can use virtual paths instead of logical paths: in the example above, I could replace /p with /a/b/c everywhere.
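A virtual path can be resolved by repeatedly replacing the longest path prefix that is a known link with that link's target. This is a minimal sketch assuming absolute paths and absolute link targets stored in a hash of link path => target; resolve_virtual is a hypothetical name, and the limit of 40 hops is an arbitrary guard against link cycles.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite $path through known links until no
# prefix of it is a link. Assumes absolute paths and targets.
sub resolve_virtual {
    my ($path, $links) = @_;    # $links: hashref link path => target
    for my $hop (1 .. 40) {     # arbitrary cap to catch link cycles
        my @parts   = split m{/}, $path, -1;
        my $changed = 0;
        # Try the longest prefix first ("/a/b/c" before "/a/b" before "/a").
        for my $n (reverse 1 .. $#parts) {
            my $prefix = join '/', @parts[0 .. $n];
            next unless exists $links->{$prefix};
            my $rest = join '/', @parts[$n + 1 .. $#parts];
            $path = length $rest ? "$links->{$prefix}/$rest" : $links->{$prefix};
            $changed = 1;
            last;
        }
        return $path unless $changed;
    }
    die "too many link hops while resolving $path\n";
}

print resolve_virtual('/p/file1', { '/p' => '/a/b/c' }), "\n";    # /a/b/c/file1
```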

Since this points me towards a graph DS, I looked at the Graph package. The problem is that file and directory names are not unique: for example, /a/b/c/file1 and /x/y/z/file1 are two completely different files with the same name. I also want to support relative paths as link targets, such as /a/b/c/d -> ../../x/y/z. How can I represent the filesystem in a proper DS?
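Two things help here. Keying every node by its full path (or, in a tree, by name within its parent directory) removes the uniqueness problem, since /a/b/c/file1 and /x/y/z/file1 never share a key. A relative link target can be normalized into an absolute one by resolving it against the directory that contains the link, which is how relative symlinks are interpreted. The sketch below uses a hypothetical helper, absolute_target, built on File::Spec and File::Basename; it collapses ".." lexically, so it assumes no component along the way is itself a link.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: normalize a link target to an absolute path.
# A relative target is resolved against the directory containing the
# link. ".." is collapsed lexically, which assumes that no component
# of the path is itself a symlink.
sub absolute_target {
    my ($link_path, $target) = @_;
    my $joined = File::Spec->file_name_is_absolute($target)
        ? $target
        : File::Spec->catdir(dirname($link_path), $target);
    my @out;
    for my $part (File::Spec->splitdir($joined)) {
        next if $part eq '' || $part eq '.';
        if ($part eq '..') { pop @out; next; }    # ".." above "/" stays at "/"
        push @out, $part;
    }
    return '/' . join('/', @out);
}

print absolute_target('/a/b/c/d', '../../x/y/z'), "\n";    # /a/x/y/z
print absolute_target('/x/y/z/link1', 'file1'), "\n";      # /x/y/z/file1
```

With this rule, /a/b/c/d -> ../../x/y/z resolves to /a/x/y/z (two levels up from /a/b/c), not to /x/y/z.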

EDIT: To make this clearer, please consider the following example:

/a/b/c/d/file1
/x/y/z/link1 -> file1
/x/y/w/area1
/u/v -> /a/b/c

In that case, you get the following graph: https://pasteboard.co/ua4n0o3nwVOY.png (I'm not sure how to paste an image here, since the img tag is not supported). I was thinking of creating a class called PathsTree that holds a single node representing the root, plus another class called Node. Each time I add a new path, I would split it with splitdir and walk down to the right node, creating nodes as needed. Does that sound like a valid approach?
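The PathsTree approach can be sketched as follows, assuming absolute input paths; PathsNode, PathsTree, insert and find_paths are hypothetical names, not an existing module. Children are stored per directory by name, so identical file names in different directories do not collide, and each of the four sections becomes one query over the tree. Link ordering and relative link targets are not handled here.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical classes sketching the PathsTree/Node idea.
package PathsNode;

# One vertex. "children" holds the directory ("contains") edges keyed by
# name; "target" holds the single link ("points to") edge.
sub new {
    my ($class, %args) = @_;
    return bless {
        name     => $args{name},
        type     => $args{type} // 'dir',    # 'dir' | 'file' | 'link'
        is_area  => 0,
        target   => undef,
        children => {},
    }, $class;
}

package PathsTree;

sub new {
    my ($class) = @_;
    return bless { root => PathsNode->new(name => '') }, $class;
}

# Insert an absolute path; missing intermediate components become dirs.
# %opts: type => 'dir'|'file'|'link', target => '...', is_area => 1
sub insert {
    my ($self, $path, %opts) = @_;
    my @parts = grep { length } File::Spec->splitdir($path);
    die "empty path\n" unless @parts;
    my $node = $self->{root};
    for my $i (0 .. $#parts) {
        # This sketch refuses to walk through a file or a link.
        die "$path: cannot descend into a non-directory\n"
            unless $node->{type} eq 'dir';
        $node = $node->{children}{ $parts[$i] } //= PathsNode->new(
            name => $parts[$i],
            type => $i == $#parts ? $opts{type} : 'dir',
        );
    }
    $node->{type}    = $opts{type}   if defined $opts{type};
    $node->{target}  = $opts{target} if defined $opts{target};
    $node->{is_area} = 1             if $opts{is_area};
    return $node;
}

# Walk the tree; return the sorted full paths of nodes where $want is true.
sub find_paths {
    my ($self, $want) = @_;
    my (@found, @stack);
    push @stack, [ $self->{root}, '' ];
    while (my $item = pop @stack) {
        my ($node, $path) = @$item;
        for my $name (keys %{ $node->{children} }) {
            my $child      = $node->{children}{$name};
            my $child_path = "$path/$name";
            push @found, $child_path if $want->($child);
            push @stack, [ $child, $child_path ];
        }
    }
    return sort @found;
}

package main;

my $tree = PathsTree->new;
$tree->insert('/a/b/c/d/file1', type => 'file');
$tree->insert('/x/y/z/link1',   type => 'link', target => 'file1');
$tree->insert('/x/y/w/area1',   is_area => 1);
$tree->insert('/u/v',           type => 'link', target => '/a/b/c');

# One query per section of the instructions file.
my %section = (
    'Create dirs'  => sub { $_[0]{type} eq 'dir' && !$_[0]{is_area} },
    'Copy areas'   => sub { $_[0]{is_area} },
    'Copy files'   => sub { $_[0]{type} eq 'file' },
    'Create links' => sub { $_[0]{type} eq 'link' },
);
for my $title ('Create dirs', 'Copy areas', 'Copy files', 'Create links') {
    print "# $title\n";
    print "$_\n" for $tree->find_paths($section{$title});
}
```

The area flag lives on an ordinary directory node, so the dirs query simply excludes flagged nodes and the areas query selects them.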

In reply to How to represent the filesystem in DS? by ovedpo15
