ovedpo15 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
I'm using the Graph module to create a graph which represents the filesystem. Each vertex contains the canonical path. They also contain an attribute "type" with their type (file, dir or link).
Given a path, I fill the graph like so:
sub add_path {
    my ($self, $path) = @_;
    my $canonical_path = get_canonical_path($path);
    my @subpaths = splitdir($canonical_path);
    for my $index (0..$#subpaths) {
        next if ($subpaths[$index] eq ""); # Ignore empty strings
        my $parent_path  = abs_path(catdir(@subpaths[0..($index-1)]));
        my $current_path = catdir($parent_path, $subpaths[$index]);
        # True only for the last component, i.e. the full path that was passed in
        my $original_path_flag = ($index == $#subpaths);
        next if ($graph->has_vertex($current_path));
        if (-l $current_path) {
            my @resolved_links = resolve_symlink($current_path);
            my $target_path = $resolved_links[1];
            $self->add_path($target_path);
            $graph->add_edge($parent_path, $current_path);
            $graph->add_edge($current_path, $target_path);
            $graph->set_vertex_attributes($current_path, { "type" => "link", "target" => $target_path, "original" => $original_path_flag });
        }
        elsif (-f $current_path) {
            $graph->add_edge($parent_path, $current_path);
            $graph->set_vertex_attributes($current_path, { "type" => "file", "original" => $original_path_flag });
        }
        elsif (-d $current_path) {
            $graph->add_edge($parent_path, $current_path);
            $graph->set_vertex_attributes($current_path, { "type" => "dir", "original" => $original_path_flag });
        }
    }
}
Basically, I find the canonical path of the given path, split it, and create a vertex for each subpath, with an edge from each subpath to the next one. The $original_path_flag marks vertices that are full paths as passed in, rather than intermediate subpaths (you will understand later why I need it).
So you get a graph with vertices: some are dirs, some are files and some are links. A dir has zero or more "children", a file has no "children", and a link has exactly one child (its target). Note that all the vertices contain canonical paths, with no links along the way.
Once the graph is filled, I want to flag the *original* paths that match one of the regex expressions by setting the attribute "marked". My current code:
sub run_rexes {
    my ($self, $rexes_aref) = @_;
    foreach my $vertex ($graph->unique_vertices) {
        next unless ($graph->get_vertex_attribute($vertex, 'original'));
        if (is_regex_matches($vertex, $rexes_aref)) {
            $graph->set_vertex_attribute($vertex, "marked", 1);
        }
    }
}
Here is_regex_matches returns true if $vertex matches one of the regex expressions. It works well, but I want the regex expressions to support paths that go through links. Since $graph->unique_vertices returns vertices that contain only canonical absolute paths, a regex written against a path with a link along the way will never match. To make it clearer, consider:
/a/b/c/file
/p -> /a/b/c
So a regex like so: ^/a/b/c/file$ will work, but ^/p/file$ will not work.
Is it possible to suggest a solution for it?

Re: Graph which represents the filesystem
by kcott (Archbishop) on Apr 18, 2022 at 18:59 UTC

    G'day ovedpo15,

    Your post is missing a lot of information. For example:

    • run_rexes - not called anywhere.
    • $rexes_aref - not defined anywhere.
    • is_regex_matches() - not defined in your OP or Graph.

    This makes it very difficult to give you an answer. It would have been much better if you had provided an SSCCE.

    From what you've written, I get the impression that you're not experiencing problems with Graph; your issue seems to be how to collect the data to populate the graph. The following script collects the data you've indicated you need. It is barebones but does the job assuming a Unix-like OS; you may want to use other modules (e.g. File::Find, File::Spec, etc.) but I'll leave that entirely up to you.

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    use constant BASE_DIR => '/home/ken/tmp/pm_11143050';
    use constant {
        LEN_BASE_DIR => length BASE_DIR,
        FS_DIR => BASE_DIR . '/fs',
    };

    use Cwd 'abs_path';
    use Data::Dump;

    my %fs_map;
    walk_fs(FS_DIR, \%fs_map);
    dd \%fs_map;

    sub walk_fs {
        my ($path, $fs_map) = @_;

        opendir(my $dh, $path);

        while (readdir $dh) {
            next if /^\.{1,2}$/;

            my $entry = "$path/$_";
            my $fs_path = fs_path($entry);

            # Test -l first: -f and -d follow symlinks, so a link to a
            # plain file would otherwise be recorded as a file.
            if (-l $entry) {
                $fs_map->{$fs_path} = [link => [canon_path($entry)]];
            }
            elsif (-f $entry) {
                $fs_map->{$fs_path} = [file => []];
            }
            elsif (-d $entry) {
                if (exists $fs_map->{fs_path($path)}) {
                    push @{$fs_map->{fs_path($path)}[1]}, canon_path($entry);
                }
                $fs_map->{$fs_path} = [dir => []];
                walk_fs($entry, $fs_map);
            }
            else {
                warn "IGNORED '$entry': not a plain file, link or directory.\n";
            }
        }

        closedir $dh;

        return;
    }

    # Path relative to BASE_DIR, exactly as found while walking.
    sub fs_path {
        return substr $_[0], LEN_BASE_DIR;
    }

    # Path relative to BASE_DIR with all links resolved.
    sub canon_path {
        return substr abs_path($_[0]), LEN_BASE_DIR;
    }

    I created the following directory structure for testing. It contains normal files, directories, subdirectories, links, links to links, and even a fifo.

    ken@titan ~/tmp/pm_11143050
    $ ls -lR fs
    fs:
    total 0
    drwxr-xr-x+ 1 ken None 0 Apr 19 03:49 a
    drwxr-xr-x+ 1 ken None 0 Apr 19 02:46 t

    fs/a:
    total 1
    prw-rw-rw-  1 ken None 0 Apr 19 03:49 a_named_pipe
    drwxr-xr-x+ 1 ken None 0 Apr 19 02:46 b
    -rw-r--r--  1 ken None 0 Apr 19 02:36 d
    lrwxrwxrwx  1 ken None 5 Apr 19 02:42 r -> b/c/z
    drwxr-xr-x+ 1 ken None 0 Apr 19 02:46 u
    lrwxrwxrwx  1 ken None 8 Apr 19 02:35 x -> ../a/b/c

    fs/a/b:
    total 0
    drwxr-xr-x+ 1 ken None 0 Apr 19 02:46 c
    -rw-r--r--  1 ken None 0 Apr 19 02:37 e
    lrwxrwxrwx  1 ken None 1 Apr 19 02:40 q -> y
    drwxr-xr-x+ 1 ken None 0 Apr 19 02:46 v
    lrwxrwxrwx  1 ken None 1 Apr 19 02:37 y -> .

    fs/a/b/c:
    total 0
    -rw-r--r--  1 ken None 0 Apr 19 02:38 f
    lrwxrwxrwx  1 ken None 7 Apr 19 02:40 p -> ../../x
    drwxr-xr-x+ 1 ken None 0 Apr 19 02:46 w
    lrwxrwxrwx  1 ken None 6 Apr 19 02:38 z -> ../../

    fs/a/b/c/w:
    total 0

    fs/a/b/v:
    total 0

    fs/a/u:
    total 0

    fs/t:
    total 0

    Here's the output:

    ken@titan ~/tmp/pm_11143050
    $ ./mapfs.pl
    IGNORED '/home/ken/tmp/pm_11143050/fs/a/a_named_pipe': not a plain file, link or directory.
    {
      "/fs/a"       => ["dir", ["/fs/a/b", "/fs/a/u"]],
      "/fs/a/b"     => ["dir", ["/fs/a/b/c", "/fs/a/b/v"]],
      "/fs/a/b/c"   => ["dir", ["/fs/a/b/c/w"]],
      "/fs/a/b/c/f" => ["file", []],
      "/fs/a/b/c/p" => ["link", ["/fs/a/b/c"]],
      "/fs/a/b/c/w" => ["dir", []],
      "/fs/a/b/c/z" => ["link", ["/fs/a"]],
      "/fs/a/b/e"   => ["file", []],
      "/fs/a/b/q"   => ["link", ["/fs/a/b"]],
      "/fs/a/b/v"   => ["dir", []],
      "/fs/a/b/y"   => ["link", ["/fs/a/b"]],
      "/fs/a/d"     => ["file", []],
      "/fs/a/r"     => ["link", ["/fs/a"]],
      "/fs/a/u"     => ["dir", []],
      "/fs/a/x"     => ["link", ["/fs/a/b/c"]],
      "/fs/t"       => ["dir", []],
    }

    So, %fs_map holds all of the data that you indicated you need. I'll leave you to use this to populate your graph.

    — Ken
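
One possible direction for the original question (matching ^/p/file$ when the vertex holds /a/b/c/file) is to test each regex against the canonical path and also against alias paths derived from the links. The following is only a minimal sketch under stated assumptions, not code from either post: it uses core Perl only, takes data in the same shape as %fs_map above, and the names alias_paths and matches_with_links are hypothetical. It substitutes one link at a time, so chains of links and links nested inside another link's target are not expanded.

```perl
use strict;
use warnings;

# Hypothetical sample data in the same shape as %fs_map:
# path => [type => [targets]]. It mirrors the /p -> /a/b/c example.
my %fs_map = (
    '/a'          => [dir  => ['/a/b']],
    '/a/b'        => [dir  => ['/a/b/c']],
    '/a/b/c'      => [dir  => []],
    '/a/b/c/file' => [file => []],
    '/p'          => [link => ['/a/b/c']],
);

# Return the canonical path plus one alias for every link whose target
# is the path itself or one of its leading directories.
sub alias_paths {
    my ($canonical, $fs_map) = @_;
    my @paths = ($canonical);
    for my $link (sort keys %$fs_map) {
        my ($type, $targets) = @{ $fs_map->{$link} };
        next unless $type eq 'link' && @$targets;
        my $target = $targets->[0];
        if ($canonical eq $target) {
            push @paths, $link;
        }
        elsif (index($canonical, "$target/") == 0) {
            # Swap the leading target directory for the link path.
            push @paths, $link . substr($canonical, length $target);
        }
    }
    return @paths;
}

# True if any regex matches the canonical path or one of its aliases.
sub matches_with_links {
    my ($canonical, $rexes_aref, $fs_map) = @_;
    for my $path (alias_paths($canonical, $fs_map)) {
        return 1 if grep { $path =~ $_ } @$rexes_aref;
    }
    return 0;
}

print matches_with_links('/a/b/c/file', [qr{^/p/file$}], \%fs_map)
    ? "match\n"
    : "no match\n";
```

In run_rexes, a helper along these lines could stand in for is_regex_matches, with the link-to-target pairs taken from the "target" attribute of the link vertices instead of a separate hash.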