in reply to Permissions Utility Script for sysadmins - seeking comments

(I should stress before I start that I don't find this code horrible, but you asked for any constructive criticism. :)

die "Usage: permmy -[itcad] [directory] -f config-file\n" unless @ARGV >= 3; use vars qw/ $opt_t $opt_c $opt_d $opt_i $opt_f $opt_a/; getopts('tcdi:f:a:') or die "Usable options are -tcifa: $!\n";

Hmm, nothing explains what the options mean; the usage message implies that you can supply just 2 arguments but the code requires 3; the second die() gives the wrong list of supported arguments.

I prefix most of my scripts with a usage() routine something like:

sub usage { my $text = <<USAGE; $0 [ -i | -c | -t ] directory [, directory ... ] Frobnicate all the files in the specified directory/ies -t test only, and warn of discrepancies -i inclusive: unknown files ignored -c conclusive: unknown files deleted (default is to defrobnicate unknown files) USAGE die "$text\n@_\n" if @_; print $text; exit 0; }

I then call usage() without parameters if I see -h or -? options, and with an error message as a parameter if for some other reason the arguments were not acceptable. As well as being useful for the user, having this usage statement at the beginning of the script can be very helpful for the maintainer - these days I tend to write it first, and use it as a guide for development of the rest of the script.

if ($opt_t || $opt_d || $opt_c) { ... } [ snip 11 subroutines covering some 180 lines ] if ($opt_d) { ... }

I find this layout confusing; I prefer instead to keep the main code at the top together and with a clear

exit 0; # --- subroutines
(or something) separating the main code from the subroutine declarations - it is quite difficult, either by eye or programmatically, to find the next active line of code when it is so far away. If the main code starts to become long (eg more than a screenful) I'll move stuff into otherwise unnecessary subroutines to minimise that, so that I can end up with:
if ($opt_d) { configtest(); } elsif ($opt_i) { configurator("i", $perms_config_file, $dir_tree, \&wanted_ia); } elsif ($opt_a) { configurator("a", $perms_config_file, $add_tree, \&wanted_ia); } elsif ($opt_t) { configtest(); fulltest(); } elsif ($opt_c) { configtest(); changeperms(); # or whatever this actually does }
since when maintaining it 6 months later, I usually want the main code to give me a clear overview of the structure, to make it as easy as possible to drill down to the actual bit of code I'm looking for.

Sideroad: I try to name subroutines appropriately to help me think with them, and part of that is getting the right part of speech - if a subroutine returns TRUE when the argument is valid I'll call it valid or is_valid and use it like:

if (valid($foo)) { ... }
or, more likely:
if ($obj->valid) { ... }
whereas if it does its validity checks but doesn't return a result I'll name it imperatively:
validate($foo); # dies if invalid

sub configurator ($$$$) { ... exit; close CONFIG or die "Could not close config file $conf_file: $!\n" +; }

That looks wrong - perhaps the exit was a temporary diagnostic? I tend to put such things in the first column to highlight their (intended) temporary nature.

Sideroad: I assume there is documentation somewhere for the format of the config file.

my $nice_mode = &return_mode($mode);

The & is not needed there, and it makes me do a double-take - usually I only expect to see the & specifed when taking a reference to a subroutine.

sub change_mode ($$) { ...(oct($change_value) != oct($c_nice_mode))... }

Hmm, one day someone'll call change_mode($file, 0777) and everything will go horribly wrong. We have the underlying mode (a number) and the common external representation (an octal string) - I would normally prefer to use the underlying value throughout in the code, and convert to the external representation only when it is needed for output. This also makes things more consistent with perl builtins such as mkdir or sysopen which take a raw number for their mode arguments.

chown($num_uid, -1, $the_file) or warn "Could not change owner + of $the_file: $!\n"; print "Changed owner of $the_file from $c_pwuid to $change_val +ue\n";

After the warning is printed, the success message is printed anyway. Two common alternatives are:

if (chown(...)) { print "success"; } else { warn "failure"; }
or:
chown(...) or do { warn "failure"; return }; print ...

if ($testy eq "dir-mode") { $d_mode = $config{$tes +t}{$tester}{$testy} } elsif ($testy eq "file-mode") { $f_mode = $config{ +$test}{$tester}{$testy} } ...

A string of tests like this immediately makes me think "use a hash". In this case, I think you want to end up with a hunk of code to replace sub wanted_c, by specifying a code block explicitly for each hash value or by encoding parameters to either build up some evalled code or to bind to a closure.

I note also that if the tests specify (say) both 'dir-mode' and 'dir-owner', the latter will be silently ignored: in such cases it is helpful to test and warn. You also need to be careful about booleans - I guess a mode of 000 is unlikely, but if you were to make a natural extension to allow owners to be specified by uid, an attempt to specify root as uid 0 would fail, since the

elsif ((-d $_) && ($d_uid))
would always be false.

If you keep the wanted_c() subroutine as is, note also that it is doing a lot of unnecessary work repeatedly calling stat on the same file as each elsif test fails - that is probably most easily fixed by doing a single bare stat($_) at the beginning of the routine and using the filetests on _ (underscore) thereafter.

That's probably enough to be getting on with for now. :)

Hugo

Replies are listed 'Best First'.
Re: Re: Permissions Utility Script for sysadmins - seeking comments
by BuddhaNature (Beadle) on Apr 21, 2004 at 22:19 UTC
    First off, thanks for al the constructive criticism. I have embarked and a number of the changes you mentioned.

    I am getting a bit confused with one or two things and was hoping you could elucidate a bit more on what might be going on and how I might make it better. Specifically where you mention "use a hash" - I am a bit unsure on how to best impliment this in this context. Also, do you have any insight into why (as you mentioned) if there are both dir-mode and dir-owner that the latter will be silently ignored. Thanks in advance.

    -Shane

      Ok, here's a simplistic way to replace one of the string of elsifs with a hash:

      my %vars = ( 'dir-mode' => \$d_mode, 'file-mode' => \$f_mode, 'link-mode' => \$l_mode, 'dir-owner' => \$d_own, 'file-owner' => \$f_own, 'link-owner' => \$l_own, 'dir-group' => \$d_grp, 'file-group' => \$f_grp, 'link-group' => \$l_grp, ); for my $testy (keys %{ $config{$test}{$tester} }) { unless ($vars{$testy}) { warn "Unexpected test key '$testy'"; next; } ${ $vars{$testy} } = $config{$test}{$tester}{$testy}; }

      This is simply saying: we have a string of (els)ifs, and whichever string we match we're going to execute almost the same code, so let's abstract out the one bit that varies (the variable name in this case) and let the hash associate the string to match with the variable name to modify, leaving the rest of the code constant.

      However, we also have a series of similar variable names, which is also often a sign that a hash is needed. So let's look at the point we're using these variables: I'd characterise the wanted_c subroutine as doing something like:

      • where specified, update any of the mode, owner or group of the file
      • but select a different set of updates depending on whether the file to update is a directory, a link or a plain file
      so I'd be inclined to map that to a data structure that looks like:
      my $wanted_this_time = { mode => $wanted_mode, owner => $wanted_owner, group => $wanted_group, };
      but pick one of three such structures depending on the type of file:
      my $wanted = { if_directory => { mode => $dir_mode, owner => $dir_owner, group => $dir_group, }, if_link => { ... } if_file => { ... } };
      and having made that conceptual transformation, we can now get rid of the matrix of individually named variables:
      my %wanted; my %validfile = map +($_ => 1), qw/ dir file link /; my %validupdate = map +($_ => 1), qw/ mode owner group /; for (keys %{ $config{$test}{$tester} }) { my($filetype, $updatetype} = split /-/, $_, 2; unless ($validfile{$filetype} && $validupdate($updatetype}) { warn "Unexpected directive '$_'"; next; } $wanted{$filetype}{$updatetype} = $config{$test}{$tester}{$_}; } [...] sub wanted_c { stat($_); my $wanted = (-d _ ? $wanted{'dir'}) : (-l _ ? $wanted{'link'}) : (-f _ ? $wanted{'file'}) : return; # ?? warn unexpected type? return unless $wanted; # nothing to change if (defined $wanted->{'mode'}) { change_mode($_, $wanted->{'mode'}); } if (defined $wanted->{'owner'}) { change_owner($_, $wanted->{'owner'}); } if (defined $wanted->{'group'}) { change_group($_, $wanted->{'group'}); } }

      Note that you could as easily represent this data using arrays rather than hashes, but it then becomes much less self-documenting unless you also use named constants for the indexes into each array.

      In general, whenever I see code that needs to cope with several different cases, my instinct is to ask: what effort will be required when another new case needs to be added to the list? How far can I reduce that effort? How can I minimise the danger of getting it wrong when I need to add a new case, for example by adding it in one place and not in another?

      Usually, the answer is: use data structures to minimise code duplication; where possible, encapsulate all the information for the cases in a single data structure, so all the work involved in adding a new case is reduced to adding a new entry into that data structure.

      It may be overkill for this script, but we can take that extra step by taking advantage of code references. The data structures might then look something like this:

      my %filetypes = ( 'dir' => sub { -d _ }, 'link' => sub { -l _ }, 'file' => sub { -f _ }, ); my %updatetypes = ( 'mode' => \&change_mode, 'owner' => \&change_owner, 'group' => \&change_group, ); sub wanted_c { stat($_); for my $filetype (keys %wanted) { next unless &{ $filetypes{$filetype} }; my $wanted = $wanted{$filetype}; for my $update (keys %$wanted) { &{ $updatetypes{$update} }($_, $wanted->{$update}); } } }

      However, I suspect that would not be desirable, at least for the update types, since it makes it much harder to add interdependencies between the update types such as (for example) updating ownership before mode.

      Hugo

        So I took your advice and made the following changes:

        my %wanted; my %validfile = map +($_ => 1), qw/ dir file link /; my %validupdate = map +($_ => 1), qw/ mode owner group /; for (keys %{ $config{$test}{$tester} }) { my($filetype, $updatetype) = split /-/, $_, 2; unless ($validfile{$filetype} && $validupdate{$updatet +ype}) { warn "Unexpected directive '$_'"; next; } $wanted{$filetype}{$updatetype} = $config{$test}{$test +er}{$_}; } [...] sub wanted_c { print "wanted_c is working on $_\n"; my %wanted_prev = %wanted; my $wanted_here; stat($_); if (-d $_) {$wanted_here = $wanted_prev{'dir'}} elsif (-l $_) {$wanted_here = $wanted_prev{'link'}} elsif (-f $_) {$wanted_here = $wanted_prev{'file'}} else {return;} return unless $wanted_here; # nothing to change print Dumper($wanted_here); if (defined $wanted_here->{'mode'}) { change_mode($_, $wanted_here->{'mode'}) + } if (defined $wanted_here->{'owner'}) { change_owner($_, $wanted_here->{'owner'}); } if (defined $wanted_here->{'group'}) { change_group($_, $wanted_here->{'group'}); } }

        The changes work like a charm. I have one problem and one question though. The problem is that I keep getting a diagnostic warning:

        Variable "%wanted" will not stay shared at permmy7.pl line 317 (#1)

        I am a little unsure about how best to go about getting rid of it. Can't _really_ see an easy way to sub something out...

        The question I have is, previously I was having issues with the file/directory/link being checked by the wanted_c function hitting only one if and then returning, so I had to toss in some recurssion - why doesn't that problem occur with the above code? I am just not seeing it and it is bugging me. It works and I am thankful for that but I want to know _why_ it works... Thanks in advance.

        -Shane

      Also, do you have any insight into why (as you mentioned) if there are both dir-mode and dir-owner that the latter will be silently ignored.

      Looking again, this was an error on my part - I'd missed the surrounding loop. Apologies for the misdirection.

      Update: bother, I was right the first time. See Re^5: Permissions Utility Script for sysadmins - seeking comments instead.

      I'll try tomorrow to come up with a couple of examples of how you might use a hash to improve things.

      Hugo

        Ah, but you _were_ right - it does happen (or did since I fixed it). It was because the wanted_c function was only being called once for each file, and if it had more than one applicable directive it just passed up the other two. So I changed the wanted_c function calls to something like this:

        sub wanted_c { if ((-d $_) && ($d_mode) { change_mode($_, $d_mode); $d_mode = ""; wanted_c(); } elsif ((-d $_) && ($d_own)) { change_owner($_, $d_own); $d_own = ""; wanted_c(); }

        That way it gets called as many times as is applicable to the file. Thanks for the pointer - I look forward to looking into your hash idea, which is slightly coalescing in my brain, but wouldn't mind your nudging it a bit with some advice/examples.

        -Shane UPDATE: This fixed the problem but created another - it is doing everything, but only on one file! I see it is because of setting the variables to "", but am kinda unsure of what to do instead...

        2nd UPDATE: Things are good now except for one thing, in order to fix the problem I created

        my %info = %{ $config{$test}{$tester} };

        after the spot you suggest a hash. I then put

        else { while(my($key,$val) = each(%info)) { if ($key eq "dir-mode") { $d_mode = $val} elsif ($key eq "file-mode") { $f_mode = $val} elsif ($key eq "link-mode") { $l_mode = $val} elsif ($key eq "dir-owner") { $d_own = $val} elsif ($key eq "file-owner") { $f_own = $val} elsif ($key eq "link-owner") { $l_own = $val} elsif ($key eq "dir-group") { $d_grp = $val} elsif ($key eq "file-group") { $f_grp = $val} elsif ($key eq "link-group") { $l_grp = $val} } return 0; }

        as the final part of the wanted_c function. Everything works fine now, except since I am using diagnostics I get:

        Variable "%info" will not stay shared at permmy6.pl line 372 (#1) (W closure) An inner (nested) named subroutine is referencing a lexical variable defined in an outer subroutine. When the inner subroutine is called, it will probably see the value of the outer subroutine's variable as it was before and during the *first +* call to the outer subroutine; in this case, after the first call to th +e outer subroutine is complete, the inner and outer subroutines will no longer share a common value for the variable. In other words, the variable will no longer be shared. Furthermore, if the outer subroutine is anonymous and references a lexical variable outside itself, then the outer and inner subroutines will never share the given variable. This problem can usually be solved by making the inner subroutine anonymous, using the sub {} syntax. When inner anonymous subs that reference variables in outer subroutines are called or referenced, the +y are automatically rebound to the current values of such variables.

        Any thoughts?