Tanktalus has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to validate some input from a datafile, just to clear out some obvious typos. In this case, it's to keep the Windows team members from trying to do something silly.

The task: to validate unix permission settings. However, using octal is just crazy - trying to remember the setuid bit is something better suited to a computer than a human. So I've adopted (mostly) the string that the chmod unix command uses.

According to the manpage, the symbolic mode is "[ugoa...][[+-=][rwxXs-tugo...]...][,...]" which is quite confusing. The idea is that there is a "USER", and an "OPERATION". The "USER" is one or more of "ugoa". The "OPERATION" is one of "+-=" (add, remove, or assign) and one or more of "rwxXs-tugo". Finally, you can do more than one operation (on different users) by separating them with commas.

In my particular case, I only need a small subset of this (I only allow assignment '=', not adding '+' or removing '-', and I want /[rwxsS]/ rather than /[rwxXs]/ to show up the same way as the ls command shows the letters), but that's not too important.

The question is ... what is the most efficient regexp available for this type of task? The idea is that I'm trying to take a single regexp, and have it match multiple times with an optional separator.

Regexp::Common::list seems to fail to match if there is only one item (no separator), which is not what I'm looking for:

$ perl5.8.6 -MRegexp::Common=list -e 'print map {$_.$/} grep { /$RE{list}{-pat=>"[ugoa]+=[rwxsS]+"}{-sep=>","}/ } @ARGV' ugo=rw,a=r ugo=rw
ugo=rw,a=r
Desired output is that both arguments are output as matches. Update: Yes, I know this is documented behaviour. I put this in here to show I've looked at some ways to do this, and this is the one that is the closest to what I'm looking for that I've found so far.

What I'm doing is this:

my $perm_re = qr/[ugoa]+=[rwxsS]+/;
$val =~ /^(?:$perm_re(?:,$perm_re)*)$/;
Note that a blank string is also valid for my purposes. Is there a better way?
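
As written, the pattern above requires at least one item, so an empty string would not match it. A minimal sketch of one way to accept the blank case too, assuming the only change wanted is to make the whole list optional. The sub name is_valid_perms is made up for illustration, and \A and \z are used in place of ^ and $ so that a trailing newline is rejected:

```perl
use strict;
use warnings;

# One item: who (ugoa), '=', then permission letters as ls shows them.
my $perm_re = qr/[ugoa]+=[rwxsS]+/;

# Hypothetical helper: true for an empty string or for a
# comma-separated list of items, false otherwise.
sub is_valid_perms {
    my ($val) = @_;
    return $val =~ /\A(?:$perm_re(?:,$perm_re)*)?\z/ ? 1 : 0;
}

for my $val ('ugo=rw,a=r', 'ugo=rw', '', 'ugo=rw,', 'u+x') {
    printf "%-12s %s\n", "'$val'", (is_valid_perms($val) ? 'ok' : 'bad');
}
```

The outer (?: ... )? is what lets the empty string through; everything else is the same item(,item)* shape as before.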

Update: A bit of clarification. First off, what I'm doing is very similar to the chmod input. But not precisely the same. It's close enough that anyone entering this data will intuitively know what letters mean what, or anyone reading the data manually will understand it, assuming any unix experience. But there is zero desire to be able to modify permissions, only to set them. The other dissimilarity is that I'm using "S" the way that the "ls" output uses it because I'm only assuming a certain level of unix experience, and s-bits are confusing enough when chmod and ls disagree about them ;-)

Or, if you want to ignore all of the above, I want to match a comma-separated string, with zero or more items, and each item matches a particular regexp, which I think I have about as good as I'm going to get.

Update 2: I'm not necessarily looking for speed, but idiomaticness. Duplication of $perm_re seems a bit wrong to me, which is why I extracted it to a separate regular expression - because if I want to add, remove, or fix something, it'll likely need to be added, removed, or fixed for both before and after the comma.
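
One way to keep the item pattern in a single place is to build the list pattern from it. The following is only a sketch: list_of is a hypothetical helper name (it is not part of Regexp::Common), and it assumes the separator is a literal string rather than a pattern:

```perl
use strict;
use warnings;

# Hypothetical helper: build an anchored regexp that matches zero or
# more $item_re matches joined by the literal separator $sep.
sub list_of {
    my ($item_re, $sep) = @_;
    my $sep_re = quotemeta $sep;    # treat the separator literally
    return qr/\A(?:$item_re(?:$sep_re$item_re)*)?\z/;
}

my $perms_re = list_of(qr/[ugoa]+=[rwxsS]+/, ',');

for my $val ('ugo=rw,a=r', 'ugo=rw', '', 'ugo=rw,,a=r') {
    printf "%-14s %s\n", "'$val'", ($val =~ $perms_re ? 'valid' : 'invalid');
}
```

The item pattern is still interpolated twice inside the helper, but a caller only writes it once, so a fix to the item pattern cannot get out of sync between the two positions.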

Re: Validation of unix permissions
by davidrw (Prior) on May 07, 2005 at 02:14 UTC
    For reference, check out SymbolicMode.pm which is what the chmod module uses. I didn't read too deep into it, but it appears to be very well commented.

    As for Regexp::Common::list failing with only one item, that's documented behavior:
        $RE{list}{-pat}{-sep}{-lastsep}
        Returns a pattern matching a list of (at least two) substrings.

    Update: The File::chmod module may be a good reference as well.
    Also, if it helps to just convert from symbolic to real modes and work from there, SymbolicMode.pm (from the ppt bundle) can be used directly like this (or this could be used to see if it's a parsable/valid format):
    use SymbolicMode;
    $realmode = SymbolicMode::mod ($mode, $file)
        or die "invalid mode: $mode\n";

    Update: "I want to match a comma-separated string, with zero or more items, and each item matches a particular regexp" -- Sorry, I got a little carried away from the root question. Yeah, your regex looks like a pretty good way to do it. The only other suggestion would be to use the mod() example above to validate the symbolic mode (letting it do the heavy work), and then apply another regex to make sure it doesn't contain anything you don't want, so it fits your limited use.
Re: Validation of unix permissions
by eibwen (Friar) on May 08, 2005 at 05:35 UTC

    I want to match a comma-separated string, with zero or more items, and each item matches a particular regexp,

    I presume you're trying to parse valid chmod input such as ug=rwx,o=rx. While your statement allows for a number of alternative permutations, this seems the most likely.

    With regard to Update 2 in the OP and given the above understanding of valid input, I believe you may be looking for something akin to:

    foreach (split /,/, $perm) {
        warn "Invalid permission setting: $_" unless /^[ugoa]+=[rwxsS]+$/;
    }
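
    A possible refinement of the loop above, offered as a sketch rather than a drop-in fix: by default split discards trailing empty fields, so input such as "ug=rwx," would pass the check. Passing a limit of -1 keeps those fields. The sub name invalid_items is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the items of $perm that are not valid settings.
sub invalid_items {
    my ($perm) = @_;
    # Limit -1 keeps trailing empty fields, so "ug=rwx," yields an empty item.
    return grep { !/\A[ugoa]+=[rwxsS]+\z/ } split /,/, $perm, -1;
}

for my $perm ('ug=rwx,o=rx', 'ug=rwx,', 'ug=rwx,o=q') {
    warn "Invalid permission setting in '$perm': '$_'\n"
        for invalid_items($perm);
}
```

    Splitting an empty string produces no fields at all, so a blank value still counts as having no invalid items.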

    However, while this is significantly more informative, it doesn't adhere to the spirit of the OP prescription:

    The idea is that I'm trying to take a single regexp, and have it match multiple times with an optional separator.

    If you just want to detect whether the string contains an invalid permission and you don't need to inform the user of the errant setting, it is possible to use a single regex:

    foreach (@perms) {
        warn "Invalid permission setting detected in $_"
            unless /^(([ugoa]+=[rwxsS]+)(?(?!$),))+$/;
    }
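
    For readers unfamiliar with the (?(condition)yes-pattern) construct used above, here is the same idea spelled out with the /x modifier. This is a sketch and not the exact code above: it uses \A and \z instead of ^ and $, and non-capturing groups, and the variable names are illustrative. One thing worth noting is that the condition is tested before the comma is consumed, so a pattern of this shape also accepts a trailing comma; the plain item(,item)* form from the original question is shown alongside for comparison.

```perl
use strict;
use warnings;

my $cond_re = qr{
    \A
    (?:
        [ugoa]+ = [rwxsS]+   # one setting, e.g. ug=rwx
        (?(?!\z),)           # if not yet at end of string, require a comma
    )+
    \z
}x;

# The explicit item(,item)* form, for comparison.
my $item_re  = qr/[ugoa]+=[rwxsS]+/;
my $plain_re = qr/\A$item_re(?:,$item_re)*\z/;

for my $val ('ug=rwx,o=rx', 'ug=rwx', 'ug=rwx,') {
    printf "%-12s conditional=%s plain=%s\n", $val,
        ($val =~ $cond_re  ? 'match' : 'no match'),
        ($val =~ $plain_re ? 'match' : 'no match');
}
```

    Both forms agree on the first two inputs; only the conditional form matches 'ug=rwx,'. If a trailing comma should be rejected, the explicit form is the simpler choice.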