Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm struggling with what should be a simple regex and would like some guidance please.

I'm prompting a user to enter a database name and attempting a match on the pattern entered. If it matches one of a set of forbidden names then the user is asked to retry. Valid database names consist of alphanumeric characters and underscores. The system databases are not allowed. Here is what I've got so far, which isn't working as I expected.

'[^(master|model|dbccdb|sybsecurity|sybsystemdb|sybsystemprocs|tempdb|DBA)][a-zA-Z0-9_]+'
So if they enter any of these names they should be rejected
master model dbccdb sybsecurity sybsystemdb sybsystemprocs tempdb DBA
However the above code is not doing an exact word match like I expected since it's still rejecting valid names e.g. ed123

Any help would be welcome

Replies are listed 'Best First'.
Re: exact word match
by samarzone (Pilgrim) on Dec 22, 2010 at 12:23 UTC

    Square brackets are for matching individual characters. Because of the ^, the class you wrote matches a single character that is not one of 'm', 'a', 's', 't', 'e', 'r', '|' .. and so on, so a name such as ed123 cannot match starting at its first character.

    If you meant a valid database name except the names given in your regex you could have written

    if ($name =~ /^[a-z0-9_]+$/i
        && $name !~ /^(?:master|model|dbccdb|sybsecurity|sybsystemdb|sybsystemprocs|tempdb|DBA)$/) {
        print "allowed\n";
    }
    else {
        print "not allowed\n";
    }

    Update: This can also be written in a single regex using negative look-ahead assertion as follows

    if ($name =~ /^(?!(master|model|dbccdb|sybsecurity|sybsystemdb|sybsystemprocs|tempdb|DBA)$)(^[a-z0-9_]+$)/i) {
        print "allowed\n";
    }
    else {
        print "not allowed\n";
    }
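
A minimal sketch of the same negative look-ahead idea, with the pattern built from a list so the reserved names live in one place. The names `@reserved`, `$valid_name` and `is_allowed` are assumptions made for illustration, and `\A`/`\z` are used here instead of `^`/`$` so that a trailing newline is not silently accepted.

```perl
use strict;
use warnings;

# Reserved names taken from the question.
my @reserved = qw(master model dbccdb sybsecurity sybsystemdb
                  sybsystemprocs tempdb DBA);

# Escape each name and join them into one alternation.
my $alternation = join '|', map { quotemeta($_) } @reserved;

# The look-ahead fails when the whole string is a reserved name;
# otherwise the name must be one or more of a-z, A-Z, 0-9 or _.
my $valid_name = qr/\A(?!(?:$alternation)\z)[A-Za-z0-9_]+\z/i;

# Hypothetical helper: returns 1 when the name is acceptable, 0 otherwise.
sub is_allowed {
    my ($name) = @_;
    return $name =~ $valid_name ? 1 : 0;
}

for my $name (qw(ed123 master Master master1), 'bad name') {
    printf "%-10s %s\n", $name, is_allowed($name) ? 'allowed' : 'not allowed';
}
```

Because the look-ahead is anchored with `\z`, a name such as master1 only starts with a reserved word and is still accepted.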

    -- Regards - Samar
Re: exact word match
by johngg (Canon) on Dec 22, 2010 at 12:23 UTC

    It looks like you are trying to use a negated character class ([^ ... ]) to enclose your alternation of words you don't want. That will not work as character classes just deal with single characters. Perhaps something like this would be better.

    my $forbidden = qr{^(?:master|model|dbccdb|sybsecurity|sybsystemdb|sybsystemprocs|tempdb|DBA)$};
    if ( $entered =~ $forbidden ) {
        # Your rejection code here
    }

    I hope this is helpful.

    Cheers,

    JohnGG

      Thanks for this and the other answers. I can see where I was going wrong now. This has been very helpful.
Re: exact word match
by cdarke (Prior) on Dec 22, 2010 at 12:12 UTC
    The [...] notation is for building character classes, that is, sets of single characters. I assume that you mean the ^ to indicate 'not', but that cannot be applied to groups of words as you have tried.
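
To make the point concrete, here is a small sketch that treats the bracketed part of the question's pattern as what it really is, a negated set of single characters, and lists which characters of a word it refuses. The helper name `refused_chars` is an assumption introduced for this illustration.

```perl
use strict;
use warnings;

# The bracketed part of the pattern from the question. Inside [...],
# the parentheses and | are ordinary characters, not grouping or alternation.
my $class = qr/[^(master|model|dbccdb|sybsecurity|sybsystemdb|sybsystemprocs|tempdb|DBA)]/;

# Hypothetical helper: returns the characters of $word that the
# negated class will not match, in their original order.
sub refused_chars {
    my ($word) = @_;
    return join '', grep { $_ !~ $class } split //, $word;
}

print "ed123 -> '", refused_chars('ed123'), "'\n";
```

For ed123 the refused characters are the leading 'e' and 'd', since both letters occur somewhere in the listed names.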

    I'm not certain what pattern you wish to match. Can you describe it in English first?

Re: exact word match
by ww (Archbishop) on Dec 22, 2010 at 13:57 UTC
    Naming (if it's been done or if you can do it) the user accessible dbs in a characteristic format -- for example _user_specificname -- or the system dbs in a format which will NOT match any name assigned to a user file -- say, _sys_topsecret -- will at a minimum simplify the required regex, reduce the possibility of having a db with a name that's allowed when it's supposed to be system-use only, and -- some say -- offer modestly better security.

    The conventional wisdom seems to be that regexen which allow only acceptable characters are more reliable than those which attempt to anticipate all the possible "bad" chars and prohibit them.

    This "wisdom" has holes, and is probably not particularly helpful in the case where you have a limited universe of user-accessible db names, but may be worth knowing as you write regexen for other purposes.
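
As a rough sketch of how a naming convention simplifies the check, assume every user-accessible database is named with the `_user_` prefix from the example above, followed by letters, digits or underscores. That convention and the helper name `is_user_db` are assumptions for illustration only.

```perl
use strict;
use warnings;

# Assumption: user databases always look like "_user_" plus a
# non-empty suffix of letters, digits or underscores.
sub is_user_db {
    my ($name) = @_;
    return $name =~ /\A_user_[A-Za-z0-9_]+\z/ ? 1 : 0;
}

for my $name (qw(_user_sales _sys_topsecret master _user_)) {
    printf "%-15s %s\n", $name, is_user_db($name) ? 'allowed' : 'not allowed';
}
```

With such a convention no list of forbidden names is needed, because nothing outside the prefix can match.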

Re: exact word match
by Anonymous Monk on Dec 22, 2010 at 12:27 UTC
Re: exact word match
by Marshall (Canon) on Dec 23, 2010 at 07:52 UTC
    As another idea, I don't see the need to do this all in one regex. You have an explicit list of forbidden names, so I would make an array (or hash) of those names and put that in a prominent place, not embedded in a tricky regex later in the code. Below I used a hash look-up for an exact match, but the forbidden table could also be a list of patterns if it was necessary to get that complicated.

    There is nothing wrong with having a number of little regexes as you continue to validate the user's input. With code written like this, reading the user error messages gives a clear picture of what is and what is not allowed without having to use brain cells thinking about the regexes, albeit as simple as they are below.

    #!/usr/bin/perl -w
    use strict;

    my @forbidden = qw ( master model dbccdb sybsecurity sybsystemdb
                         sybsystemprocs tempdb DBA );
    my %forbidden = map{lc $_ => 1}@forbidden;
    # or could just build the hash manually

    while ( (print "Enter DB name: "),
            (my $name = <STDIN>) !~ /^\s*q(uit)?\s*$/i )
    {
        $name =~ s/^\s+//;
        $name =~ s/\s+$//;           #also does chomp
        next if ($name =~ /^\s*$/);  #simple re-prompt if blank line
        if ($name =~ / /)
        {
            print "Error no spaces within name allowed!\n";
            next;
        }
        if ( $forbidden{lc $name} )
        {
            print "Error: $name is a reserved name!\n";
            next;
        }
        if ( $name =~ /\W/)          # only a-zA-Z0-9_ allowed
        {
            print "Error: illegal character in name!\n";
            next;
        }
        #... perhaps more tests on the input name?
        print "Ok, $name is valid\n";
    }
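
If the forbidden table did need to become a list of patterns, one possible shape is sketched below. The rule that anything starting with "syb" is reserved is a made-up generalisation for illustration, not something stated in the thread, and `@forbidden_patterns` and `check_name` are hypothetical names.

```perl
use strict;
use warnings;

# Each rule pairs a pattern with the message to show when it matches.
# Assumption: the "syb" prefix rule is invented for this example.
my @forbidden_patterns = (
    [ qr/\A(?:master|model|dbccdb|tempdb|DBA)\z/i, 'is a reserved name' ],
    [ qr/\Asyb/i,                                  'starts with the reserved "syb" prefix' ],
    [ qr/\W/,                                      'contains an illegal character' ],
);

# Hypothetical helper: returns an error message for the first rule
# that matches, or an empty string when no rule objects.
sub check_name {
    my ($name) = @_;
    for my $rule (@forbidden_patterns) {
        my ($pattern, $reason) = @$rule;
        return "Error: $name $reason!" if $name =~ $pattern;
    }
    return '';
}

for my $name (qw(ed123 Master sybsecurity a-b)) {
    my $error = check_name($name);
    print $error ne '' ? "$error\n" : "Ok, $name is valid\n";
}
```

Keeping the message next to each pattern preserves the benefit described above: the table itself documents what is and is not allowed.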