athanasia has asked for the wisdom of the Perl Monks concerning the following question:

Dear fellow monks,

I need to write a bit of code that checks whether a string matches the following set of dialplan rules:

o The string should not be empty
o The string may start with a single underscore (_)
o It may contain digits, # or *
o Its numeric part (the string may also have a second part, described by the next rule) may end with a range of digits in the format [x-y], where x is one or more digits and y is a single digit
o The string may end with a "character part" containing only the following characters: . (at most once), ! (at most once) or X (possibly more than once)

** By "numeric part" (not the best term, I admit) I mean the substring that contains #, *, digits or the [x-y] pattern. Hashes (#) and asterisks (*) may mix freely inside the numeric part but cannot appear inside [x-y]. Some examples of allowed strings:
100
_199
_2XXXXX
800!
_*34#
_*3*#2
_##34[ 12-5].

After validating the string, I also need to fill in two variables, say $res and $maxdigs. $res should contain the "numeric part", i.e. for my examples above "100", "199", "2", "800", "*34#", "*3*#2" and "##34[ 12-5]", and $maxdigs should contain "", "", "XXXXX", "!", "", "" and "." respectively.
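A minimal sketch of doing the validation and the extraction in one match. It rests on readings the rules leave open: the numeric part must be non-empty, whitespace is allowed right after '[' (as in "[ 12-5]"), and the character part keeps the X, '.', '!' order used in the regex below. The names $DIALPLAN_RE and parse_dialplan are hypothetical, and named captures need Perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;    # named captures (%+) need Perl 5.10 or later

# Hypothetical pattern: one reading of the rules, not a definitive grammar.
my $DIALPLAN_RE = qr{
    \A
    _?                                  # at most one leading underscore
    (?<res>
        [0-9\#*]+                       # digits, '#' and '*' in any mix (non-empty)
        (?: \[ \s* [0-9]+ - [0-9] \] )? # optional trailing [x-y]; x: 1+ digits, y: 1 digit
    )
    (?<maxdigs>
        X*                              # any number of X
        \.?                             # at most one '.'
        !?                              # at most one '!'
    )
    \z                                  # true end of string (no trailing newline)
}x;

# Hypothetical helper: returns ($res, $maxdigs), or an empty list if invalid.
sub parse_dialplan {
    my ($pattern) = @_;
    return unless defined $pattern && $pattern =~ $DIALPLAN_RE;
    return ($+{res}, $+{maxdigs});
}

for my $s ('100', '_199', '_2XXXXX', '800!', '_*34#', '_*3*#2',
           '_##34[ 12-5].', '', '__100', '12[3-45]') {
    if (my ($res, $maxdigs) = parse_dialplan($s)) {
        print "'$s' -> res='$res', maxdigs='$maxdigs'\n";
    }
    else {
        print "'$s' rejected\n";
    }
}
```

Because parse_dialplan returns an empty list on failure, `if (my ($res, $maxdigs) = parse_dialplan($s))` both tests validity and fills the two variables.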

I have written the following bit of code ($pattern is my input string):
    my ($res, $maxdigs, $dummy, $silly);
    ($res, $dummy, $silly, $maxdigs) =
        $pattern =~ /^_*((\d*|#*|\**)+(\[\d+\-\d\])?)(X*\.?\!?)$/g;
This works in general, but with Perl regular expressions I always fear I have missed something... So, if anyone has the time and appetite to pick my code apart, I would be really, really indebted ;-).
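One quick way to probe the regex above is to feed it strings the rules forbid. This sketch assumes the "\ +" in the originally posted regex is a line-wrap artifact for "\-" (the replies quote it that way) and drops the /g, which a single anchored match does not need. All three inputs are reported as matches:

```perl
use strict;
use warnings;

# The regex from the question, with "\ +" read as "\-" and without /g.
my $question_re = qr/^_*((\d*|#*|\**)+(\[\d+\-\d\])?)(X*\.?\!?)$/;

# Inputs that the stated rules exclude, each with a short label.
my @surprises = (
    [ 'empty string',            ''      ],   # rule: must not be empty
    [ 'two leading underscores', '__100' ],   # _* allows any number of _
    [ 'trailing newline',        "100\n" ],   # $ also matches before a final \n
);

for my $case (@surprises) {
    my ($label, $input) = @$case;
    printf "%-24s %s\n", $label, ($input =~ $question_re ? 'matches' : 'rejected');
}
```

Using _? instead of _*, \z instead of $, and a numeric part that must consume at least one character would close these three gaps.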

Athanasia

Update: Thanks to all who responded so quickly. Imagine, I did not even know about the /x modifier, which could obviously make my life so much simpler! My rules were clearly not stated very precisely, so I have updated the original question. In any case, your suggestions have already helped me find some problems in my code.

Re: Matching dialplan rules...
by oeuftete (Monk) on Dec 04, 2008 at 14:43 UTC

    When writing a complex regex like this one, definitely use the /x modifier to make things clearer to yourself. Taking your expression as is:

    $pattern =~ /
        ^
        _*                   # 0 or more underscores to start
        (
            (\d*|\#*|\**)+   # Any combo of digits, #s, or *s (escaped: a bare # starts a comment here)
            (\[\d+\-\d\])?   # Maybe a [x-y]
        )
        (X*\.?\!?)           # Zero or more Xs, and maybe a . or !
        $
    /gx;
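    One pitfall when switching an existing regex to /x: an unescaped # starts a comment that runs to the end of the line (or of the pattern), so a literal hash must be written as \# or placed in a bracketed class such as [#]. A minimal demonstration, with illustrative variable names:

```perl
use strict;
use warnings;

# Under /x the "#b$" part is a comment, so this is effectively qr/^a/.
my $unescaped = qr/^a#b$/x;

# Escaping the hash, or putting it in a bracketed class, keeps it literal.
my $escaped  = qr/^a\#b$/x;
my $in_class = qr/^a[#]b$/x;

for my $s ('a#b', 'apple') {
    printf "%-6s unescaped: %-3s escaped: %-3s class: %s\n", $s,
        map { ($s =~ $_) ? 'yes' : 'no' } $unescaped, $escaped, $in_class;
}
```

    'apple' matches the unescaped version, because only the ^a before the # is left of the pattern.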

    The other thing I'd recommend is writing a test script to make sure your regex works.

    use strict;
    use warnings;
    use Test::More qw(no_plan);

    my $regex = qr{ ... stuff ... }x;

    # Tests
    like( '100', $regex, 'Valid numeric' );
    like( '_100', $regex, 'Valid numeric with leading underscore' );
    # etc.
    unlike( '', $regex, 'Invalid empty string' );
    # etc.
    #!/usr/bin/perl --
    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new(
        qr/^_*((\d*|#*|\**)+(\[\d+\-\d\])?)(X*\.?\!?)$/
    )->explain;
    __END__
    The regular expression:

    (?-imsx:^_*((\d*|#*|\**)+(\[\d+\-\d\])?)(X*\.?!?)$)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      _*                       '_' (0 or more times (matching the most
                               amount possible))
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        (                        group and capture to \2 (1 or more times
                                 (matching the most amount possible)):
    ----------------------------------------------------------------------
          \d*                      digits (0-9) (0 or more times
                                   (matching the most amount possible))
    ----------------------------------------------------------------------
          |                        OR
    ----------------------------------------------------------------------
          #*                       '#' (0 or more times (matching the
                                   most amount possible))
    ----------------------------------------------------------------------
          |                        OR
    ----------------------------------------------------------------------
          \**                      '*' (0 or more times (matching the
                                   most amount possible))
    ----------------------------------------------------------------------
        )+                       end of \2 (NOTE: because you're using a
                                 quantifier on this capture, only the
                                 LAST repetition of the captured pattern
                                 will be stored in \2)
    ----------------------------------------------------------------------
        (                        group and capture to \3 (optional
                                 (matching the most amount possible)):
    ----------------------------------------------------------------------
          \[                       '['
    ----------------------------------------------------------------------
          \d+                      digits (0-9) (1 or more times
                                   (matching the most amount possible))
    ----------------------------------------------------------------------
          \-                       '-'
    ----------------------------------------------------------------------
          \d                       digits (0-9)
    ----------------------------------------------------------------------
          \]                       ']'
    ----------------------------------------------------------------------
        )?                       end of \3 (NOTE: because you're using a
                                 quantifier on this capture, only the
                                 LAST repetition of the captured pattern
                                 will be stored in \3)
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      (                        group and capture to \4:
    ----------------------------------------------------------------------
        X*                       'X' (0 or more times (matching the most
                                 amount possible))
    ----------------------------------------------------------------------
        \.?                      '.' (optional (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
        !?                       '!' (optional (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \4
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
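    The NOTE about \2 and \3 in the output above is worth a concrete look: a capturing group with a quantifier keeps only its last repetition. A minimal sketch using one of the example strings, with the inner group simplified to single-character alternatives:

```perl
use strict;
use warnings;

my $s = '*3*#2';

# A quantified capturing group keeps only its LAST repetition ...
my ($last) = $s =~ /^(\d|\#|\*)+$/;

# ... while a capture around a non-capturing repetition keeps the whole run.
my ($whole) = $s =~ /^((?:\d|\#|\*)+)$/;

print "last repetition: $last\n";    # prints "2"
print "whole run:       $whole\n";   # prints "*3*#2"
```

    This is also why the question's code needs the throwaway variables $dummy and $silly: every plain ( ) captures. Writing the inner groups as (?: ... ) leaves only the captures that are actually wanted.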
Re: Matching dialplan rules...
by JavaFan (Canon) on Dec 04, 2008 at 14:50 UTC
    It's not clear from the rules, but your examples suggest that:
    • The numeric part comes before the "character part".
    • The numeric part isn't empty.
    • The optional '#' and '*' may come before or after the numeric part, but they cannot be mixed in with the digits.
    Taking the above into account, I'd use:
    qr{
        ^                       # Anchor at the start
        _?                      # Optional underscore
        [\#*]*                  # Optional sharps and stars
                                # Numerical part:
        [0-9]+                  # Digits
        (?:                     # Optional
            \[\s*[0-9]+-[0-9]\] # '[', whitespace, digits, '-', digit, ']'
        )?                      # End optional
        [\#*]*                  # Optional sharps and stars
                                # Character part:
        X*                      # Zero or more 'X'
        (?:                     # Optional
                                # Either
            !X*                 # '!' followed by 'X's,
            (?:\.X*)?           # optionally '.' followed by 'X's
          |                     # or
            \.X*                # '.' followed by 'X's,
            (?:!X*)?            # optionally '!' followed by 'X's
        )?                      # End optional
        $                       # Anchor at the end.
    }x;
    I only briefly tested it, but it does match your examples.
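    If the same pattern should also fill $res and $maxdigs, one option (an adaptation, not part of the reply above) is to wrap its two halves in named captures, which needs Perl 5.10 or later. The variable name $dialplan is illustrative:

```perl
use strict;
use warnings;
use 5.010;    # named captures (%+) need Perl 5.10 or later

# JavaFan's pattern with two named groups added around its two halves.
my $dialplan = qr{
    ^ _?
    (?<res>                          # numeric part
        [\#*]* [0-9]+
        (?: \[\s*[0-9]+-[0-9]\] )?
        [\#*]*
    )
    (?<maxdigs>                      # character part
        X*
        (?: !X* (?:\.X*)? | \.X* (?:!X*)? )?
    )
    $                                # end of string
}x;

for my $s ('100', '_199', '_2XXXXX', '800!', '_*34#', '_##34[ 12-5].', '_*3*#2') {
    if ($s =~ $dialplan) {
        my ($res, $maxdigs) = ($+{res}, $+{maxdigs});
        print "'$s' -> res='$res', maxdigs='$maxdigs'\n";
    }
    else {
        print "'$s' -> no match\n";
    }
}
```

    '_*3*#2' is reported as no match, in line with the third bullet above; allowing # and * between digits would need a different numeric part.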