coltman has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, Can someone tell me why the following regex does not work? What I want is just the date information "1st day of November 2005". Thanks a bunch!
my $aa = "Accepted and agreed to by Jerry Smith as of the 1st day of November 2005.";
if ( $aa =~ m{
        (?:as of)*
        (?:\s|\n)*          #for any possible space or line end
        (?:this|the)*
        (?:\s|\n)*          #for any possible space or line end
        (
        (?:[1-31])          #day, 1 to 31
        (?:st|nd|rd|th)     #for all the cases like 1st, 2nd, 3rd, 25th
        (?:\s|\n)*          #for any possible space or line end
        (?:day of)*
        (?:\s|\n)*          #for any possible space or line end
        (?:January|February|March|April|May|June|July|
        August|September|October|November|December)
        (?:\s|\n)*          #for any possible space or line end
        (?:\,|\s)*          #for any possible space or comma end
        (?:\s|\n)*          #for any possible space or line end
        ([1,2]\d{2,3})      #year
        )
        (?:\,|\.|\s)*
        (?:[ ]{2}|\n|\z)*   #Two spaces, newline, or string end
    }ixmsg)
{
    print $1;
}
It works now. I am really grateful to davorg, Fletch, johngg, agianni, Anno, and shmem for all your helpful comments and suggestions. I have only known Perl for about two weeks, and it is great to have found this forum and so many nice and helpful experts here.
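For readers who land here later: the working pattern was not posted, so the following is only a sketch of how the regex might look once the suggestions in the replies are applied. The variable names $date_re and $date are illustrative, and the stricter day and year sub-patterns are assumptions rather than the poster's final code.

```perl
use strict;
use warnings;

my $aa = "Accepted and agreed to by Jerry Smith as of the 1st day of November 2005.";

# Assumed rewrite: whitespace is spelled \s because of /x, the day is an
# alternation instead of a character class, and no group is starred.
my $date_re = qr{
    \b
    (                                   # $1: the whole date phrase
        (?:[1-9]|[12][0-9]|3[01])       # day, 1 to 31
        (?:st|nd|rd|th)                 # ordinal suffix
        \s+ day \s+ of \s+
        (?:January|February|March|April|May|June|July
          |August|September|October|November|December)
        ,? \s+
        [12]\d{3}                       # four-digit year
    )
}ix;

my ($date) = $aa =~ $date_re;
print "$date\n" if defined $date;
```

In list context the match returns the captured group, so $date holds the date phrase when the match succeeds and stays undefined otherwise.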

Replies are listed 'Best First'.
Re: Regex question
by davorg (Chancellor) on Mar 22, 2007 at 14:36 UTC

      (?:[1-9]|1[0-9]|2[0-9]|3[01]) is a less simple replacement that would accept fewer invalid dates (though you'd still match against 31st day of February; then again, you'd want to do that validation further up the pipe anyhow).
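      To see why the original (?:[1-31]) falls short, here is a small comparison. It is only a sketch; the names $class and $day are made up for the illustration, and both patterns are anchored so that each test string has to match in full.

```perl
use strict;
use warnings;

# [1-31] is a character class: the range 1-3 plus the digit 1,
# so it matches exactly one character out of 1, 2, 3.
my $class = qr/^[1-31]$/;

# Alternation that accepts the integers 1 through 31.
my $day = qr/^(?:[1-9]|1[0-9]|2[0-9]|3[01])$/;

for my $n (qw(1 3 4 25 31 32)) {
    printf "%-2s class:%-3s alternation:%s\n",
        $n,
        ( $n =~ $class ? 'yes' : 'no' ),
        ( $n =~ $day   ? 'yes' : 'no' );
}
```

Only the alternation accepts 4, 25 and 31, and both reject 32.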

        Another, lazier, way is to use an array and localize the default list separator

        @days = ( 1 .. 31 );
        local $" = q{|};
        m{(?:@days)};

        You'd probably want to limit the local $" inside a code block to avoid unexpected side effects elsewhere.
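        A sketch of what that scoping could look like, assuming the pattern is compiled once with qr and reused afterwards. The name $day_re, the ordinal suffix and the \b anchors are additions for the example.

```perl
use strict;
use warnings;

my @days = ( 1 .. 31 );

my $day_re;
{
    # $" is the separator used when an array is interpolated into a
    # string or pattern; the change is undone when this block exits.
    local $" = q{|};
    $day_re = qr{\b(?:@days)(?:st|nd|rd|th)\b};
}

print "matched\n" if "the 25th day" =~ $day_re;
print "@days[0 .. 2]\n";    # the separator is a single space again
```

Because the pattern is compiled inside the block, later interpolations elsewhere in the program still see the default separator.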

        Cheers,

        JohnGG

Re: Regex question
by agianni (Hermit) on Mar 22, 2007 at 14:44 UTC
    Be careful when you use the x modifier on the match. It makes your pattern easier to read, but you need to explicitly include all of your white space characters. So (?:as of)* needs to be (?:as\sof)*, for example.
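    A quick way to check this for yourself (a minimal sketch; the variable names are only for the demonstration):

```perl
use strict;
use warnings;

my $text = "as of the 1st";

# Under /x the literal spaces are ignored, so this is really /^asof/.
my $ignored  = $text =~ m{^ as of }x    ? 1 : 0;

# \s survives /x (as would an escaped space or [ ]).
my $explicit = $text =~ m{^ as \s of }x ? 1 : 0;

print "literal space under /x: $ignored\n";
print "\\s under /x:            $explicit\n";
```

The first pattern fails because the string does not contain "asof"; the second matches.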
    perl -e 'split//,q{john hurl, pest caretaker}and(map{print @_[$_]}(join(q{},map{sprintf(qq{%010u},$_)}(2**2*307*4993,5*101*641*5261,7*59*79*36997,13*17*71*45131,3**2*67*89*167*181))=~/\d{2}/g));'
Re: Regex question
by Anno (Deacon) on Mar 22, 2007 at 14:58 UTC
    Apart from the comments that you have already got, you seem to use the "*" quantifier rather indiscriminately. Are you sure you know what it does in a regex?

    What you specify as "(?:\s|\n)" is equivalent to a single "\s". "\n" is already included in the set of characters matched by "\s".
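    Both points are easy to verify with a few one-line matches. This is just an illustrative sketch; the variable names are invented for it.

```perl
use strict;
use warnings;

# \s already matches a newline, so (?:\s|\n) adds nothing.
my $newline_is_space = "\n" =~ /\A\s\z/ ? 1 : 0;

# A starred group may match zero times, so it never requires its
# contents to be present; + demands at least one occurrence.
my $star_without_prefix = "1st day" =~ /^(?:as\sof)*\s*1st/ ? 1 : 0;
my $plus_without_prefix = "1st day" =~ /^(?:as\sof)+\s*1st/ ? 1 : 0;

print "newline matches \\s:        $newline_is_space\n";
print "(?:as\\sof)* without 'as of': $star_without_prefix\n";
print "(?:as\\sof)+ without 'as of': $plus_without_prefix\n";
```

So the starred "as of" group in the original pattern does nothing to tie the date to that prefix.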

    Anno

    Update: Minor corrections

Re: Regex question
by shmem (Chancellor) on Mar 22, 2007 at 15:12 UTC
    What I want is just the date information "1st day of November 2005"
    No, you don't; you also want the date string to be preceded by "as of" etc., but then you don't care. (?:shrug)*.

    If you just wanted the date string, why not just:

    $_ = "Accepted and agreed to by Jerry Smith as of the 1st day of November 2005.";
    print $1 if /(\d+(?:st|nd|rd|th)\s+day\s+of\s+[ADFJMNOSabceghilmnoprstuvy]+\s+\d+)/ms;
    __END__
    1st day of November 2005

    which would also match "35th day of ANNO 0007", but the matched date strings need to be validated anyways...
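    One possible shape for that validation step, as a rough sketch only: is_valid_date and %month_num are hypothetical names, the phrase is assumed to be in the "Nth day of Month Year" form matched above, and the Gregorian leap-year rule is assumed for every year.

```perl
use strict;
use warnings;

my %month_num;
@month_num{qw(january february march april may june july
              august september october november december)} = ( 1 .. 12 );

# Hypothetical helper: true if the phrase names a real calendar date.
sub is_valid_date {
    my ($phrase) = @_;
    my ( $day, $month, $year ) =
        $phrase =~ /^(\d+)(?:st|nd|rd|th)\s+day\s+of\s+([A-Za-z]+)\s+(\d+)$/
        or return 0;
    my $m = $month_num{ lc $month } or return 0;    # unknown month name
    my @len = ( 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 );
    $len[1] = 29
        if ( $year % 4 == 0 && $year % 100 != 0 ) || $year % 400 == 0;
    return ( $day >= 1 && $day <= $len[ $m - 1 ] ) ? 1 : 0;
}

for my $p ( "1st day of November 2005",
            "31st day of February 2005",
            "35th day of ANNO 0007" ) {
    printf "%-28s %s\n", $p, is_valid_date($p) ? "valid" : "invalid";
}
```

This does not check that the ordinal suffix agrees with the number (it would accept "1th"); that could be added if it matters.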

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}