jeanluca has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

I just got a result from a regular expression that I didn't really expect. For example:
#!/usr/bin/perl -lw
use strict ;
my $d = "2005-12-" ;
if ( $d =~ /\d{4}-\d{1,3}(?!-)/ ) {
    print "yep" ;
}
I expected it not to match because of the '-' at the end.
But it matched.
I found 2 ways to fix the regular expression:
\d{4}-\d{1,3}(?!-)$
\d{4}-\d{2}(?!-)
Can anyone explain why my first example doesn't do what I expected?

Thnx a lot
LuCa

Re: negative look-ahead is ignored
by dave_the_m (Monsignor) on Feb 19, 2007 at 10:14 UTC
    It matches because the \d{1,3} can match a single digit, and then the second digit satisfies the lookahead.

    If you run the code with  -Mre=debug you can see it first matching two digits, then matching the lookahead string (thus failing), backtracking, trying to match a single digit, then succeeding:

    Setting an EVAL scope, savestack=5
       0 <> <2005-12->        |  1:  CURLY {4,4}
                                 DIGIT can match 4 times out of 4...
      Setting an EVAL scope, savestack=5
       4 <2005> <-12->        |  4:  EXACT <->
       5 <2005-> <12->        |  6:  CURLY {1,3}
                                 DIGIT can match 2 times out of 3...
      Setting an EVAL scope, savestack=5
       7 <2005-12> <->        |  9:    UNLESSM[-0]
       7 <2005-12> <->        | 11:      EXACT <->
       8 <2005-12-> <>        | 13:      SUCCEED
                                 could match...
                                 failed...
       6 <2005-1> <2->        |  9:    UNLESSM[-0]
       6 <2005-1> <2->        | 11:      EXACT <->
                                 failed...
       6 <2005-1> <2->        | 15:  END
    Match successful!
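    A quicker way to see the backtracking described above, without reading the debug trace, is to capture the \d{1,3} part and print what the engine finally kept. This is a minimal sketch; the helper name kept_digits is made up for illustration. For "2005-12-" the capture should hold only "1", the single digit that lets the lookahead succeed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap \d{1,3} in a capture group so we can see how many digits the
# engine kept after backtracking to satisfy (?!-).
sub kept_digits {
    my ($s) = @_;
    return $s =~ /\d{4}-(\d{1,3})(?!-)/ ? $1 : undef;
}

my $kept = kept_digits("2005-12-");
if (defined $kept) {
    print "matched, \\d{1,3} kept '$kept'\n";   # matched, \d{1,3} kept '1'
} else {
    print "no match\n";
}
```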

    Dave.

Re: negative look-ahead is ignored
by johngg (Canon) on Feb 19, 2007 at 10:09 UTC
    I think it is because your \d{1,3} successfully matches one digit (the '1'), which is not followed by a dash.

    Cheers,

    JohnGG

Re: negative look-ahead is ignored
by ferreira (Chaplain) on Feb 19, 2007 at 11:34 UTC

    Look-aheads and look-behinds are, IMO, advanced constructs that we don't need most of the time for simple problems. They demand more thought and often turn out not to do what we wanted. For example, from your description ("I expected it not to match because of the '-' at the end.") and your two solutions, I would suggest the simpler /\d{4}-\d{1,3}([^-\d]|$)/, where [^-\d] prevents the pattern from matching when the 1, 2 or 3 digits are followed by yet another digit or a dash, and the $ in the alternation makes it succeed at the end of the line.

    #!/usr/bin/perl -w
    use strict ;
    use warnings;
    # qw() needs its own parentheses in a for loop on modern Perls
    for my $d (qw(2005-12 2005-100 2005-1- 2005-12- 2005-123- 2005-1000)) {
        printf "%-10s: ", $d;
        if ( $d =~ /\d{4}-\d{1,3}([^-\d]|$)/ ) {
            print "yep\n" ;
        } else {
            print "nope\n";
        }
    }
    outputs
    2005-12   : yep
    2005-100  : yep
    2005-1-   : nope
    2005-12-  : nope
    2005-123- : nope
    2005-1000 : nope
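    One side effect of ([^-\d]|$) is that it consumes the character after the digits, so that character becomes part of the match. A small sketch illustrating this (the sample string "2005-12 10:30" is made up): the alternation version captures the trailing space, while a negative lookahead (?![-\d]) checks the same condition without consuming anything. The group is written as (?:...) here only so that $1 holds the whole match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = "2005-12 10:30";

# The alternation consumes the non-digit, non-dash character after the digits.
my ($consumed)  = $s =~ /(\d{4}-\d{1,3}(?:[^-\d]|$))/;
# The lookahead only peeks at that character.
my ($lookahead) = $s =~ /(\d{4}-\d{1,3}(?![-\d]))/;

print "alternation: '$consumed'\n";   # alternation: '2005-12 '
print "lookahead:   '$lookahead'\n";  # lookahead:   '2005-12'
```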

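      It's actually possible to avoid consuming the extra character by using (?>...). The atomic group keeps every digit \d{1,3} grabbed, so the lookahead (?!-) is only tested after the longest run of digits.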

      for my $d (qw(2005-12 2005-100 2005-1- 2005-12- 2005-123- 2005-1000)) {
          printf "%-10s: ", $d;
          if ( $d =~ /(?>\d{4}-\d{1,3})(?!-)/ ) {
              print "yep\n" ;
          } else {
              print "nope\n";
          }
      }
      2005-12   : yep
      2005-100  : yep
      2005-1-   : nope
      2005-12-  : nope
      2005-123- : nope
      2005-1000 : yep

      Use (?![-\d]) if you don't want the last one (2005-1000) to match.
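      Putting those two suggestions together: a sketch, reusing the test strings from above, that combines (?>...) with (?![-\d]). The helper name check is hypothetical. With this pattern 2005-1000 should report nope as well, because the digit after the atomic group is rejected by the lookahead and the group cannot give a digit back.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Atomic group (no backtracking into the digits) plus a lookahead that
# rejects both a following dash and a following digit.
sub check {
    my ($d) = @_;
    return $d =~ /(?>\d{4}-\d{1,3})(?![-\d])/ ? "yep" : "nope";
}

for my $d (qw(2005-12 2005-100 2005-1- 2005-12- 2005-123- 2005-1000)) {
    printf "%-10s: %s\n", $d, check($d);
}
```

      Only 2005-12 and 2005-100 should print yep.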

Re: negative look-ahead is ignored
by Moron (Curate) on Feb 19, 2007 at 14:28 UTC
    Although the exact answer to the question has been given, I feel there is a deeper answer to this. I would advise matching what you DO expect rather than ruling out what may be an incomplete set of examples of what you don't. In this case, the OP doesn't make clear what is expected. But if the data is supposed to end at this point, it is better to match on the terminator, e.g.
    /^\d{4}\-\d{2}$/ or die;    # match on end of string after \d{2}, or ...
    /^\d{4}\-\d{2}\s+/ or die;  # match on whitespace delimiter
    # etc.

    -M

    Free your mind

      I always try to keep my questions as short as possible!
      so they often don't describe the real issue.
      Anyway, I get your point, but in my case, where I've written a 'generic date parser/converter', I don't really want to use ^ and $; that would limit the number of possible date formats, for example:
      "2005-031" " 2005-31" "2005031 " "|2005-031 12:11:22| " "Some time ago 1776-07-04 ....."
      And I haven't even started to scratch the surface of what else my 'generic date parser/converter' can do :)

      Thnx
      LuCa
        In that case I would be inclined to maintain a list of regexps, one for each allowable format, rather than (I predict) torturing a single one into handling successive new requirements until it finally dies in an agony of unmaintainability. I might even put them in a configuration file rather than in code, for easy updates in production environments, then load and chop them into an array and try them out successively on the data until a match is found or the possible formats are exhausted.
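        A minimal sketch of the list-of-regexps approach, assuming a few hypothetical formats taken loosely from LuCa's examples: YYYY-MM-DD, YYYY-DDD and YYYYDDD (the last two read here as year plus day of year, which is an assumption). The format names and the identify helper are illustrative only, and the patterns live in code rather than in a configuration file for brevity. The first regex that matches wins, so more specific formats should come first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical formats, one regex each, tried in order.
# In production these could be loaded from a configuration file instead.
my @formats = (
    [ 'YYYY-MM-DD' => qr/(\d{4})-(\d{2})-(\d{2})(?![-\d])/ ],
    [ 'YYYY-DDD'   => qr/(\d{4})-(\d{3})(?![-\d])/ ],
    [ 'YYYYDDD'    => qr/(?<!\d)(\d{4})(\d{3})(?!\d)/ ],
);

# Return the name of the first matching format plus its captures,
# or an empty list if no format matches.
sub identify {
    my ($s) = @_;
    for my $f (@formats) {
        my ($name, $re) = @$f;
        if (my @parts = $s =~ $re) {
            return ($name, @parts);
        }
    }
    return;
}

for my $s ("2005-031", "2005031 ", "Some time ago 1776-07-04 .....", "no date") {
    my ($name, @parts) = identify($s);
    printf "%-32s => %s\n", "'$s'",
        defined $name ? "$name (@parts)" : "no match";
}
```

        Adding a format then means adding one entry to the list rather than reworking a single ever-growing pattern.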

        -M


        You probably already know about Date::Manip. It has a function ParseDateString that should do some of what you want.
Re: negative look-ahead is ignored
by bart (Canon) on Feb 20, 2007 at 07:11 UTC
    You can use the "cut" operator in order to prevent backtracking. That way you can stay closer to your original code.
    /\d{4}-(?>\d{1,3})(?!-)/
    But you may still want to include \d in the negative lookahead, or you can still get unexpected matches:
    for (qw(2005-12 2005-100 2005-1- 2005-12- 2005-123- 2005-1000)) {
        printf "%-10s: ", $_;
        if ( /\d{4}-(?>\d{1,3})(?!-)/ ) {
            print "yep\n" ;
        } else {
            print "nope\n";
        }
    }
    Result:
    2005-12   : yep
    2005-100  : yep
    2005-1-   : nope
    2005-12-  : nope
    2005-123- : nope
    2005-1000 : yep
    
    So it's better to make it
    /\d{4}-(?>\d{1,3})(?![\-\d])/
    In this case, the cut operator becomes close to useless. Well, it doesn't hurt.
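    A sketch that checks the remark that the cut operator is close to useless once \d is in the lookahead: it runs both patterns over the same strings and flags any disagreement. None is expected, because giving back a digit from \d{1,3} always leaves a digit next in line, which (?![\-\d]) rejects anyway.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $with_cut    = qr/\d{4}-(?>\d{1,3})(?![\-\d])/;
my $without_cut = qr/\d{4}-\d{1,3}(?![\-\d])/;

# Print both verdicts side by side; "<- differs" would mark a disagreement.
my @inputs = qw(2005-12 2005-100 2005-1- 2005-12- 2005-123- 2005-1000);
for my $d (@inputs) {
    my $cut   = $d =~ $with_cut    ? "yep" : "nope";
    my $nocut = $d =~ $without_cut ? "yep" : "nope";
    printf "%-10s: %-4s %-4s%s\n", $d, $cut, $nocut,
        $cut eq $nocut ? "" : "  <- differs";
}
```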

    Update: I shouldn't post in a hurry. I now see that ikegami has already posted something very similar. Duh.

Re: negative look-ahead is ignored
by jeanluca (Deacon) on Feb 19, 2007 at 10:46 UTC
    Thanks, that explains why.
    So I guess there is no way to force \d{1,3} to match 3 digits first, then 2, etc.?
    I also tried the following regular expression (using "2005-122-")
    \d{4}-(\d{3}|\d{2}|\d)(?!-)
    but I noticed (using -Mre=debug) that it doesn't change anything (although I really thought this would do the trick :)
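    An alternation is a backtracking point just like {1,3}, which would explain why the debug output looked the same. A sketch (the helper names alt_kept and atomic_kept are hypothetical) showing that on "2005-122-" the alternation falls back to \d{2}, while wrapping it in an atomic group (?>...), the "cut" operator also suggested in this thread, makes the engine stick with the first alternative that matched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plain alternation: when (?!-) fails after \d{3}, the engine backtracks
# into the alternation and tries \d{2} next.
sub alt_kept {
    my ($s) = @_;
    return $s =~ /\d{4}-(\d{3}|\d{2}|\d)(?!-)/ ? $1 : undef;
}

# Atomic group: once one alternative has matched, the engine may not come
# back and try another, so a failing lookahead fails the whole attempt.
sub atomic_kept {
    my ($s) = @_;
    return $s =~ /\d{4}-((?>\d{3}|\d{2}|\d))(?!-)/ ? $1 : undef;
}

for my $s ("2005-122-", "2005-122") {
    for my $pair ([ alternation => \&alt_kept ], [ atomic => \&atomic_kept ]) {
        my ($label, $fn) = @$pair;
        my $kept = $fn->($s);
        printf "%-10s %-12s %s\n", $s, $label,
            defined $kept ? "kept '$kept'" : "no match";
    }
}
```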

    LuCa

      You might want to consider a negative lookahead on \d as well:

      \d{4}-\d{1,3}(?![-\d])

      Of course, this means you won't match a string like "2005-1222" either. The original would match that, and if that's intentional, you need an even trickier version:

      \d{4}-\d{1,3}(?!\d*-)

      I'm betting one of these is what you want. :-)
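      A small sketch comparing the two lookaheads on a few strings, including the "2005-1222" case; the helper name verdicts is made up. The first pattern should reject 2005-1222 and the second accept it, while both reject the strings with a trailing dash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return (strict, loose) verdicts for one input string.
sub verdicts {
    my ($d) = @_;
    my $strict = $d =~ /\d{4}-\d{1,3}(?![-\d])/ ? "yep" : "nope";  # no dash or digit next
    my $loose  = $d =~ /\d{4}-\d{1,3}(?!\d*-)/  ? "yep" : "nope";  # no digits-then-dash next
    return ($strict, $loose);
}

printf "%-11s %-10s %s\n", "input", "(?![-\\d])", "(?!\\d*-)";
for my $d (qw(2005-12 2005-12- 2005-1222 2005-1222-)) {
    printf "%-11s %-10s %s\n", $d, verdicts($d);
}
```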

      print "Just another Perl ${\(trickster and hacker)},"
      The Sidhekin proves Sidhe did it!

Re: negative look-ahead is ignored
by jeanluca (Deacon) on Feb 19, 2007 at 12:21 UTC
    That's all I needed to know!!

    Thanks!!!!

    LuCa