Re: negative look-ahead is ignored
by dave_the_m (Monsignor) on Feb 19, 2007 at 10:14 UTC
|
It matches because the \d{1,3} can match a single digit, and the second digit then satisfies the negative lookahead.
If you run the code with -Mre=debug you can see it first matching two digits, then matching the lookahead string (thus failing), backtracking, trying to match a single digit, then succeeding:
Setting an EVAL scope, savestack=5
0 <> <2005-12-> | 1: CURLY {4,4}
DIGIT can match 4 times out of 4...
Setting an EVAL scope, savestack=5
4 <2005> <-12-> | 4: EXACT <->
5 <2005-> <12-> | 6: CURLY {1,3}
DIGIT can match 2 times out of 3...
Setting an EVAL scope, savestack=5
7 <2005-12> <-> | 9: UNLESSM[-0]
7 <2005-12> <-> | 11: EXACT <->
8 <2005-12-> <> | 13: SUCCEED
could match...
failed...
6 <2005-1> <2-> | 9: UNLESSM[-0]
6 <2005-1> <2-> | 11: EXACT <->
failed...
6 <2005-1> <2-> | 15: END
Match successful!
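The backtracking shown in the trace can also be seen without the debugger by printing what the pattern actually matched. A minimal sketch, assuming the pattern in question is \d{4}-\d{1,3}(?!-); the helper name matched_text is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the substring matched by the pattern,
# or undef if there is no match.
sub matched_text {
    my ($string) = @_;
    return $string =~ /\d{4}-\d{1,3}(?!-)/ ? $& : undef;
}

# \d{1,3} first takes "12", the look-ahead sees "-" and fails, so the
# engine backs off to "1"; the next character is then "2", not "-".
print matched_text('2005-12-'), "\n";    # prints 2005-1
```

The match stops at "2005-1", which is the state the trace reports just before "Match successful!".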
Dave.
Re: negative look-ahead is ignored
by johngg (Canon) on Feb 19, 2007 at 10:09 UTC
|
I think it is because your \d{1,3} successfully matches one digit (the '1'), which is not followed by a dash.
Cheers, JohnGG
Re: negative look-ahead is ignored
by ferreira (Chaplain) on Feb 19, 2007 at 11:34 UTC
|
Look-aheads and look-behinds are IMO advanced constructions we don't need most of the time for simple problems. They demand more thought and often turn out not to be what we wanted in the end.
For example, from your description ("I expected it not to match because of the '-' at the end.") and your two solutions, I would suggest the simple /\d{4}-\d{1,3}([^-\d]|$)/, where [^-\d] prevents the pattern from matching when the 1, 2 or 3 digits are followed by yet another digit or a dash, and $ in the alternation lets it succeed at the end of the line.
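#!/usr/bin/perl
use strict;
use warnings;

# The qw() list is wrapped in parentheses: a bare qw list after "for" is a syntax error on perl 5.18 and later
for my $d (qw(2005-12 2005-100 2005-1- 2005-12- 2005-123- 2005-1000)) {
    printf "%-10s: ", $d;
    # The 1 to 3 digits must be followed by a non-digit, non-dash character or by the end of the string
    if ( $d =~ /\d{4}-\d{1,3}([^-\d]|$)/ ) {
        print "yep\n";
    } else {
        print "nope\n";
    }
}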
outputs
2005-12 : yep
2005-100 : yep
2005-1- : nope
2005-12- : nope
2005-123- : nope
2005-1000 : nope
|
|
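# The qw() list is wrapped in parentheses: a bare qw list after "for" is a syntax error on perl 5.18 and later
for my $d (qw(2005-12 2005-100 2005-1- 2005-12- 2005-123- 2005-1000)) {
    printf "%-10s: ", $d;
    # (?>...) makes the date part atomic, so \d{1,3} cannot give back digits once the look-ahead fails
    if ( $d =~ /(?>\d{4}-\d{1,3})(?!-)/ ) {
        print "yep\n";
    } else {
        print "nope\n";
    }
}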
2005-12 : yep
2005-100 : yep
2005-1- : nope
2005-12- : nope
2005-123- : nope
2005-1000 : yep
Use (?![-\d]) if you don't want the last to match.
Re: negative look-ahead is ignored
by Moron (Curate) on Feb 19, 2007 at 14:28 UTC
|
Although the exact answer to the question has been given, I feel there is a deeper answer to this. I would advise matching what you DO expect rather than trying to exclude a possibly incomplete set of things you don't. What you expect isn't clear from the OP, but if the data is supposed to terminate at this point, it is better to match on the terminator, e.g.
/^\d{4}\-\d{2}$/ or die; # match on end of string after \d{2}, or ...
/^\d{4}\-\d{2}\s+/ or die; # match on whitespace delimiter
# etc.
|
|
I always try to keep my questions as short as possible!
So they often do not describe the true issue.
Anyway, I get your point, but in my case I've written a 'generic date parser/converter', so I don't really want to use ^ and $. That would limit the number of possible date formats, for example
"2005-031"
" 2005-31"
"2005031 "
"|2005-031 12:11:22| "
"Some time ago 1776-07-04 ....."
And I haven't even started to scratch the surface of what else my 'generic date parser/converter' can do :)
Thnx
LuCa
|
|
In that case I would be inclined to maintain a list of regexps - one for each allowable format - rather than (I predict) torturing one into handling successive new requirements until it finally dies in an agony of unmaintainability. I might even put them in a configuration file rather than in code, for easy update in production environments, then load and chop them into an array and try them successively on the data until a match is found or the possible formats are exhausted.
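A rough sketch of that idea, with the list kept inline rather than in a configuration file. The format names, the three patterns and the helper identify_date are all assumptions made up for illustration, loosely based on the sample strings above; they are not taken from the actual parser:

```perl
use strict;
use warnings;

# Hypothetical format table: one regexp per allowable format, most
# specific first. Names and patterns are illustrative only.
my @formats = (
    [ 'year-month-day' => qr/(?<!\d)(\d{4})-(\d{2})-(\d{2})(?![-\d])/ ],
    [ 'year-dayofyear' => qr/(?<!\d)(\d{4})-(\d{1,3})(?![-\d])/ ],
    [ 'yeardayofyear'  => qr/(?<!\d)(\d{4})(\d{3})(?!\d)/ ],
);

# Try each format in turn; return its name and captures for the first
# one that matches, or an empty list if none does.
sub identify_date {
    my ($text) = @_;
    for my $format (@formats) {
        my ( $name, $re ) = @$format;
        if ( my @parts = $text =~ $re ) {
            return ( $name, @parts );
        }
    }
    return;
}

for my $text ( '2005-031', '|2005-031 12:11:22| ', 'Some time ago 1776-07-04 .....', '2005031 ' ) {
    my ( $name, @parts ) = identify_date($text);
    printf "%-32s => %s\n", "'$text'", defined $name ? "$name (@parts)" : 'no match';
}
```

Adding a format then means adding one entry to the table instead of reworking a single large pattern. The order of the entries matters, because the first match wins.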
|
|
You probably already know about Date::Manip. It has a function ParseDateString that should do some of what you want.
|
|
Re: negative look-ahead is ignored
by bart (Canon) on Feb 20, 2007 at 07:11 UTC
|
You can use the "cut" operator in order to prevent backtracking. That way you can stay closer to your original code.
/\d{4}-(?>\d{1,3})(?!-)/
But, you still might want to include that digit in the negative lookahead, or you still can get unexpected matches.
for (qw(2005-12 2005-100 2005-1- 2005-12- 2005-123- 2005-1000)) {
printf "%-10s: ", $_;
if ( /\d{4}-(?>\d{1,3})(?!-)/ ) {
print "yep\n" ;
} else {
print "nope\n";
}
}
Result:
2005-12 : yep
2005-100 : yep
2005-1- : nope
2005-12- : nope
2005-123- : nope
2005-1000 : yep
So, better make it
/\d{4}-(?>\d{1,3})(?![\-\d])/
In this case, the cut operator becomes close to useless. Well, it doesn't hurt.
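To illustrate why the cut adds little once the lookahead also rejects digits, here is a small comparison of the pattern with and without (?>...). It is only a sketch; the variable names and the helper verdict are made up for this example:

```perl
use strict;
use warnings;

my $with_cut    = qr/\d{4}-(?>\d{1,3})(?![\-\d])/;
my $without_cut = qr/\d{4}-\d{1,3}(?![\-\d])/;

# Returns "yep" or "nope" for one string and one pattern.
sub verdict {
    my ( $string, $re ) = @_;
    return $string =~ $re ? 'yep' : 'nope';
}

# First column: with the cut. Second column: without it.
for my $d (qw(2005-12 2005-100 2005-1- 2005-12- 2005-123- 2005-1000)) {
    printf "%-10s: %-4s / %s\n", $d, verdict( $d, $with_cut ), verdict( $d, $without_cut );
}
```

Both columns agree for every string. Whenever \d{1,3} gives back a digit, the character that follows is itself a digit, so (?![\-\d]) rejects the shorter attempt anyway.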
Update: I shouldn't post in a hurry. I now see ikegami has posted something very similar to mine. Duh.
Re: negative look-ahead is ignored
by jeanluca (Deacon) on Feb 19, 2007 at 10:46 UTC
|
Thanks, that explains why.
So I guess there is no way to force \d{1,3} to match first 3 digits, then 2, etc.?
I also tried the following regular expression (using "2005-122-"): \d{4}-(\d{3}|\d{2}|\d)(?!-)
but I noticed (using -Mre=debug) that it doesn't change anything (although I really thought this would do the trick :)
LuCa
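For reference, a short sketch of what the alternation attempt does on "2005-122-". The helper name captured_digits is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the digits captured after the dash,
# or undef if the pattern does not match.
sub captured_digits {
    my ($string) = @_;
    return $string =~ /\d{4}-(\d{3}|\d{2}|\d)(?!-)/ ? $1 : undef;
}

# "122" is tried first but is followed by "-", so the engine moves on to
# the \d{2} branch; "12" is followed by "2", which satisfies (?!-).
print captured_digits('2005-122-'), "\n";    # prints 12
```

Ordering the alternatives from longest to shortest only changes the order in which they are tried. The engine is still free to fall back to a shorter branch when the lookahead fails.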
|
|
\d{4}-\d{1,3}(?![-\d])
Of course, this means you won't match a string like "2005-1222" either. The original would match that, and if that's intentional, you need an even trickier version:
\d{4}-\d{1,3}(?!\d*-)
Here's betting one of these should be what you want. :-)
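A quick way to see the difference between the two suggestions is to run both over a few strings. This is a sketch only; the names $strict and $lenient are made up for this example:

```perl
use strict;
use warnings;

my $strict  = qr/\d{4}-\d{1,3}(?![-\d])/;    # no digit or dash may follow
my $lenient = qr/\d{4}-\d{1,3}(?!\d*-)/;     # only a dash after optional digits is rejected

for my $d (qw(2005-12 2005-12- 2005-1222 2005-1222-)) {
    printf "%-11s strict: %-4s lenient: %s\n", $d,
        ( $d =~ $strict  ? 'yep' : 'nope' ),
        ( $d =~ $lenient ? 'yep' : 'nope' );
}
```

Only "2005-1222" separates the two: the first pattern rejects it because a digit follows, while the second accepts it because no dash follows the remaining digits.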
print "Just another Perl ${\(trickster and hacker)},"
The Sidhekin proves Sidhe did it!
Re: negative look-ahead is ignored
by jeanluca (Deacon) on Feb 19, 2007 at 12:21 UTC
|
That's all I needed to know!!
Thanks!!!!
LuCa