BUU has asked for the wisdom of the Perl Monks concerning the following question:

First off, the disclaimer. This isn't my problem and I'm well aware it would be vastly simpler to just use the <= and >= operators. That's not any fun however, so I'm trying to solve it with a regex.

The problem: Match a year in the range 1950-2050, inclusive. My attempt got me here: /[12][90](?:(?<=9(?=[5-9]))|(?<=0(?=[0-5])))[0-9]/ but as you can easily see, it fails on 2054 and so forth. So how do I get that last digit?

Re: Matching date range with pure regex
by Abigail-II (Bishop) on Feb 17, 2004 at 11:21 UTC
    local $" = "|"; /^(?:@{[1950 .. 2050]})$/;

    Abigail

      By way of explanation, this builds a long regexp through basic string interpolation like the following:
      /^(?:1950|1951|1952|1953|...|2049|2050)$/;
      After generating a regex like that, if you will be using it often, you might want to optimize it for common prefixes or suffixes.
      use Regex::PreSuf; my $re = presuf(1950..2050);

      --
      [ e d @ h a l l e y . c c ]
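For readers who find the `$"` trick opaque, here is a minimal sketch of the same idea written out with `join` and `qr//`. The helper name `build_year_regex` is made up for illustration and is not from Abigail's post; only core Perl is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: build /^(?:1950|1951|...|2050)$/ explicitly
# instead of relying on the list separator $" during interpolation.
sub build_year_regex {
    my ($first, $last) = @_;
    my $alternation = join '|', $first .. $last;
    return qr/^(?:$alternation)$/;    # compiled once, reusable
}

my $year_re = build_year_regex(1950, 2050);

for my $candidate (qw(1949 1950 2000 2050 2051 19534)) {
    printf "%-5s %s\n", $candidate, $candidate =~ $year_re ? 'ok' : 'nok';
}
```

Compiling the pattern once with `qr//` keeps the cost of building the 101-way alternation out of the matching loop.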

        If the OP wanted something optimized, he wouldn't have used a regex to begin with.

        Abigail

Re: Matching date range with pure regex
by grinder (Bishop) on Feb 17, 2004 at 11:08 UTC

    I might be missing part of your question, but this appears (applying the KISS principle) to do what you ask:

    #! /usr/bin/perl -wl
    use strict;

    while( <DATA> ) {
        chomp;
        print "$_ ", /^(19[5-9]\d|20([0-4]\d|50))$/ ? 'ok' : 'nok';
    }

    __DATA__
    1949
    1950
    1951
    1999
    2000
    2001
    2010
    2049
    2050
    2051
    2102
    22102
    19534
    19080
    20010

Re: Matching date range with pure regex
by Roger (Parson) on Feb 17, 2004 at 11:10 UTC
    use strict;

    while (<DATA>) {
        chomp;
        print "$_ is ",
            /^(?:19|20)
              (?: (?:(?<=19)[5-9]|(?<=20)[0-4])[0-9] | 50)$
            /x ? "Ok" : "not ok", "\n"
    }

    __DATA__
    1050
    1950
    2050
    2054
    1980
    2004
    3100

    Updated: Added the trailing '$' in the regex to limit the length to 4 characters. Thanks MCS for pointing that out. :-)

Re: Matching date range with pure regex
by posix_guy (Novice) on Feb 17, 2004 at 11:21 UTC
    Try /^(19[5-9]\d)|(20([0-4]\d)|50)$/
      You need to add an outermost set of parentheses. Without them, the ^ applies only to the first alternative and the $ only to the last one. /^((19[5-9]\d)|(20(([0-4]\d)|50)))$/

      I don't know why, but I'm always paranoid when using | in a regex, primarily because I don't know exactly how it works. (I need to read Mastering Regular Expressions, I know ...)

      ------
      We are the carpenters and bricklayers of the Information Age.

      Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
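Since the behaviour of | came up, here is a small sketch comparing posix_guy's pattern with the parenthesized version. The variable names `$loose` and `$strict` and the helper `verdict` are made up for illustration; the test strings are taken from data sets elsewhere in this thread.

```perl
use strict;
use warnings;

# posix_guy's pattern. The top-level | splits it into two alternatives:
#   ^(19[5-9]\d)        anchored at the start only
#   (20([0-4]\d)|50)$   anchored at the end only
my $loose = qr/^(19[5-9]\d)|(20([0-4]\d)|50)$/;

# The corrected pattern: the outer group makes ^ and $ apply to both branches.
my $strict = qr/^((19[5-9]\d)|(20(([0-4]\d)|50)))$/;

# Hypothetical helper, for display only.
sub verdict {
    my ($re, $string) = @_;
    return $string =~ $re ? 'match' : 'no match';
}

for my $input (qw(1950 2050 19534 1050 2051)) {
    printf "%-6s loose: %-9s strict: %s\n",
        $input, verdict($loose, $input), verdict($strict, $input);
}
```

The loose pattern accepts 19534 through its first branch (nothing requires the string to end after 1953) and 1050 through its second (any string ending in 50), which is why alternation is usually wrapped in a group before anchoring.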

Re: Matching date range with pure regex
by MCS (Monk) on Feb 17, 2004 at 15:04 UTC

    Kudos to the voters... awarding Abigail (with the slowest regex) the most votes (at least as I write this). There is a difference between wanting to do it with a regex and wanting to do it as slowly as possible. I took the above regexes from grinder, Roger, posix_guy, and Abigail and ran them through a benchmark doing 10,000 iterations each.

    First, Roger: might want to add a $ on the end there because yours matches 19534 and 20010 etc... and posix_guy: yours matched 19534 for some reason. Anyway, I took grinder's data set and put it in an array (as with the multiple subroutines all accessing __DATA__ it wasn't properly displaying things). Then I separated each regex into a different subroutine with the name of the author. I've attached the code if anyone wants to run it themselves. All results are from my Powerbook G4 667 MHz.

    Benchmark: timing 10000 iterations of abigail, grinder, posixguy, roger...
    grinder: 1 wallclock secs ( 1.06 usr + 0.03 sys = 1.09 CPU) @ 9174.31/s (n=10000)
    roger: 2 wallclock secs ( 1.02 usr + 0.07 sys = 1.09 CPU) @ 9174.31/s (n=10000)
    posixguy: 1 wallclock secs ( 1.37 usr + 0.03 sys = 1.40 CPU) @ 7142.86/s (n=10000)
    abigail: 57 wallclock secs (51.45 usr + 0.38 sys = 51.83 CPU) @ 192.94/s (n=10000)
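The benchmark code itself is not reproduced here, so the following is only a rough sketch of how such a comparison might be set up with the core Benchmark module. It is not MCS's original code: the two subroutines, the `$hits` counter, and the iteration count of 1000 are assumptions chosen for illustration, and timings will differ by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# grinder's data set, held in an array so every sub sees the same input.
my @years = qw(1949 1950 1951 1999 2000 2001 2010 2049
               2050 2051 2102 22102 19534 19080 20010);

# Each sub counts matches instead of printing, so I/O is not timed.
sub grinder {
    my $hits = 0;
    for (@years) {
        ++$hits if /^(19[5-9]\d|20([0-4]\d|50))$/;
    }
    return $hits;
}

sub abigail {
    my $hits = 0;
    local $" = "|";    # list separator used when the range is interpolated
    for (@years) {
        ++$hits if /^(?:@{[1950 .. 2050]})$/;
    }
    return $hits;
}

# Run each sub 1000 times and print a timing report to STDOUT.
timethese(1000, { grinder => \&grinder, abigail => \&abigail });
```

Both subs should count the same eight in-range years; checking that first is a cheap way to confirm a benchmark compares equivalent work.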

      Just for kicks, I removed the print statement (as suggested by grinder) and threw in a ++$foo instead. Got the following: (Abigail's is still last but still has the most votes...)

      Benchmark: timing 10000 iterations of abigail, grinder, posixguy, roger...
      abigail: 55 wallclock secs (51.42 usr + 0.12 sys = 51.54 CPU) @ 194.02/s (n=10000)
      grinder: 1 wallclock secs ( 0.60 usr + 0.00 sys = 0.60 CPU) @ 16666.67/s (n=10000)
      posixguy: 1 wallclock secs ( 0.96 usr + 0.00 sys = 0.96 CPU) @ 10416.67/s (n=10000)
      roger: 1 wallclock secs ( 0.61 usr + 0.00 sys = 0.61 CPU) @ 16393.44/s (n=10000)

      Updated code is in a readmore block (I removed the prints and added a $)

Re: Matching date range with pure regex
by podian (Scribe) on Feb 17, 2004 at 15:52 UTC
    I like the following one. If that does not work please let me know. I am not an expert in regex but want to become one!

    while (<DATA>) {
        chomp;
        if (/^(19|20)([5-9]|[0-5])([0-9])$/) {
            print "it matches for $_\n";
        }
    }
    __DATA__
    1950
    2050
    2001
    2009
    2000

    It says: first two digits should be 19 or 20, third digit can go from 5 to 9 or 0 to 5 and the fourth digit is 0-9.

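      ([5-9]|[0-5])
      That doesn't really make sense... it's the same as [0-9], so the third digit is not tied to the century. Your regex can be simplified to: /^(19|20)\d\d$/ which doesn't meet the requirements, because it also accepts years such as 1900 and 2099.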

      If you really want to become a regex master, the first step to enlightenment is to read "Mastering Regular Expressions" by Jeffrey Friedl. The link is to the publisher's site (O'Reilly) but you can get it just about anywhere. That book taught me everything I know about regular expressions.
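One way to check a candidate pattern against the requirement is to compare it with the plain numeric test over a range of integers. The sketch below does that for podian's and grinder's patterns; the helper name `audit_regex` and the 0..99999 search range are assumptions made for illustration, and strings with leading zeros or non-digits are not covered.

```perl
use strict;
use warnings;

# Hypothetical helper: compare a regex against the numeric test
# 1950 <= n <= 2050 for every integer in 0..99999.
# Returns two array refs: (false positives, false negatives).
sub audit_regex {
    my ($re) = @_;
    my (@false_pos, @false_neg);
    for my $n (0 .. 99_999) {
        my $in_range = ($n >= 1950 && $n <= 2050);
        my $matched  = ($n =~ $re) ? 1 : 0;
        push @false_pos, $n if $matched && !$in_range;
        push @false_neg, $n if !$matched && $in_range;
    }
    return (\@false_pos, \@false_neg);
}

my %candidates = (
    podian  => qr/^(19|20)([5-9]|[0-5])([0-9])$/,
    grinder => qr/^(19[5-9]\d|20([0-4]\d|50))$/,
);

for my $name (sort keys %candidates) {
    my ($fp, $fn) = audit_regex($candidates{$name});
    printf "%-8s false positives: %d, false negatives: %d\n",
        $name, scalar @$fp, scalar @$fn;
}
```

podian's pattern accepts everything from 1900 to 2099, so it shows 99 false positives (1900-1949 and 2051-2099), while grinder's pattern shows none in this range.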

Re: Matching date range with pure regex
by Anonymous Monk on Feb 18, 2004 at 11:46 UTC
    okay, I'm cheating:
    #!/usr/bin/perl
    use strict;
    use warnings;

    while(<DATA>) {
        chomp(my $is_valid = $_);
        print $is_valid, "\n" if grep {/^$is_valid$/} (1950..2050);
    }
    __DATA__
    1949
    1950
    1951
    1999
    2000
    2001
    2010
    2049
    2050
    2051
    2102
    22102
    19534
    19080
    20010
    best regards, Ronnie