in reply to Data Validation Tests

I would like to have all of those tests in Regexp::Common. Some of the proposed tests are already part of Regexp::Common, like the numeric tests (integers, reals) and the IP addresses. Some URI classes are in there as well (http, ftp, tel, fax and tv at the moment).

But doing validation is a lot harder than you think. You need to find authoritative documentation, and that is a problem in itself: there are many, many URI schemes, but only a few have an RFC that isn't ambiguous, unclear about which of several conflicting RFCs it imports its terms from, or defined only in a superseded RFC and not in the RFC that supersedes it. A lot of schemes are documented only in internet drafts, the latest of which expired years ago. On top of that, regexes are hard to test right. You have to consider lots of cases, and combinations of cases, and also a lot of cases where the regex should fail. Two weeks ago, I redid the test suite for http URIs, which is actually one of the better defined URI schemes, and it took me two full nights to get it all working. It did turn up a few bugs as well.
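
To give an idea of what "cases where the regex should fail" looks like in practice, here is a minimal table-driven sketch. The pattern `$http_re` is a made-up, heavily simplified stand-in for illustration only. It is not the pattern from Regexp::Common and it does not implement the real http URI grammar.

```perl
use strict;
use warnings;

# Hypothetical, deliberately simplified pattern (an assumption for this
# sketch): host labels, optional port, optional path. No query strings,
# no escapes, no IP-literal hosts.
my $http_re = qr{
    \A http://
    (?: [a-z0-9] (?: [a-z0-9-]* [a-z0-9] )? \. )*   # leading domain labels
    [a-z] (?: [a-z0-9-]* [a-z0-9] )?                # last label starts with a letter
    (?: : [0-9]+ )?                                 # optional port
    (?: / [\w./~%-]* )?                             # optional path
    \z
}xi;

# Each entry is [ input, 1 if it should match, 0 if it should be rejected ].
my @cases = (
    [ 'http://www.example.com/',          1 ],
    [ 'http://example.com:8080/a/b.html', 1 ],
    [ 'http://localhost',                 1 ],
    [ 'ftp://example.com/',               0 ],   # wrong scheme
    [ 'http://',                          0 ],   # no host
    [ 'http://-bad-.example.com/',        0 ],   # label starts with a hyphen
    [ 'http://example.com:/',             0 ],   # colon without port digits
    [ 'http://example.com/ trailing',     0 ],   # garbage after the URI
    [ "http://example.com/\n",            0 ],   # \z rejects a trailing newline
);

my $failures = 0;
for my $case (@cases) {
    my ($input, $expected) = @$case;
    my $got = $input =~ $http_re ? 1 : 0;
    next if $got == $expected;
    $failures++;
    (my $shown = $input) =~ s/\n/\\n/g;
    print "FAIL: '$shown' expected $expected, got $got\n";
}

if ($failures) {
    print "$failures failure(s)\n";
}
else {
    print "all ", scalar(@cases), " cases pass\n";
}
```

Note that more than half of the table is negative cases. A real suite would generate combinations of valid and invalid parts rather than list them by hand, which is how the test counts get so large.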

I've wanted to add dates to Regexp::Common for quite some time as well, but where do you start? There are so many forms to choose from. Perhaps start with dates in ISO format? It sounds simple, until you actually read the 33-page specification.
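
As a rough illustration of how little the "simple" version covers, here is a sketch that accepts only one ISO form, the extended calendar date YYYY-MM-DD. The function name `is_iso_calendar_date` is made up for this example. Week dates, ordinal dates, the basic format without hyphens, and anything involving times are all left out.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for this sketch): validates only the
# extended calendar form YYYY-MM-DD, including the day-of-month range.
sub is_iso_calendar_date {
    my ($str) = @_;

    # The regex checks the shape and the coarse ranges of month and day.
    my ($y, $m, $d) =
        $str =~ /\A([0-9]{4})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])\z/
        or return 0;

    # The regex alone cannot know how long each month is.
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    $days[1] = 29 if ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;

    return $d <= $days[$m - 1] ? 1 : 0;
}

for my $date (qw(2003-01-27 2003-02-29 2000-02-29 1900-02-29 2003-1-27 20030127)) {
    printf "%-12s %s\n", $date, is_iso_calendar_date($date) ? "ok" : "rejected";
}
```

The last input, `20030127`, is rejected here even though the basic format is a legitimate ISO date, which is exactly the kind of gap a real pattern would have to close.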

Email addresses... One day, they will be part of Regexp::Common. I've done them using Parse::RecDescent (in RFC::RFC822::Address), and it won't be a pretty regex, as it will be recursive. I haven't had the guts to do this beast yet.

I don't think valid credit card numbers would be hard - but I lack their specification. If you can provide me with it, I'll add it to Regexp::Common (but the spec should be better than "a 14 digit number").

Send me regexes and specifications, preferably with an extensive test suite, and I'll add them to Regexp::Common (current version: 2.104, 87 patterns in 11 classes, 156778 tests in 30 files).

Abigail

Re: Re: Data Validation Tests
by Cody Pendant (Prior) on Jan 27, 2003 at 00:28 UTC
    >the spec should be better than "a 14 digit number"

    Actually even that would be wrong.

    The spec for credit cards includes 13- to 16-digit numbers as well.

    The spec would be, roughly, "a 13-to-16-digit Luhn number beginning with one of a list of prefixes."

    There's an article about it here and a Perl implementation for checking here
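
For reference, the Luhn part of that rough spec can be sketched in a few lines. This is not the implementation linked above. The name `luhn_ok` is made up for this example, and the prefix check is left out because the list of prefixes isn't given here.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for this sketch): checks length and the
# Luhn checksum only. It does NOT check the issuer prefix.
sub luhn_ok {
    my ($number) = @_;
    return 0 unless $number =~ /\A[0-9]{13,16}\z/;

    my $sum    = 0;
    my $double = 0;    # the rightmost digit is not doubled
    for my $digit (reverse split //, $number) {
        my $v = $double ? $digit * 2 : $digit;
        $v -= 9 if $v > 9;    # same as adding the two digits of the product
        $sum += $v;
        $double = !$double;   # double every second digit from the right
    }
    return $sum % 10 == 0 ? 1 : 0;
}

for my $n (qw(4111111111111111 4111111111111112 4222222222222 1234)) {
    printf "%-17s %s\n", $n, luhn_ok($n) ? "passes" : "fails";
}
```

The first and third inputs are commonly used test numbers (16 and 13 digits). Passing this check only means the number is well formed, not that the card exists.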
    --
    “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.” M-J D