I would like to have all of those tests in Regexp::Common. Some of the proposed tests are already part of Regexp::Common, like the numeric tests (integers, reals) and the IP addresses. Some URI classes are in there as well (http, ftp, tel, fax and tv at the moment).

But doing validation is a lot harder than you think. You need to find authorative documentation (there are many, many URI schemes, there are only a few URI schemes that have RFC that aren't either ambiguous, unclear from which conflicting RFCs they import terms, or defined in superceeded RFCs - but not in the superseeding RFC itself. A lot of schemes are only documented in internet drafts, of which the latest has expired years ago), and regexes are hard to test right. You have to consider lots of cases, and combinations of cases, and also a lot of cases where the regex should fail. Two weeks ago, I redid the test suite for http URIs, which is actually one of the better defined URI schemes, and it took me two full nights to get it all working. It did turn up a few bugs as well.

I've wanted to add dates to Regexp::Common for quite some time as well, but were do you start? There are so many forms to choose from. Perhaps start with dates in ISO format? It sounds simple, until you actually read the 33 page specification.

Email addresses.... Once, they will be part of Regexp::Common. I've done them using Parse::RecDescent (in RFC::RFC822::Address), and it won't be a pretty regex, as it will be recursive. I haven't had the guts to do this beast yet.

I don't think valid credit card numbers would be hard - but I lack their specification. If you can provide me with it, I'll add it to Regexp::Common (but the spec should be better than "a 14 digit number").

Send me regexes and specifications, preferably with an extensive test suite, and I'll add it to Regexp::Common (current version: 2.104, 87 patterns in 11 classes, 156778 tests in 30 files).

Abigail


In reply to Re: Data Validation Tests by Abigail-II
in thread Data Validation Tests by Flame

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.