Constructive criticism on the language and system, please. I understand that this is not something everyone wants. Some believe in using eval("") for such things, or straight perl. I am looking for features, shortcomings, questions and comments.

Verify

There are many issues with developing interfaces for users. You need to worry about frameworks for the back-end code and frameworks for the interface. One issue I have run into one too many times is data validation frameworks. I've found only one in my lifetime, but it was XML based and as tedious as writing the validation in my working language. I'll refer to my working language here as BASIC for brevity; the purpose is not to delve into the capabilities and shortcomings of the working language.

The purpose of this RFC is to propose an interpreter and a language to fit that niche; both shall be referred to as Verify. Data validation is not done frequently because developers are lazy. In some languages it can become quite verbose, which adds to the tedium. When it is done, it is frequently not done well, as it is hard and/or tedious. For large systems, it becomes a long list of if/then/else statements checking string and numeric properties.

Interpreter

The Verify interpreter should be fast to execute. If I have to import 20 thousand rows of unvalidated data from a customer, I do not wish for it to do the equivalent of a fork for every line. The Verify language should be parsed once and the parse tree stored in memory. Access to things that change should, in turn, be generic and easy. Fast to execute, fast to develop in.
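A minimal sketch of the "parse once, run per row" idea, written in perl 5.10+ as the host language. The name compile_between, the rule shape and the E_OUT_OF_RANGE code are hypothetical stand-ins, not part of Verify; a closure plays the role of the stored parse tree.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a compile step: the "parse" work happens once,
# and the returned closure is reused for every row.
sub compile_between {
    my ($name, $low, $high) = @_;    # rule: $low <= :name <= $high
    return sub {
        my ($bind) = @_;
        my $value = $bind->{$name} // 0;    # assumption: unset values default to 0
        return ($low <= $value && $value <= $high) ? "YAY" : "E_OUT_OF_RANGE";
    };
}

my $check   = compile_between("qty", 1, 10);            # compiled once
my @rows    = ( { qty => 5 }, { qty => 0 }, { qty => 10 } );
my @results = map { $check->($_) } @rows;               # executed once per row
print join(",", @results), "\n";                        # YAY,E_OUT_OF_RANGE,YAY
```

Nothing is re-read or re-parsed per row; only the bind hash changes between calls.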

The interpreter should be pluggable. It can be assumed that BASIC has features that Verify’s interpreter cannot do as quickly, or the interpreter may need to be plugged into other systems as well. For this reason, all functions are written in BASIC. Floating point precision libraries are another example of something that may need to be plugged in, as there are several floating point systems out there that do not completely agree. You may wish your entire company to standardize on one. A set of plug-ins should be configured and plugged in once for many uses.

The Verify interpreter should be easily bridgeable to and from another system. It should be as simple as, or similar to:

my $interpreter = Verify->new( );
# plugged functions are passed in as name => code ref
$interpreter->pluggedFunctions( { "zero" => sub { $_[0] * 0 } } );
# parse the script once; the compiled form is reused below
$interpreter->compile("/beer/something.vfy");
my %bind_variables = get_variables_from_tk_interface_put_into_hash( $tk1 );
die("EEK!") if $interpreter->( \%bind_variables ) ne "YAY";
%bind_variables = get_variables_from_tk_interface_put_into_hash( $tk2 );
die("EEK MORE!!") if $interpreter->( \%bind_variables ) ne "YAY";

The interpreter requires that a mathematical handler be used. The default supports +, -, <=, <, >, >= and the identity function for assignment. An enhanced default one would include / and * for use with BASIC's internal floating point representation. The default would throw an error should / or * be used. The reason is that there are various floating point libraries in use. Many people are comfortable with BASIC's implementation, whereas someone like NASA may wish to use the NASA standard one, should there be one.
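One way to picture the math handler is as a table of operators that the interpreter consults, so that an operator missing from the table is an error. This is only a sketch in perl; the names %default_math, %enhanced_math and apply_op are hypothetical, and the identity function for assignment is left out.

```perl
use strict;
use warnings;

# Hypothetical default handler: operator => code ref.
my %default_math = (
    '+'  => sub { $_[0] + $_[1] },
    '-'  => sub { $_[0] - $_[1] },
    '<'  => sub { $_[0] <  $_[1] ? 1 : 0 },
    '<=' => sub { $_[0] <= $_[1] ? 1 : 0 },
    '>'  => sub { $_[0] >  $_[1] ? 1 : 0 },
    '>=' => sub { $_[0] >= $_[1] ? 1 : 0 },
);

# Hypothetical enhanced handler: adds * and / using the host's own floats.
my %enhanced_math = (
    %default_math,
    '*' => sub { $_[0] * $_[1] },
    '/' => sub { $_[0] / $_[1] },
);

# Apply an operator through whichever handler is plugged in;
# an operator the handler lacks throws an error.
sub apply_op {
    my ($handler, $op, $left, $right) = @_;
    my $code = $handler->{$op}
        or die "operator '$op' is not supported by this math handler\n";
    return $code->($left, $right);
}

my $sum     = apply_op(\%default_math,  '+', 2, 3);
my $product = apply_op(\%enhanced_math, '*', 2, 3);
my $failed  = defined(eval { apply_op(\%default_math, '*', 2, 3) }) ? 0 : 1;
print "$sum $product ", ($failed ? "error" : "ok"), "\n";    # 5 6 error
```

Swapping in a different floating point library would then mean supplying a different table, with the interpreter itself unchanged.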

Verify Language description

The Verify language is intended to be very, /very/, simple. It is for data validation, not to do fractals, reformatting of XML or anything really interesting besides validating data. It is of the same ilk of a template language or SQL language. It should not promote being used for things outside of its scope.

Features included are:

The 4 basic mathematical functions are supported: + - / *. Mathematical operations, all of them, are pluggable. The results are of a user-defined numeric type.

The atom: a single value will be typecast according to context. "1" gets cast to a numeric one in numeric operations, while 1 gets typecast to a string in string comparisons. For consistency, numeric casts are done via the floating point implementation in use.

Binary (and trinary) use of the numeric comparison operators, < <= > >=. They can only be used on arithmetic operations. The purpose is to make it easy to do a "between", e.g. 1 < x < 5, or to compare two expressions, e.g. 1 + 2 < 5 + 6. The result of these operations is boolean.
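A chained comparison such as 1 < x < 5 can be read as "every adjacent pair must hold". A small perl sketch of that reading follows; the name chain_holds and the flat operand/operator list are hypothetical conveniences, not the interpreter's actual representation.

```perl
use strict;
use warnings;

my %compare = (
    '<'  => sub { $_[0] <  $_[1] },
    '<=' => sub { $_[0] <= $_[1] },
    '>'  => sub { $_[0] >  $_[1] },
    '>=' => sub { $_[0] >= $_[1] },
);

# Evaluate a chain such as (1, '<', $x, '<', 5) pairwise:
# the chain is true only if every adjacent pair is true.
sub chain_holds {
    my @chain = @_;                  # operand, op, operand, op, operand ...
    my $left  = shift @chain;
    while (@chain) {
        my ($op, $right) = splice(@chain, 0, 2);
        return 0 unless $compare{$op}->($left, $right);
        $left = $right;              # the middle operand is reused on the next pair
    }
    return 1;
}

print chain_holds(1, '<', 3, '<', 5), "\n";     # 1 < 3 < 5      -> 1
print chain_holds(1, '<', 7, '<', 5), "\n";     # 1 < 7 < 5      -> 0
print chain_holds(1 + 2, '<', 5 + 6), "\n";     # 1 + 2 < 5 + 6  -> 1
```

Note this differs from C, where 1 < x < 5 compares the boolean result of 1 < x against 5.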

Equality, to be used between numbers and strings as one would expect. "1" == 1 would be true per the casting of an atom. The results of these operations are boolean.
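The text does not spell out how the interpreter picks between numeric and string comparison, so the following perl sketch assumes one plausible rule: compare numerically when both atoms look like numbers, otherwise compare as strings. The name atoms_equal is hypothetical; looks_like_number comes from the core Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Assumed rule: numeric comparison if both sides look numeric,
# string comparison otherwise.
sub atoms_equal {
    my ($left, $right) = @_;
    if (looks_like_number($left) && looks_like_number($right)) {
        return $left == $right ? 1 : 0;
    }
    return "$left" eq "$right" ? 1 : 0;
}

print atoms_equal("1", 1),     "\n";    # 1
print atoms_equal("1.0", 1),   "\n";    # 1
print atoms_equal("abc", "abc"), "\n";  # 1
print atoms_equal("", 0),      "\n";    # 0 under this rule
```

Under this rule the empty string and 0 are not equal, which is a design choice worth settling explicitly.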

Regular expressions are supported, in the form of perl syntax, e.g. x=~/abc/, x!~/abc/. The result is a boolean.

The raw booleans, true and false are available.

Boolean algebraic operations of "or" and "and". To be used on boolean operations.

The built-in Verify constructs if-else, while and for follow C's syntax, including blocks delimited by braces. Return is supported to dictate levels of success or error.

Variables do not need declaration, similar to perl without using strict. They default to 0. The same is true of arrays. Referring to an array without an index refers to the first element. Array sizes do not need to be declared. Variable names follow basic C naming rules.
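A sketch of how such a variable store might behave, in perl 5.10+, assuming every variable is kept as an array and a missing index means index 0. The names %vars, get_var and set_var are hypothetical.

```perl
use strict;
use warnings;

my %vars;    # variable name => array ref of values

# Read a variable; an unset variable or element reads as 0.
sub get_var {
    my ($name, $index) = @_;
    $index //= 0;                          # no index means the first element
    my $cells = $vars{$name} or return 0;
    return $cells->[$index] // 0;
}

# Write a variable; the array grows as needed, no declaration required.
sub set_var {
    my ($name, $value, $index) = @_;
    $index //= 0;
    $vars{$name}[$index] = $value;
    return $value;
}

set_var("sp", "gsmith");        # same as sp[0] = "gsmith"
set_var("sp", "rsmith", 1);
print join(" ", get_var("sp"), get_var("sp", 1), get_var("unset"), get_var("sp", 9)), "\n";
# gsmith rsmith 0 0
```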

Special variables, called bind variables, act exactly the same as variables, except they must be prefixed with a colon. Their values are passed in on execution of an interpreter, similar to how SQL systems work, very similar to DBI. This allows for the simple syntax of: if( 1 <= :x <= 10 )
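A minimal perl 5.10+ sketch of resolving a name at run time, assuming a colon-prefixed name reads from the hash the caller passed in and a bare name reads from the script's own variables. The name resolve and the two-hash layout are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical resolver: ":name" reads the caller's bind hash,
# a bare name reads the script's internal variables. Both default to 0.
sub resolve {
    my ($token, $bind, $internal) = @_;
    if ($token =~ /^:([A-Za-z_][A-Za-z0-9_]*)$/) {    # C-style name after the colon
        return $bind->{$1} // 0;
    }
    return $internal->{$token} // 0;
}

my %bind     = ( x => 7 );
my %internal = ( x => 3 );
print join(" ",
    resolve(":x", \%bind, \%internal),    # bind variable
    resolve("x",  \%bind, \%internal),    # internal variable of the same name
    resolve(":y", \%bind, \%internal),    # unset bind variable
), "\n";                                  # 7 3 0
```

Keeping the two namespaces apart means a script variable can never shadow a value supplied by the caller.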

"Plugged" functions are referred to as one would call a function in C. The state of all internal variables is accessible and modifiable. Use with care.

Example

A year or two ago, I worked w/ a system that imported data directly into a DB with no validation. Unfortunately, the spreadsheet it came from was filled in by hand, also with no validation. An example of what I might have used for validation would be:
if( :company == "" or :stock == "" or :sku == 0 ) return "E_ROW_INVALID";
if( checkSalesPerson( :salesperson ) == false ) return "E_INVALID_SALESPERSON";
The checkSalesPerson plugged function may contact a database, or be preconfigured w/ a list of valid salespersons. Or it could have been done as:
if( :salesperson != "gsmith" and :salesperson != "rjohnson" ) return "E_INVALID_SALESPERSON";
If someone wrote a plugged function to check for a person being in an array, one could write something like:
sp[0] = "gsmith"; sp[1] = "rsmith";
if( inArray( :salesperson, sp ) == false ) return "E_INVALID_SALESPERSON";
Many plugged functions will be included as examples and for use in day-to-day operation.
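On the host side, a plugged function such as inArray could be an ordinary code ref in the hash handed to pluggedFunctions. The perl sketch below assumes the interpreter passes an array variable as an array ref; that calling convention and the %plugged name are assumptions for illustration.

```perl
use strict;
use warnings;

my %plugged = (
    # True (1) if $needle is string-equal to any element of the array.
    inArray => sub {
        my ($needle, $haystack) = @_;
        return scalar(grep { $_ eq $needle } @$haystack) ? 1 : 0;
    },
);

my @sp = ("gsmith", "rsmith");
for my $salesperson ("gsmith", "nobody") {
    my $status = $plugged{inArray}->($salesperson, \@sp)
        ? "YAY"
        : "E_INVALID_SALESPERSON";
    print "$salesperson: $status\n";
}
# gsmith: YAY
# nobody: E_INVALID_SALESPERSON
```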

-sporty

Update: Moved the description of the math handler to the interpreter section. Which particular interpreter is used doesn't affect the language.

Re: RFC: Verify Interpreter and Language
by bart (Canon) on Feb 09, 2006 at 20:58 UTC
    Do you distinguish between "" and undef/null? Because in Perl, they're not exactly the same, nor are they in SQL, and in Perl they're even hard to test without warnings. I hope your system is more flexible in doing the test. I like JavaScript's syntax, where you can directly compare to null.

    You said:

    Variables do not need declaration, similar to perl without using strict. They default to 0.
    Is the default "0" or "", when treated as string? Is it the same as undef/null?

    p.s. Your perl pseudocode is off. "my" is written in all lowercase, and

    $interpreter( \%bind_variables )
    needs an arrow:
    $interpreter->( \%bind_variables )
      I've thought about adding a builtin called "defined" or "isnull" that will check whether a bind variable or an internal variable is null. I don't want to get into the mess of passing nulls around unless there's a good reason. A real life valid example if it were to actually allow the assignment vs the comparison of undefs...

      I'll fix my pseudocode tomorrow. I did it in Word. Stupid grammar checker. :)

        Hmm, I don't seem to have made my position clear. What I meant was, occasionally in data, an empty string and NULL are equivalent. Occasionally, 0 and NULL, or 0 and "", are equivalent. Effectively testing this in Perl is a bit of a pain, and in SQL it's even a lot worse because
        SELECT * FROM mytable WHERE foo=foo
        will not show the records where foo is null. Purists might say this is how it should be, but I think it's just not very practical.

        It'd be nice if you provided a simple means to test if a value is, say, NULL, 0 or "", perhaps using something like in (as in MySQL):

        if($foo in (0, "", undef)) ...

        I'm actually not even sure that NULL is a valid value for the IN list in MySQL...
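For comparison, the same "NULL, 0 or empty string" test written as a plain perl helper that could be offered as a plugged function. The name is_blank is hypothetical; the defined check comes first so that undef never reaches a comparison and no warnings are raised.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# True (1) when $value is undef (NULL), the empty string, or numerically zero.
sub is_blank {
    my ($value) = @_;
    return 1 if !defined $value;                            # undef / NULL
    return 1 if $value eq "";                               # empty string
    return 1 if looks_like_number($value) && $value == 0;   # 0, "0", "0.0"
    return 0;
}

print join(" ", map { is_blank($_) } (undef, "", 0, "0.0", "abc", 5)), "\n";
# 1 1 1 1 0 0
```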

Re: RFC: Verify Interpreter and Language
by acid06 (Friar) on Feb 09, 2006 at 21:05 UTC
    Okay, I truly don't mean to be harsh or anything like that, but I really can't see the advantage of using this language instead of pure-Perl code.

    Consider how much this differs from your last example:
    $Sp[0] = "gsmith";
    $Sp[1] = "rsmith";
    if( ! inArray( $salesperson, \@Sp ) ) {
        return E_INVALID_SALESPERSON;
    }

    When I started reading your meditation I thought of some really nice ideas regarding parameter validation and such.

    I think that in order to write a parameter validation mini-language you should keep yourself from rewriting Perl with another syntax.

    As a matter of fact, I think that it shouldn't be a programming language but rather a sort of description language. Then you could have pluggable types or modifiers for validation. Pluggable input and output handlers. Maybe pluggable filters. You could come up with a sort of inheritance tree and so on. I think something along these lines would be more appropriate.

    However, remember this is just my opinion. Others may (and probably will) disagree with me. Others may agree. If you feel this way is what works best for you, then go for it, it's always nice to have plenty of alternatives out there on the CPAN. You asked for constructive criticism and this was my best effort at it. ;-)


    acid06
    perl -e "print pack('h*', 16369646), scalar reverse $="
      In the world of systems, you have two extremes: high abstraction of duty, and none. Then there's the in-between. I looked at the problem from the php perspective. php is a templating language that you can code in. So you wind up with 3500 core functions and the ability to put anything anywhere.

      So back to your example. If I had no framework, I could create a module and various objects that work as validators on other objects. That would be a new framework, true. But it falls into the trap of being a validation language I can write code in. For this very reason, I'm not fond of many template languages and similar which allow raw code to be executed. Why would I ever wish to use DBI from this to validate my data as being of certain standards?

      So for instance, sigils for data types are gone in my language, since a scalar in perl terms is just a 1 element array in my language, where you just don't use the index. You have trinary compares for < and > for easy betweens. You have variables that refer to the outer world as :someVar.

      To put it in architectural terms: what I would do in the past is use something like HTML::Template and CGI. I would take in all my inputs, create an object tree of things like User, Group, Message, Topic, Node, populate them with my data, and call various functions to validate each object in the object tree. Each object would be validated in the native language and returned.

      But I run into a similar problem of replacing HTML::Template w/ here-docs. Yes, I can perform invalid syntax in either, and accomplish the same in either, but the here-doc method can be very powerful. Almost TOO powerful. If I'm smart and good and all, I can accomplish a pristine system w/ no issues of what is doing what. If I'm bad and/or not as smart, I can start doing DBI calls in places I'll be rendering templates. That scares me. It's the same issue of actually using php as your template language, or JSP as a coding language.

      Heh, imagine if you wrote DB queries in perl instead of SQL.

      The advantage I propose is the same as that of HTML::Template, Class::DBI and Catalyst. I can have clear separations of duty that are done very well and won't commingle. The template language only makes stuff well formatted, Class::DBI takes objects and stores/retrieves them for me, whereas Catalyst deals w/ URIs, what is called, and the clear separation of one thing from the next. With Verify, I can start out w/ nothing, and fill in the validation w/o worrying about stepping on things like Class::DBI or HTML::Template.

      BTW, mind you, I never said the underlying language /was/ perl, but even if it was, it is a very simple subset.

        The thing is that your proposed language is almost raw code. ;-)

        I think that splitting everything up in different layers of abstraction is the way to go. I wasn't criticizing the concept.

        This is completely subjective, but I tend to dislike mini-languages. E.g. although I use Template Toolkit, I'd be much happier if it used a slightly modified Perl (in which the only major difference would be the removal of curly braces) instead of its own language. I even considered using Template::PSP but it really doesn't have all the features I need.

        Going back to the specific example you gave, Class::DBI and Catalyst are substantially different from hand-written SQL and plain CGI coding, while your proposed language is not substantially different from plain Perl.

        If I were to implement a verification language, the "code" would probably be some sort of subset of YAML (not because it would ease parsing, but because I really like YAML) and look something like this:
        # type checking
        parameters:
            # + concatenates rules, pad does smart padding
            flight_number: char(2) + pad(integer(0 .. 9999))
            flight_type: char(1)
            # variable passed as a parameter to the template
            total_seats: integer(0 .. :total_seats)
            occupied_seats: integer(0 .. :total_seats)
            # length checks the list, "of string" could be implicit
            passenger_list: list(0 .. :total_seats) of string
        # further validation and processing
        processing:
            # & references a previously declared field
            # :flights_table could be a list or a hash
            # this might be more appropriate to be on
            # the previous "parameters" instead of here
            - exists &flight_number in :flights_table
            # maps the flight type code into a descriptive string
            # using the flight_types hash
            - map &flight_type into :flight_types
            # some simple and obvious checks
            - check $occupied_seats < &total_seats
            - check &passenger_list = &occupied_seats
        # in-template variable declaration
        declare:
            flight_types:
                A: Commercial
                B: Non-Commercial
                C: Cargo
        In fact, I actually really liked this idea. I might consider implementing it, if I have the time. ;-)


        acid06
        perl -e "print pack('h*', 16369646), scalar reverse $="
        The thing is, your example on location and associated ZIP code syntax, is one example of why a database lookup could be a good idea in a data verification language. Do you really wish to pile rule upon rule, in "code", for a single association, location<->ZIP? It seems to me like a table holding those associations would be a better idea.
Re: RFC: Verify Interpreter and Language
by snoopy (Curate) on Feb 22, 2006 at 01:11 UTC
    The perl Web application that I'm working on at the moment replicates validation logic in

    • perl functions
    • javascript field actions
    • sql constraints and triggers

    I am manually replicating rules across different languages. I'd like to be able to state these rules in one place using one language.

    Validation often takes you outside your perl comfort zone into other subsystems and languages. Often the same validation rules are replicated in different places using different languages. What a burden!

    In short, I kind-of like the idea of a mini-language and an interpreter, but what would really sell it to me is the added ability to use it as a cross language/platform lowest common denominator: specify my rules once, then translate them into validation libraries in other languages such as javascript and sql.

      I've implemented the Verify language once. Having done so once, I've thought about translation. It's so small and simple that a lot is open to interpretation when translating it.

      Language translation is easy. Any 1 op maps to 1 or many more. Verify is simple enough (no memory allocation, type safety and what not) that it should be cake. :)
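As a rough illustration of "1 op maps to 1 or many more", here is a perl sketch that emits a JavaScript expression and a SQL CHECK constraint from one inclusive between rule. The rule hash and the names to_javascript and to_sql are hypothetical; a real translator would walk the parse tree rather than a single hash.

```perl
use strict;
use warnings;

# Hypothetical rule shape: low <= field <= high on one named field.
my %rule = ( field => "qty", low => 1, high => 10 );

# One chained compare becomes two compares joined by && in JavaScript.
sub to_javascript {
    my ($r) = @_;
    return sprintf("(%s <= %s && %s <= %s)",
        $r->{low}, $r->{field}, $r->{field}, $r->{high});
}

# SQL's BETWEEN is inclusive on both ends, so it matches <= ... <= directly.
sub to_sql {
    my ($r) = @_;
    return sprintf("CHECK (%s BETWEEN %s AND %s)",
        $r->{field}, $r->{low}, $r->{high});
}

print to_javascript(\%rule), "\n";    # (1 <= qty && qty <= 10)
print to_sql(\%rule), "\n";           # CHECK (qty BETWEEN 1 AND 10)
```

A strict 1 < x < 10 would need two plain comparisons in SQL as well, since BETWEEN cannot express exclusive bounds.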

      What kind of db validation beyond unique and fk constraints are you thinking of? I'd advise against DB specific stuff if you can help it, since switching to, or running a product on, a DB that isn't your target isn't fun.

        Modules such as Class::DBI have wrecked me forever when it comes to low level grovelling through databases with SQL validating primary keys in any language...

        To warrant serious consideration, a validation language will need to operate at a similar level of abstraction and portability. I'd like to write...

        unless(get_Person_Company(:salesperson, :company)) Return "E_INVALID_SALESPERSON"

        ...and remain blithely unaware as to whether this is a cgi-form or an uncommitted database entry.

        The call to accessor get_Person_Company() remains the same.

        I'd agree with acid06's comments that different layers of abstraction is the way to go.