PerlMonks  

Re: Data type validation using regular expressions

by liverpole (Monsignor)
on May 27, 2007 at 14:00 UTC


in reply to Data type validation using regular expressions

    "Any input to make it better, or do differently..."

First of all, when I run your program, I get errors:

    syntax error at validate.pl line 29, near "() "
    syntax error at validate.pl line 35, near "}"
    Execution of validate.pl aborted due to compilation errors.

But I also recommend you get in the habit of using strict in your programs (you are already using -w, to turn on warnings, which is good):

    #!/usr/local/bin/perl -w
    use strict;
    use warnings;  # Already on with -w, but doesn't hurt to be explicit
    #
    # Version 0.1
    #

At which point you will get a couple dozen warnings and errors:

    Variable "$log_file" is not imported at validate.pl line 25.
    Variable "$spec_file" is not imported at validate.pl line 26.
    Variable "$data_in_file" is not imported at validate.pl line 27.
    Variable "$data_out_file" is not imported at validate.pl line 28.
    Variable "$spec_file" is not imported at validate.pl line 29.
    Variable "$data_in_file" is not imported at validate.pl line 29.
    Variable "$data_out_file" is not imported at validate.pl line 30.
    Variable "$log_file" is not imported at validate.pl line 30.
    Global symbol "$spec_file" requires explicit package name at validate.pl line 24.
    Global symbol "$data_in_file" requires explicit package name at validate.pl line 24.
    Global symbol "$data_out_file" requires explicit package name at validate.pl line 24.
    Global symbol "$log_file" requires explicit package name at validate.pl line 24.
    Global symbol "$max_errors" requires explicit package name at validate.pl line 24.
    Global symbol "$log_file" requires explicit package name at validate.pl line 25.
    Global symbol "$log_file" requires explicit package name at validate.pl line 25.
    Global symbol "$spec_file" requires explicit package name at validate.pl line 26.
    Global symbol "$spec_file" requires explicit package name at validate.pl line 26.
    Global symbol "$data_in_file" requires explicit package name at validate.pl line 27.
    Global symbol "$data_in_file" requires explicit package name at validate.pl line 27.
    Global symbol "$data_out_file" requires explicit package name at validate.pl line 28.
    Global symbol "$data_out_file" requires explicit package name at validate.pl line 28.
    Global symbol "$spec_file" requires explicit package name at validate.pl line 29.
    Global symbol "$data_in_file" requires explicit package name at validate.pl line 29.
    Global symbol "$data_out_file" requires explicit package name at validate.pl line 30.
    Global symbol "$log_file" requires explicit package name at validate.pl line 30.
    Global symbol "$max_errors" requires explicit package name at validate.pl line 30.
    Global symbol "$spec_line" requires explicit package name at validate.pl line 32.
    syntax error at validate.pl line 32, near "() "
    validate.pl has too many errors.

It appears that almost all of the errors in your program are caused by not declaring your variables.  In many cases this is easily fixed by using my to declare the variable.  For example:

    my ($spec_file, $data_in_file, $data_out_file, $log_file, $max_errors) = @ARGV;

which will get rid of many of the errors.

In a few cases, you may have to declare the offending variable globally, in order to have it remain in scope everywhere it's used.  One such example is @spec_all_attributes; the first time you use it is within a foreach loop, so you should declare @spec_all_attributes before that loop.
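
A minimal sketch of that scoping point. The spec lines and their `name:regex:default` layout are assumptions made up for illustration, since the real spec format is not shown in this thread. The array is declared with my before the loop, so it is still visible once the loop has finished:

```perl
use strict;
use warnings;

# Hypothetical spec lines; the real spec file format is an assumption here.
my @spec_lines = ('id:^\d+$:0', 'name:^\w+$:unknown');

# Declared before the loop, so it stays in scope after the loop ends.
my @spec_all_attributes;

foreach my $spec_line (@spec_lines) {
    # @fields is declared inside the loop body and is gone at the closing brace;
    # only the reference pushed onto the outer array survives.
    my @fields = split /:/, $spec_line, 3;
    push @spec_all_attributes, \@fields;
}

print scalar(@spec_all_attributes), " attributes loaded\n";
```

Had @spec_all_attributes been declared with my inside the loop, strict would have rejected any later use of it outside the loop.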

When you open a file, you can avoid the warnings by doing this:

    # Note that you don't have to put quotes "..." around $spec_file.
    # The filehandle is a separate lexical variable ($spec_fh), not the filename itself.
    open (my $spec_fh, $spec_file) or die "Can not open file $spec_file, $!";

It's considered good practice to use the 3-argument form of open -- for example:

    open (my $log_fh, ">", $log_file) or die "Can not open file $log_file, $!";
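
A minimal sketch of the 3-argument form in use, for writing and then reading back a file. The helper names write_log and read_lines are hypothetical, and the lexical filehandles are a common idiom rather than something the program under review does:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write each message on its own line.
sub write_log {
    my ($path, @messages) = @_;
    # The mode (">") is a separate argument, so odd characters in $path are harmless.
    open(my $log_fh, '>', $path) or die "Can not open file $path, $!";
    print {$log_fh} "$_\n" for @messages;
    close($log_fh) or die "Can not close file $path, $!";
    return;
}

# Hypothetical helper: return the file's lines without trailing newlines.
sub read_lines {
    my ($path) = @_;
    open(my $in_fh, '<', $path) or die "Can not open file $path, $!";
    chomp(my @lines = <$in_fh>);
    close($in_fh);
    return @lines;
}

# Use a temporary file so the sketch does not touch real data.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);

write_log($tmp_path, 'Error 1. Data type error on line: 3');
print "$_\n" for read_lines($tmp_path);
```

Keeping the mode apart from the filename means a name that happens to start with ">" or end with "|" cannot change what open does.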

Additionally, you might consider giving a syntax message if the number of command-line arguments isn't what's expected.  This isn't always just for others who use your program; you may come back to it months or years later, and wonder what the calling syntax was supposed to be.  For example, I'd be inclined to do something like the following:

    my $syntax = "
        syntax:  $0 <specfile> <data_in> <data_out> <logfile> <max errors>

        The purpose of this program is ...
    ";

    (my $spec_file     = shift) or die $syntax;
    (my $data_in_file  = shift) or die $syntax;
    (my $data_out_file = shift) or die $syntax;
    # etc...
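
One way to check the argument count in a single place, sketched with a hypothetical parse_args helper. A hard-coded list stands in for @ARGV so the example runs on its own, and the file names are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the five arguments, or dies with the syntax message.
sub parse_args {
    my @args = @_;
    my $syntax = "syntax: $0 <specfile> <data_in> <data_out> <logfile> <max errors>\n";
    die $syntax unless @args == 5;
    return @args;
}

# In the real program this would be parse_args(@ARGV).
my ($spec_file, $data_in_file, $data_out_file, $log_file, $max_errors)
    = parse_args(qw(spec.txt in.txt out.txt run.log 10));

print "max errors: $max_errors\n";
```

Counting the arguments also accepts values such as "0", which a test like `(my $x = shift) or die` would treat as missing.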

One final comment on the style -- it's usually considered unnecessary "noise" in a program to use comments which are obvious.  The classic example is:

$i++; # Increment $i (duh!)

So you may want to lighten up a little on comments which don't add anything.  Your comments also make the code very hard to read (at least for me), so you may want to rethink your commenting style.  Try putting comments on their own lines (rather than making lines that already exceed 80 characters even longer), and add some whitespace where it helps readability.

Thus, I'd suggest, instead of:

    foreach my $attribute (@data_in_attributes) {
      if ($attribute =~ m/$spec_all_attributes[$attribute_position][1]/) { # validate data attribute type by performing
                                  # lookup for the regular expression from the spec memory structure
        push (@data_out_attributes, $attribute); # Correct data type, the output value is same as input value
      }
      else {
        push (@data_out_attributes, $spec_all_attributes[$attribute_position][2]); # Bad data type, use default provided in the spec for output value
        $total_errors++;
        print log_file "Error ", $total_errors, ". Data type error on line: ", $line_number,
                       ", attribute: ", $attribute_position + 1, " (", $spec_all_attributes[$attribute_position][0], ")\n";
      }
      last DATALINE if ($total_errors >= $max_errors); # terminate if too many errors
      $attribute_position++;
    }

something like the following, which may be a lot easier to read:

    foreach my $attribute (@data_in_attributes) {

        # validate data attribute type by performing
        # lookup for the regular expression from the spec memory structure
        if ($attribute =~ m/$spec_all_attributes[$attribute_position][1]/) {
            # Correct data type, the output value is same as input value
            push @data_out_attributes, $attribute;
        }
        else {
            # Bad data type, use default provided in the spec for output value
            push @data_out_attributes, $spec_all_attributes[$attribute_position][2];
            $total_errors++;
            print log_file
                "Error ",
                $total_errors,
                ". Data type error on line: ",
                $line_number,
                ", attribute: ",
                $attribute_position + 1,
                " (",
                $spec_all_attributes[$attribute_position][0],
                ")\n";
        }

        # terminate if too many errors (<-- but perhaps this is obvious??)
        last DATALINE if ($total_errors >= $max_errors);

        $attribute_position++;
    }
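
For anyone who wants to try the loop in isolation, here is a self-contained sketch of the same lookup-and-default logic. The spec contents, the validate_row helper, and the sample row are all assumptions made up for illustration; only the [name, regex, default] layout is taken from how the code above indexes @spec_all_attributes:

```perl
use strict;
use warnings;

# Hypothetical spec: [attribute name, validation regex, default value].
my @spec_all_attributes = (
    [ 'id',    qr/^\d+$/,       '0'       ],
    [ 'name',  qr/^[A-Za-z]+$/, 'unknown' ],
    [ 'price', qr/^\d+\.\d\d$/, '0.00'    ],
);

# Validate one row; returns a reference to the cleaned row and an error count.
# Assumes the row has no more attributes than the spec has entries.
sub validate_row {
    my ($spec, @data_in_attributes) = @_;
    my @data_out_attributes;
    my $errors = 0;

    for my $position (0 .. $#data_in_attributes) {
        my ($name, $pattern, $default) = @{ $spec->[$position] };
        my $attribute = $data_in_attributes[$position];

        if ($attribute =~ $pattern) {
            # Correct data type: pass the value through unchanged
            push @data_out_attributes, $attribute;
        }
        else {
            # Bad data type: substitute the default from the spec
            push @data_out_attributes, $default;
            $errors++;
        }
    }
    return (\@data_out_attributes, $errors);
}

my ($row, $errors) = validate_row(\@spec_all_attributes, '42', 'w1dget', '9.99');
print join(',', @$row), " ($errors error(s))\n";
```

Iterating over positions rather than values removes the need for a separately incremented $attribute_position counter.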

s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/

Replies are listed 'Best First'.
Re^2: Data type validation using regular expressions
by FunkyMonk (Chancellor) on May 27, 2007 at 14:59 UTC
        (my $spec_file     = shift) or die $syntax;
        (my $data_in_file  = shift) or die $syntax;
        (my $data_out_file = shift) or die $syntax;
        # etc...

    Yuk!

        die $syntax unless @ARGV == 5;
        my ($spec_file, $data_in_file, $data_out_file, $log_file, $max_errors) = @ARGV;

    or

        my ($spec_file, $data_in_file, $data_out_file, $log_file, $max_errors) = @ARGV;
        die $syntax unless defined $max_errors;

    Do not repeat yourself!

Re^2: Data type validation using regular expressions
by naikonta (Curate) on May 27, 2007 at 16:36 UTC
    use warnings; # Already on with -w, but doesn't hurt to be explicit
    How is -w less explicit than use warnings;, as far as enabling warnings is concerned?

    Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

      Maybe it's just me; I feel like it's more explicit when I see it together with strict, spelled out:
          use strict;
          use warnings;

      But I confess that I still don't trust -w on Windows (even though I now know it works perfectly well), because Windows ignores the first part of the shebang line.  To test this, you can do:

      #!/usr/path/which/does/not/exist/perl

      and it'll still run Perl correctly.  Granted this isn't a reason to stop using -w, which still does work as I mentioned, but it did make me suspicious of the whole top line for quite a while.


        Don't get me wrong :-) I have always used use warnings instead of the -w switch, ever since I learned about it. Being a pragma gives it more power and flexibility. I don't use Windows, but I knew from the start that the shebang doesn't work on it, although Apache seems to honor the switches. However, one should take advantage of the pl2bat converter to keep the original switches attached to the shebang.

        Still, whenever I see -w on the shebang line, I won't bother to change it to use warnings; or recommend doing so (unless one needs more control over warnings). It's still equally explicit, and still perfect for me (with use strict; of course) :-)
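
The point about the pragma's extra flexibility can be sketched as follows: unlike -w, use warnings is lexically scoped, so a single category can be switched off inside one block. The sub names noisy and quiet are hypothetical, and the __WARN__ handler is only there to make the effect easy to count:

```perl
use strict;
use warnings;

# Collect warnings instead of printing them to STDERR.
my @seen;
$SIG{__WARN__} = sub { push @seen, $_[0] };

sub noisy {
    my $value;                      # deliberately left undefined
    return "value: " . $value;      # triggers an "uninitialized" warning
}

sub quiet {
    no warnings 'uninitialized';    # switched off for this block only
    my $value;
    return "value: " . $value;      # no warning here
}

my @results = (noisy(), quiet());
print scalar(@seen), " warning(s) captured\n";
```

Only the concatenation in noisy should be reported; the same expression in quiet stays silent because the category is disabled within that sub's block.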



Re^2: Data type validation using regular expressions
by Anonymous Monk on May 27, 2007 at 18:45 UTC
    I did figure out why you were getting those syntax errors, if you copy the code from dwoptimize. The blogger was eating some of the < and > characters even with the html "code" tag defined. Fixed now.
