PerlMonks  

    "Any input to make it better, or do differently..."

First of all, when I run your program, I get errors:

    syntax error at validate.pl line 29, near "() "
    syntax error at validate.pl line 35, near "}"
    Execution of validate.pl aborted due to compilation errors.

But I also recommend you get in the habit of using strict in your programs (you are already using -w, to turn on warnings, which is good):

    #!/usr/local/bin/perl -w
    use strict;
    use warnings;  # Already on with -w, but doesn't hurt to be explicit
    #
    # Version 0.1
    #

At which point you will get a couple dozen warnings and errors:

    Variable "$log_file" is not imported at validate.pl line 25.
    Variable "$spec_file" is not imported at validate.pl line 26.
    Variable "$data_in_file" is not imported at validate.pl line 27.
    Variable "$data_out_file" is not imported at validate.pl line 28.
    Variable "$spec_file" is not imported at validate.pl line 29.
    Variable "$data_in_file" is not imported at validate.pl line 29.
    Variable "$data_out_file" is not imported at validate.pl line 30.
    Variable "$log_file" is not imported at validate.pl line 30.
    Global symbol "$spec_file" requires explicit package name at validate.pl line 24.
    Global symbol "$data_in_file" requires explicit package name at validate.pl line 24.
    Global symbol "$data_out_file" requires explicit package name at validate.pl line 24.
    Global symbol "$log_file" requires explicit package name at validate.pl line 24.
    Global symbol "$max_errors" requires explicit package name at validate.pl line 24.
    Global symbol "$log_file" requires explicit package name at validate.pl line 25.
    Global symbol "$log_file" requires explicit package name at validate.pl line 25.
    Global symbol "$spec_file" requires explicit package name at validate.pl line 26.
    Global symbol "$spec_file" requires explicit package name at validate.pl line 26.
    Global symbol "$data_in_file" requires explicit package name at validate.pl line 27.
    Global symbol "$data_in_file" requires explicit package name at validate.pl line 27.
    Global symbol "$data_out_file" requires explicit package name at validate.pl line 28.
    Global symbol "$data_out_file" requires explicit package name at validate.pl line 28.
    Global symbol "$spec_file" requires explicit package name at validate.pl line 29.
    Global symbol "$data_in_file" requires explicit package name at validate.pl line 29.
    Global symbol "$data_out_file" requires explicit package name at validate.pl line 30.
    Global symbol "$log_file" requires explicit package name at validate.pl line 30.
    Global symbol "$max_errors" requires explicit package name at validate.pl line 30.
    Global symbol "$spec_line" requires explicit package name at validate.pl line 32.
    syntax error at validate.pl line 32, near "() "
    validate.pl has too many errors.

It appears that almost all of the errors in your program are caused by not declaring your variables.  In many cases this is easily fixed by using my to declare the variable.  For example:

    my ($spec_file, $data_in_file, $data_out_file, $log_file, $max_errors) = @ARGV;

which will get rid of many of the errors.

In a few cases, you may have to declare the offending variable in an enclosing scope (at the top of the file, if need be), so that it remains in scope everywhere it's used. One such example is @spec_all_attributes; the first time you use it is within a foreach loop, so you should declare @spec_all_attributes before that loop.
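To illustrate the scoping point, here is a minimal sketch. The spec lines and their comma-separated layout (name, regular expression, default) are made up for the example and are not taken from your program:

```perl
use strict;
use warnings;

# Hypothetical spec lines: name, validation regex, default value
my @spec_lines = (
    'id,^\d+$,0',
    'name,^[A-Za-z]+$,unknown',
);

# Declared before the loop so it is still in scope afterwards
my @spec_all_attributes;

foreach my $spec_line (@spec_lines) {
    # A limit of 3 keeps any commas inside the default value intact
    my @fields = split /,/, $spec_line, 3;
    push @spec_all_attributes, \@fields;
}

# Still visible here; had it been declared with "my" inside the loop,
# this line would fail to compile under strict
print scalar(@spec_all_attributes), " attributes loaded\n";
```

Variables that are only needed inside the loop (such as @fields above) can still be declared there, which keeps their scope as small as possible.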

When you open a file, you can avoid the warnings by doing this:

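    # Note that you don't have to put quotes "..." around $spec_file.
    # A separate lexical filehandle is used, since a scalar that already
    # holds the filename cannot also serve as the handle under strict.
    open (my $spec_fh, $spec_file) or die "Can not open file $spec_file, $!";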

It's considered good practice to use the 3-argument form of open -- for example:

    open (my $log_fh, ">", $log_file) or die "Can not open file $log_file, $!";
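As a rough, self-contained sketch of the 3-argument form with lexical filehandles: the file name and the handle names ($log_fh, $in_fh) are assumptions chosen for illustration, so adapt them to your program.

```perl
use strict;
use warnings;

# Hypothetical file name, for illustration only
my $log_file = "validate_demo.log";

# Write: the mode is its own argument, so odd characters in
# $log_file can never be mistaken for a mode
open(my $log_fh, ">", $log_file)
    or die "Can not open file $log_file for writing, $!";
print {$log_fh} "Error 1. Data type error on line: 3\n";
close($log_fh)
    or die "Can not close file $log_file, $!";

# Read it back with an explicit "<" mode
open(my $in_fh, "<", $log_file)
    or die "Can not open file $log_file for reading, $!";
my $first_line = <$in_fh>;
close($in_fh);

print $first_line;

# Remove the demo file again
unlink $log_file;
```

With a lexical handle, printing takes the form print {$log_fh} ... (or print $log_fh ...), and the handle is closed automatically when the variable goes out of scope.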

Additionally, you might consider giving a syntax message if the number of command-line arguments isn't what's expected. This isn't always just for others who use your program; you may come back to it months or years later, and wonder what the calling syntax was supposed to be. For example, I'd be inclined to do something like the following:

    my $syntax = "
        syntax: $0 <specfile> <data_in> <data_out> <logfile> <max errors>

        The purpose of this program is ...
    ";

    (my $spec_file     = shift) or die $syntax;
    (my $data_in_file  = shift) or die $syntax;
    (my $data_out_file = shift) or die $syntax;
    # etc...
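An alternative sketch checks the argument count in one place. The helper name parse_args is hypothetical, and a fixed list stands in for @ARGV so the example runs on its own:

```perl
use strict;
use warnings;

my $syntax = "syntax: $0 <specfile> <data_in> <data_out> <logfile> <max errors>\n";

# Hypothetical helper: returns the five arguments, or dies with the
# syntax message when the count is wrong
sub parse_args {
    my @args = @_;
    die $syntax unless @args == 5;
    return @args;
}

# Normally this would be parse_args(@ARGV); a fixed list is used here
# so the sketch runs on its own
my ($spec_file, $data_in_file, $data_out_file, $log_file, $max_errors)
    = parse_args("spec.txt", "in.txt", "out.txt", "run.log", 10);

# A short argument list triggers the message
my $ok = eval { parse_args("spec.txt"); 1 };
print "too few arguments: $@" unless $ok;
```

Counting the arguments, rather than testing each one for truth, also means that a legitimate value of 0 (for the maximum number of errors, say) is not mistaken for a missing argument.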

One final comment on the style -- it's usually considered unnecessary "noise" in a program to use comments which are obvious.  The classic example is:

$i++; # Increment $i (duh!)

So you may want to lighten up a little on comments which don't add anything. And since your comments make the code very hard to read (at least for me), you may want to rethink your commenting style. Try putting comments on their own lines (rather than making lines that already exceed 80 characters even longer), and add some whitespace where it helps readability.

Thus, I'd suggest, instead of:

    foreach my $attribute (@data_in_attributes) {
      if ($attribute =~ m/$spec_all_attributes[$attribute_position][1]/) { # validate data attribute type by performing
                                                                           # lookup for the regular expression from the spec memory structure
        push (@data_out_attributes, $attribute); # Correct data type, the output value is same as input value
      } else {
        push (@data_out_attributes, $spec_all_attributes[$attribute_position][2]); # Bad data type, use default provided in the spec for output value
        $total_errors++;
        print log_file "Error ", $total_errors, ". Data type error on line: ", $line_number, ", attribute: ", $attribute_position + 1, " (", $spec_all_attributes[$attribute_position][0], ")\n";
      }
      last DATALINE if ($total_errors >= $max_errors); # terminate if too many errors
      $attribute_position++;
    }

that something like the following may be a lot easier to read:

    foreach my $attribute (@data_in_attributes) {

        # validate data attribute type by performing
        # lookup for the regular expression from the spec memory structure
        if ($attribute =~ m/$spec_all_attributes[$attribute_position][1]/) {

            # Correct data type, the output value is same as input value
            push @data_out_attributes, $attribute;

        } else {

            # Bad data type, use default provided in the spec for output value
            push @data_out_attributes, $spec_all_attributes[$attribute_position][2];
            $total_errors++;
            print log_file "Error ", $total_errors,
                  ". Data type error on line: ", $line_number,
                  ", attribute: ", $attribute_position + 1,
                  " (", $spec_all_attributes[$attribute_position][0], ")\n";
        }

        # terminate if too many errors (<-- but perhaps this is obvious??)
        last DATALINE if ($total_errors >= $max_errors);

        $attribute_position++;
    }
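If you want to try the loop in isolation, something along these lines should run under strict. The spec entries and input values are invented for the example, the errors are collected in an array instead of being written to a log file, and looping over the index is just one possible variation on your counter:

```perl
use strict;
use warnings;

# Hypothetical spec: [ name, validation regex, default value ]
my @spec_all_attributes = (
    [ 'id',   '^\d+$',       '0'       ],
    [ 'name', '^[A-Za-z]+$', 'unknown' ],
);

# Hypothetical input line, already split into attributes
my @data_in_attributes = ('42', 'b0b');
my @data_out_attributes;
my @errors;

# Looping over the index keeps the position and the value in step
# without a separately incremented counter
for my $position (0 .. $#data_in_attributes) {
    my $attribute = $data_in_attributes[$position];
    my ($name, $pattern, $default) = @{ $spec_all_attributes[$position] };

    if ($attribute =~ m/$pattern/) {
        push @data_out_attributes, $attribute;
    } else {
        # Bad data type, so fall back to the default from the spec
        push @data_out_attributes, $default;
        push @errors, "attribute " . ($position + 1) . " ($name)";
    }
}

print "output: @data_out_attributes\n";
print "errors: @errors\n";
```

Unpacking each spec entry into named variables ($name, $pattern, $default) also removes the need for the [0], [1] and [2] subscripts, which arguably makes some of the comments unnecessary.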

s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/

In reply to Re: Data type validation using regular expressions by liverpole
in thread Data type validation using regular expressions by Anonymous Monk
