"Any input to make it better, or do differently..."
First of all, when I run your program, I get errors:
syntax error at validate.pl line 29, near "() "
syntax error at validate.pl line 35, near "}"
Execution of validate.pl aborted due to compilation errors.
But I also recommend you get in the habit of using strict in your programs (you are already using -w, to turn on warnings, which is good):
#!/usr/local/bin/perl -w
use strict;
use warnings; # Already on with -w, but doesn't hurt to be explicit
#
# Version 0.1
#
At that point, you will get a couple dozen warnings and errors:
Variable "$log_file" is not imported at validate.pl line 25.
Variable "$spec_file" is not imported at validate.pl line 26.
Variable "$data_in_file" is not imported at validate.pl line 27.
Variable "$data_out_file" is not imported at validate.pl line 28.
Variable "$spec_file" is not imported at validate.pl line 29.
Variable "$data_in_file" is not imported at validate.pl line 29.
Variable "$data_out_file" is not imported at validate.pl line 30.
Variable "$log_file" is not imported at validate.pl line 30.
Global symbol "$spec_file" requires explicit package name at validate.pl line 24.
Global symbol "$data_in_file" requires explicit package name at validate.pl line 24.
Global symbol "$data_out_file" requires explicit package name at validate.pl line 24.
Global symbol "$log_file" requires explicit package name at validate.pl line 24.
Global symbol "$max_errors" requires explicit package name at validate.pl line 24.
Global symbol "$log_file" requires explicit package name at validate.pl line 25.
Global symbol "$log_file" requires explicit package name at validate.pl line 25.
Global symbol "$spec_file" requires explicit package name at validate.pl line 26.
Global symbol "$spec_file" requires explicit package name at validate.pl line 26.
Global symbol "$data_in_file" requires explicit package name at validate.pl line 27.
Global symbol "$data_in_file" requires explicit package name at validate.pl line 27.
Global symbol "$data_out_file" requires explicit package name at validate.pl line 28.
Global symbol "$data_out_file" requires explicit package name at validate.pl line 28.
Global symbol "$spec_file" requires explicit package name at validate.pl line 29.
Global symbol "$data_in_file" requires explicit package name at validate.pl line 29.
Global symbol "$data_out_file" requires explicit package name at validate.pl line 30.
Global symbol "$log_file" requires explicit package name at validate.pl line 30.
Global symbol "$max_errors" requires explicit package name at validate.pl line 30.
Global symbol "$spec_line" requires explicit package name at validate.pl line 32.
syntax error at validate.pl line 32, near "() "
validate.pl has too many errors.
It appears that almost all of the errors in your program are caused by not declaring your variables. In many cases this is easily fixed by using my to declare the variable. For example:
my ($spec_file, $data_in_file, $data_out_file, $log_file, $max_errors) = @ARGV;
which will get rid of many of the errors.
In a few cases, you may have to declare the offending variable in an outer scope, so that it remains in scope everywhere it's used. One such example is @spec_all_attributes: the first time you use it is within a foreach loop, so you should declare it (with my) before that loop.
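To illustrate why the placement of the declaration matters, here is a minimal sketch. The spec lines are hypothetical stand-ins for the contents of your spec file, and the "name,regex,default" layout is my assumption, based on how your loop indexes $spec_all_attributes[...][0], [1] and [2].

```perl
use strict;
use warnings;

# Hypothetical spec lines (name, regex, default) standing in for the
# contents of the real spec file.
my @spec_lines = ('id,^\d+$,0', 'name,^\w+$,unknown');

# Declared before the loop, so it stays in scope after the loop ends.
my @spec_all_attributes;

foreach my $spec_line (@spec_lines) {
    # Store each attribute as a reference to [name, regex, default].
    push @spec_all_attributes, [ split /,/, $spec_line ];
}

# Still visible here. Had it been declared with "my" inside the loop,
# strict would reject this line at compile time.
print scalar(@spec_all_attributes), " attributes loaded\n";
```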
When you open a file, you can avoid the warnings by doing this:
# Note that you don't have to put quotes "..." around $spec_file.
# The filehandle gets its own lexical variable; read from it with <$spec_fh>
open (my $spec_fh, $spec_file) or die "Can not open file $spec_file, $!";
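It's considered good practice to use the 3-argument form of open, which keeps the mode ("<", ">", ">>") separate from the filename, so a filename that starts with ">" or "|" can't change how the file is opened. For example: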
open (my $log_fh, ">", $log_file) or die "Can not open file $log_file, $!";
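Here is a small, self-contained sketch of the 3-argument form with lexical filehandles, writing a line and reading it back. The temporary file (via the core File::Temp module) is just a stand-in for your real log file, and the log text is made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for the real log file in this sketch.
my (undef, $log_file) = tempfile();

# The filehandle ($log_fh) is a lexical variable, separate from the file
# name ($log_file), and the mode ">" is its own argument.
open(my $log_fh, '>', $log_file) or die "Cannot open file $log_file: $!";
print {$log_fh} "Error 1. Data type error on line: 3\n";
close($log_fh) or die "Cannot close file $log_file: $!";

# Reading uses the "<" mode in exactly the same way.
open(my $in_fh, '<', $log_file) or die "Cannot open file $log_file: $!";
my @lines = <$in_fh>;
close($in_fh);

print "log contains ", scalar(@lines), " line(s)\n";

# Clean up the temporary file.
unlink $log_file;
```

Lexical handles are closed automatically when they go out of scope, but checking the return value of close on a file you've written to catches errors such as a full disk.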
Additionally, you might consider giving a syntax message if the number of command-line arguments isn't what's expected. This isn't just for others who use your program; you may come back to it months or years later and wonder what the calling syntax was supposed to be. For example, I'd be inclined to do something like the following:
my $syntax = "
syntax: $0 <specfile> <data_in> <data_out> <logfile> <max errors>
The purpose of this program is ...
";
(my $spec_file = shift) or die $syntax;
(my $data_in_file = shift) or die $syntax;
(my $data_out_file = shift) or die $syntax;
# etc...
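A variation on the same idea is to check the argument count up front. This is a sketch only: parse_args is a hypothetical helper name, and the literal arguments below stand in for @ARGV so the example runs on its own.

```perl
use strict;
use warnings;

my $syntax = "syntax: $0 <specfile> <data_in> <data_out> <logfile> <max errors>\n";

# Hypothetical helper: returns the five arguments, or dies with the usage
# text if the count is wrong. In the real script you would call it as
# parse_args(@ARGV).
sub parse_args {
    my @args = @_;
    die $syntax unless @args == 5;
    return @args;
}

my ($spec_file, $data_in_file, $data_out_file, $log_file, $max_errors)
    = parse_args('spec.txt', 'in.dat', 'out.dat', 'run.log', 10);
print "max errors: $max_errors\n";

# With the wrong number of arguments, the usage text ends up in $@.
eval { parse_args('spec.txt') };
my $usage_error = $@;
print $usage_error;
```

Checking the count, rather than the truth of each shifted value, also means an argument that happens to be the string "0" isn't mistaken for a missing one.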
One final comment on style: comments that merely state the obvious are usually considered unnecessary "noise" in a program. The classic example is:
$i++; # Increment $i (duh!)
So you may want to lighten up a little on comments that don't add anything. And since your comments make the code very hard to read (at least for me), you may want to rethink your commenting style: try putting comments on their own lines (rather than making lines that are already longer than 80 characters even longer), and add some whitespace where it helps readability.
Thus, I'd suggest, instead of:
foreach my $attribute (@data_in_attributes) {
    if ($attribute =~ m/$spec_all_attributes[$attribute_position][1]/) { # validate data attribute type by performing
        # lookup for the regular expression from the spec memory structure
        push (@data_out_attributes, $attribute); # Correct data type, the output value is same as input value
    } else {
        push (@data_out_attributes, $spec_all_attributes[$attribute_position][2]);
        # Bad data type, use default provided in the spec for output value
        $total_errors++;
        print log_file "Error ", $total_errors, ". Data type error on line: ", $line_number,
            ", attribute: ", $attribute_position + 1, " (", $spec_all_attributes[$attribute_position][0], ")\n";
    }
    last DATALINE if ($total_errors >= $max_errors); # terminate if too many errors
    $attribute_position++;
}
that something like the following may be a lot easier to read:
foreach my $attribute (@data_in_attributes) {
    # validate data attribute type by performing
    # lookup for the regular expression from the spec memory structure
    if ($attribute =~ m/$spec_all_attributes[$attribute_position][1]/) {
        # Correct data type, the output value is same as input value
        push @data_out_attributes, $attribute;
    } else {
        # Bad data type, use default provided in the spec for output value
        push @data_out_attributes, $spec_all_attributes[$attribute_position][2];
        $total_errors++;
        print log_file "Error ", $total_errors,
            ". Data type error on line: ", $line_number,
            ", attribute: ", $attribute_position + 1,
            " (", $spec_all_attributes[$attribute_position][0], ")\n";
    }
    # terminate if too many errors (<-- but perhaps this is obvious??)
    last DATALINE if ($total_errors >= $max_errors);
    $attribute_position++;
}