Thanks for all the feedback. I have made several of the recommended changes; please review.
liverpole: I am surprised that it would not compile for you; I did test it before posting.
Thanks again to everyone for your time:
#!/usr/local/bin/perl -w
use strict;
#
# http://www.dwoptimize.com/2007/05/data-type-validation-using-regular.html
# jag.singh@dwoptimize.com
# Version 0.2
# feedback incorporated from http://www.perlmonks.org/?node_id=617692
#
my $syntax = ">
> Syntax: $0 <spec_file> <data_in_file> <data_out_file> <log_file> <max_errors>
>
> This program:
>
> 1. Reads the specification file that defines the data file layout:
> 1.1. attribute name
> 1.2. attribute data type, as a regular expression
> (http://www.perl.com/pub/a/2000/11/begperl3.html)
> 1.3. default data value
>
> 2. Validates each attribute in the input data file against its data type
>
> 3. Creates the output data file, with attribute values that do not match
> the specification replaced by their default values
>
> 4. Creates a log file describing the data validation errors
>
> 5. Aborts the validation process if the total number of errors reaches
> max_errors
>
> Terminating... ";
#
my ($spec_file, $data_in_file, $data_out_file, $log_file, $max_errors) = @ARGV;
die $syntax, "not all of the required command-line parameters were provided"
  unless defined $max_errors;
# three-argument open: the mode is explicit, so characters in a file name
# (such as a leading '>' or a trailing '|') cannot change it
open (log_file, ">", $log_file) or die $syntax,
  "cannot open ", $log_file, ". ", $!;
open (spec_file, "<", $spec_file) or die $syntax,
  "cannot open ", $spec_file, ". ", $!;
open (data_in_file, "<", $data_in_file) or die $syntax,
  "cannot open ", $data_in_file, ". ", $!;
open (data_out_file, ">", $data_out_file) or die $syntax,
  "cannot open ", $data_out_file, ". ", $!;
print log_file "Spec File> ", $spec_file, "\n",
"Data In File> ", $data_in_file, "\n",
"Data Out File> ", $data_out_file, "\n",
"Log File> ", $log_file, "\n",
"Max errors: ", $max_errors, "\n";
#
my @spec_all_attributes;
while (my $spec_line = <spec_file>) {
  # Read the full data file specification from the spec file into memory;
  # it is used as a lookup table during data validation later
  chomp ($spec_line); # remove newline
  next unless $spec_line =~ /\S/; # skip blank lines
  # the spec file is ',' delimited; the -1 limit keeps an empty trailing
  # default value instead of dropping it. Note: a regular expression that
  # itself contains ',' (e.g. \d{1,3}) would be split apart here.
  my @spec_one_attribute = split(/,/, $spec_line, -1);
  # @spec_one_attribute contains: attribute name,
  # attribute data type (regular expression), and default value
  push (@spec_all_attributes, [@spec_one_attribute]);
  # @spec_all_attributes contains the full data file specification
}
#
my $line_number = 1;
my $total_errors = 0;
DATALINE: while (my $data_in_line = <data_in_file>) {
  # read the data file line by line
  chomp ($data_in_line); # remove newline
  # the data file is '|' delimited; the -1 limit keeps trailing empty
  # attributes, so they are counted and validated like any other
  my @data_in_attributes = split (/\|/, $data_in_line, -1);
  if (@data_in_attributes != @spec_all_attributes) {
    # the number of attributes on the data line does not match the
    # specification; the line is logged and not written to the output file
    $total_errors++;
    print log_file "Error ", $total_errors, ". Data line ", $line_number,
      " has ", scalar @data_in_attributes,
      " attributes, but the file specification defines ",
      scalar @spec_all_attributes, "\n";
    last DATALINE if ($total_errors >= $max_errors);
    next; # skip data attribute type validation
  }
  my $attribute_position = 0; my @data_out_attributes = ();
  foreach my $attribute (@data_in_attributes) {
    # validate the attribute's data type using the regular expression
    # looked up from the in-memory spec
    if ($attribute =~ m/$spec_all_attributes[$attribute_position][1]/) {
      # correct data type: the output value is the same as the input value
      push (@data_out_attributes, $attribute);
    } else {
      # bad data type: use the default value from the spec for the output
      push (@data_out_attributes, $spec_all_attributes[$attribute_position][2]);
      $total_errors++;
      print log_file "Error ", $total_errors,
        ". Data type error on line: ", $line_number,
        ", attribute: ", $attribute_position + 1,
        " (", $spec_all_attributes[$attribute_position][0], ")\n";
    }
    # stop at max_errors; the current line is not written to the output file
    last DATALINE if ($total_errors >= $max_errors);
    $attribute_position++;
  }
  # the data out file is '|' delimited
  print data_out_file join ("|", @data_out_attributes), "\n";
} continue {
  # update the line number counter even if the data attribute type
  # validation is skipped
  $line_number++;
}
#
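if ($total_errors >= $max_errors) {
  print log_file "Max error count reached: ", $total_errors,
    ", process terminated\n";
} else {
  print log_file "Process completed with ", $total_errors, " errors\n";
}
# check close() on the output files: buffered write errors surface here
close (data_out_file) or die "cannot close ", $data_out_file, ". ", $!;
close (log_file) or die "cannot close ", $log_file, ". ", $!;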
# End
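One more point for anyone reusing this: the type check `m/$re/` is unanchored, so a spec regex of `\d+` also accepts `12abc` unless the spec itself uses `^...$`. Below is a minimal sketch of a stricter check, not part of the script above; the helper names `split_fields`, `compile_field_regex` and `field_matches` are hypothetical. It also shows that `split` keeps trailing empty fields only when given a negative limit.

```perl
use strict;
use warnings;

# Hypothetical helper: split one '|' delimited data line. The -1 limit keeps
# trailing empty fields, so "a|b|" yields three fields instead of two.
sub split_fields {
    my ($line) = @_;
    return split /\|/, $line, -1;
}

# Hypothetical helper: compile a spec regular expression so it must match
# the whole field (\A ... \z) rather than any substring of it.
sub compile_field_regex {
    my ($pattern) = @_;
    return qr/\A(?:$pattern)\z/;
}

# Returns 1 if $value fully matches the compiled regex, 0 otherwise.
sub field_matches {
    my ($value, $compiled) = @_;
    return $value =~ $compiled ? 1 : 0;
}

# Usage: an integer spec rejects "12abc", which an unanchored /\d+/ accepts.
my $int_re = compile_field_regex('\d+');
for my $value ('123', '12abc') {
    print "$value: ", (field_matches($value, $int_re) ? "ok" : "bad"), "\n";
}

my @kept    = split_fields('a|b|');   # 'a', 'b', ''
my @dropped = split /\|/, 'a|b|';     # 'a', 'b' (trailing '' removed)
print scalar(@kept), " vs ", scalar(@dropped), "\n";   # prints "3 vs 2"
```

Wrapping the pattern in `(?:...)` keeps an alternation such as `a|b` inside the anchors; without it, `\Aa|b\z` would anchor only one branch of the alternation.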