Thanks for all the feedback. I have made several of the recommended changes; please review.
liverpole: I am surprised that it would not compile for you. I did test it before posting.
Thanks again to everyone for your time:

#!/usr/local/bin/perl -w
use strict;
#
# http://www.dwoptimize.com/2007/05/data-type-validation-using-regular.html
# jag.singh@dwoptimize.com
# Version 0.2 
#   feedback incorporated from http://www.perlmonks.org/?node_id=617692
#
my $syntax = ">
> Syntax: $0 <spec_file> <data_in_file> <data_out_file> <log_file> <max_errors>
>
> This program:
>
> 1. Reads specification file that defines the data file layout:
> 1.1. attribute name
> 1.2. attribute data type using regular expressions 
>      (http://www.perl.com/pub/a/2000/11/begperl3.html)
> 1.3. default data value 
>
> 2. Validates input data file for data type
>
> 3. Creates output data file, with bad attribute data values that do
>    not match the specification replaced by the default values
>
> 4. Creates a log file describing data validation errors
>
> 5. Aborts data validation process if the total number of errors
>    reaches max_errors
>
> Terminating... ";
#
my ($spec_file, $data_in_file, $data_out_file, $log_file, $max_errors)
  = @ARGV;
die $syntax, "not all the required command line parameters are provided"
  unless defined $max_errors;
open (log_file, ">", $log_file) or die $syntax, 
  "cannot open ", $log_file, ". ", $!;
open (spec_file, "<", $spec_file) or die $syntax, 
  "cannot open ", $spec_file, ". ", $!;
open (data_in_file, "<", $data_in_file) or die $syntax, 
  "cannot open ", $data_in_file, ". ", $!;
open (data_out_file, ">", $data_out_file) or die $syntax, 
  "cannot open ", $data_out_file, ". ", $!;
print log_file "Spec File> ", $spec_file, "\n", 
  "Data In File> ", $data_in_file, "\n", 
  "Data Out File> ", $data_out_file, "\n", 
  "Log File> ", $log_file, "\n", 
  "Max errors: ", $max_errors, "\n";
#
my $spec_line; my @spec_one_attribute; my @spec_all_attributes; 
foreach $spec_line (<spec_file>) { 
  # Read full data file specification into memory structure from the
  # spec file, which will be used for "lookup" during data validation
  # later
  chomp ($spec_line); # remove newline
  @spec_one_attribute = split(/\,/, $spec_line); 
    # the spec file is ',' delimited
    # @spec_one_attribute contains: attribute name, 
    # attribute data type (regular expression), and default value
  push (@spec_all_attributes, [@spec_one_attribute]); 
    # @spec_all_attributes contain the full data file specification
}
#
my $data_in_line; my @data_in_attributes; my $line_number = 1; 
my $total_errors = 0;
DATALINE: foreach $data_in_line (<data_in_file>) { 
  # read data file, line by line
  chomp ($data_in_line); # remove newline
  @data_in_attributes = split (/\|/, $data_in_line, -1); 
    # the data file is '|' delimited; the -1 limit keeps trailing empty
    # attributes so that they are counted and validated
  if ($#data_in_attributes != $#spec_all_attributes) { 
    # number of attributes on the data line does not match the
    # specification
    $total_errors++;
    print log_file "Error ", $total_errors, ". On the data line: ", 
      $line_number, ", # attributes: ", $#data_in_attributes + 1, 
      ", do not match # attributes in the file specification: ", 
      $#spec_all_attributes + 1, "\n";
    last DATALINE if ($total_errors >= $max_errors); 
    next; # skip data attribute type validation
  }      
  my $attribute; my $attribute_position = 0;
  my @data_out_attributes = ();
  foreach $attribute (@data_in_attributes) {
    if ($attribute =~ m/$spec_all_attributes[$attribute_position][1]/) { 
      # validate data attribute type by performing 
      # lookup for the regular expression from the spec memory structure
      push (@data_out_attributes, $attribute); 
        # Correct data type, the output value is same as input value
    } else {
      push (@data_out_attributes,
        $spec_all_attributes[$attribute_position][2]); 
        # Bad data type, use default provided in the spec for output value
      $total_errors++;
      print log_file "Error ", $total_errors, 
        ". Data type error on line: ", $line_number, 
        ", attribute: ", $attribute_position + 1, 
        " (", $spec_all_attributes[$attribute_position][0], ")\n";
    }
    last DATALINE if ($total_errors >= $max_errors); 
    $attribute_position++;
  }
  print data_out_file join ("|", @data_out_attributes), "\n"; 
    # the data out file is '|' delimited
} continue { 
  # update line number counter even if the data attribute type 
  # validation is skipped
  $line_number++;
}
#
if ($total_errors >= $max_errors) {
  print log_file "Max error count reached: ", $total_errors,
    ", process terminated\n";
} else {
  print log_file "Process completed with: ", $total_errors,
    " errors\n";
}
# End
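
For anyone who wants to try the lookup step without creating the four files, here is a minimal sketch of the same idea on in-memory data. The spec lines, the sample record, and the validate_line helper are made up for illustration and are not part of the script above. Note that because the spec is split on ',', a regular expression that itself contains a comma (for example \d{1,3}) would be cut in two, so the sample patterns avoid commas.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical spec in the same "name,regex,default" layout as the
# spec file. Single quotes keep the backslashes and '$' literal.
my @spec_lines = (
  'id,^\d+$,0',
  'name,^[A-Za-z ]+$,UNKNOWN',
  'joined,^\d{4}-\d{2}-\d{2}$,1900-01-01',
);
my @spec = map { [ split /,/, $_, 3 ] } @spec_lines;

# Hypothetical helper: returns the cleaned '|' delimited line and a
# reference to the list of attribute names that failed validation.
# Returns an empty list when the attribute count does not match.
sub validate_line {
  my ($line) = @_;
  my @in = split /\|/, $line, -1;   # -1 keeps trailing empty fields
  return unless @in == @spec;
  my (@out, @bad);
  for my $i (0 .. $#in) {
    my ($name, $regex, $default) = @{ $spec[$i] };
    if ($in[$i] =~ /$regex/) {
      push @out, $in[$i];           # value matches its data type
    } else {
      push @out, $default;          # replace bad value with default
      push @bad, $name;
    }
  }
  return (join('|', @out), \@bad);
}

my ($clean, $bad) = validate_line('42|Jane Doe|2007-13');
print "$clean\n";                   # prints 42|Jane Doe|1900-01-01
print "bad: @$bad\n";               # prints bad: joined
```

The sample patterns are anchored with ^ and $ so that the whole value has to match. An unanchored pattern such as \d+ would accept any value that merely contains a digit.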

In reply to Re: Data type validation using regular expressions by Anonymous Monk
in thread Data type validation using regular expressions by Anonymous Monk
