|
This is the first Perl script I ever wrote. Any input on how to make it better, or how to do it differently (using some existing model, for example), would be highly appreciated.
The simple Perl script shown here is a proof of concept that performs data type validation on a data file. It creates an output file in which bad attribute values are replaced by default values. The data type specifications and the default values are read from a specification file.
Usage example:
Consider the file sales_payment.dat:
A|10.50|CC|2006/12/05|10:05:15
2|12A|Cash|2006/12/05|10:12:18
3|100|12 Un|2006/12/05|10:15:23
4|.85|A1|2006/12/05|10:18:00
5|-100|B2|2006/12/05|10:20:00
6||C|2006/12/05|10:22:00
7|100||2006/12/05|10:26:00
8|200|D|2006/02/31|10:32:00
9|2006/02/31|10:33:00
10|400|E|2006/03/40|30:35:00
11|400|F|1234|10:41:AA
10|300|G|2006/02/31|10:05:15
Next, a specification file, sales_payment.spec, is created. Each line holds the metadata for one attribute, separated by commas (','): the attribute name, its data type defined as a regular expression, and the default value used when the data file contains bad data:
transaction_number,^\d+$,-1
total_basket_amount,^[-+]?[0-9]*\.?[0-9]+$,0
payment_type,^\w+$,_Unknown
date,(19|20)\d\d[/](0[1-9]|1[012])[/](0[1-9]|[12][0-9]|3[01]),1900/01/01
time,^([0-1][0-9]|[2][0-3]):([0-5][0-9]):([0-5][0-9])$,00:00:00
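A spec line of this shape can be parsed without a CSV module. The following is only a minimal sketch, not the script's actual code: `parse_spec_line` is a hypothetical helper name, and it assumes that the attribute name and the default value never contain a comma, while the regular expression might (for example `\d{1,3}`).

```perl
use strict;
use warnings;

# Hypothetical helper: split one spec line into name, compiled regex, default.
# Assumption: the name and the default contain no commas; the regex may.
sub parse_spec_line {
    my ($line) = @_;
    chomp $line;
    # The first comma ends the name, the last comma starts the default;
    # everything in between is taken as the regular expression.
    my ($name, $pattern, $default) = $line =~ /^([^,]*),(.*),([^,]*)$/
        or die "Malformed spec line: $line\n";
    return { name => $name, regex => qr/$pattern/, default => $default };
}

my $spec = parse_spec_line('transaction_number,^\d+$,-1');
print "$spec->{name} default=$spec->{default}\n";
print "42 is valid\n"  if     '42' =~ $spec->{regex};
print "A is invalid\n" unless 'A'  =~ $spec->{regex};
```

Compiling each pattern once with `qr//` while reading the spec avoids recompiling it for every data line.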
When we run data validation:
unix> perl validate_data_type.pl "sales_payment.spec" "sales_payment.dat" "sales_payment_out.dat" "sales_payment.log" 50
...the output data file is created:
-1|10.50|CC|2006/12/05|10:05:15
2|0|Cash|2006/12/05|10:12:18
3|100|_Unknown|2006/12/05|10:15:23
4|.85|A1|2006/12/05|10:18:00
5|-100|B2|2006/12/05|10:20:00
6|0|C|2006/12/05|10:22:00
7|100|_Unknown|2006/12/05|10:26:00
8|200|D|2006/02/31|10:32:00
10|400|E|1900/01/01|00:00:00
11|400|F|1900/01/01|00:00:00
10|300|G|2006/02/31|10:05:15
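The substitution behaviour visible in this output can be sketched roughly as follows. This is an illustration under stated assumptions, not the script itself: `validate_record` is a hypothetical name, the spec is hard-coded, and only the first two attributes are defined to keep the example short, so the sample records have two fields instead of five.

```perl
use strict;
use warnings;

# Two attributes only, to keep the sketch short: [name, regex, default].
my @spec = (
    [ 'transaction_number',  qr/^\d+$/,                  '-1' ],
    [ 'total_basket_amount', qr/^[-+]?[0-9]*\.?[0-9]+$/, '0'  ],
);

# Hypothetical helper: returns (cleaned record, array ref of error messages),
# or (undef, errors) when the attribute count does not match the spec.
sub validate_record {
    my ($record, $line_no) = @_;
    # Limit -1 keeps trailing empty fields such as the one in "6|".
    my @fields = split /\|/, $record, -1;
    if (@fields != @spec) {
        return (undef, [ "line $line_no: " . scalar(@fields)
            . " attributes, expected " . scalar(@spec) ]);
    }
    my @errors;
    for my $i (0 .. $#spec) {
        my ($name, $regex, $default) = @{ $spec[$i] };
        next if $fields[$i] =~ $regex;
        push @errors, "line $line_no, attribute " . ($i + 1) . " ($name)";
        $fields[$i] = $default;    # substitute the default for bad data
    }
    return (join('|', @fields), \@errors);
}

my $line_no = 0;
for my $record ('A|10.50', '2|12A', '6|', '9') {
    my ($clean, $errors) = validate_record($record, ++$line_no);
    print "$clean\n" if defined $clean;       # bad-count records are skipped
    print STDERR "Error: $_\n" for @$errors;
}
```

Records whose attribute count differs from the spec are reported and left out of the output, which matches what happens to data line 9 in the example.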
...along with a log file:
Spec File> sales_payment.spec
Data In File> sales_payment.dat
Data Out File> sales_payment_out.dat
Log File> sales_payment.log
Max errors: 50
Error 1. Data type error on line: 1, attribute: 1 (transaction_number)
Error 2. Data type error on line: 2, attribute: 2 (total_basket_amount)
Error 3. Data type error on line: 3, attribute: 3 (payment_type)
Error 4. Data type error on line: 6, attribute: 2 (total_basket_amount)
Error 5. Data type error on line: 7, attribute: 3 (payment_type)
Error 6. On the data line: 9, # attributes: 3, do not match # attributes in the file specification: 5
Error 7. Data type error on line: 10, attribute: 4 (date)
Error 8. Data type error on line: 10, attribute: 5 (time)
Error 9. Data type error on line: 11, attribute: 4 (date)
Error 10. Data type error on line: 11, attribute: 5 (time)
Process completed with: 10 errors
Formatted documentation is available at: <a href="http://www.dwoptimize.com/2007/05/data-type-validation-using-regular.html">www.dwoptimize.com</a>
|