PoorLuzer has asked for the wisdom of the Perl Monks concerning the following question:
Fact 1 : We have some data files produced by a legacy system
Fact 2 : We have some data files produced by a new system that should eventually replace the legacy one
Fact 3 :
Fact 4 : As we iterate through building the new system, we need to compare the files produced by both systems under exactly the same conditions and reconcile the differences.
Fact 5 : This comparison is being done manually with an expensive visual diff tool. To help with this, I wrote a tool that maps the differing field names of the two systems onto a common name and then sorts the fields in each record, in each file, so that they line up in the same order (the new files can have extra fields, which are ignored in the visual diff).
Fact 6 : Because the comparison is done manually, and humans make mistakes, we are getting both false positives AND false negatives, which is significantly impacting our timelines.
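The normalising tool described in Fact 5 might look roughly like the sketch below. The post shows neither the record layout nor the name mapping, so the hypothetical %field_map entries (FIRST_FLD, SECOND_FLD) and the idea of a record arriving as a hash reference are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical mapping from legacy field names to the common names;
# the real mapping is not given in the post.
my %field_map = (
    FIRST_FLD  => 'firstField',
    SECOND_FLD => 'secondField',
);

# Rename a record's fields to their common names (unmapped names are kept
# unchanged) and return "name=value" lines sorted by field name, so that
# records from both systems line up when diffed.
sub normalise_record {
    my ($record) = @_;
    my %common;
    while ( my ( $name, $value ) = each %$record ) {
        my $common_name = exists $field_map{$name} ? $field_map{$name} : $name;
        $common{$common_name} = $value;
    }
    return map { "$_=$common{$_}" } sort keys %common;
}

my @lines = normalise_record(
    { SECOND_FLD => 3, FIRST_FLD => 1, goodField => 'I am good!' } );
print "$_\n" for @lines;
```

Sorting by the common name rather than by the original name is what makes the two files comparable line by line; extra fields in the new records simply appear as additional lines.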
Obviously the question is, what should 'ALG' and 'DS' be?
The scenario I have to address :
I want to build a Perl program that will
DS : A multi-level nested hash tied to disk.
Looks like:
$namedHash{ 'unique field value across both records' } = {
    legacy_system => { 'goodField' => 'I am good!', 'firstField' => 1, 'secondField' => 3 },
    new_system    => { 'firstField' => 11, 'secondField' => 33, 'goodField' => 'I am good!' },
};
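A plain DBM file stores only flat strings, so a nested hash "tied to disk" needs a serialisation layer. The following is a minimal sketch using only core modules (SDBM_File plus Storable's freeze/thaw), roughly what CPAN modules such as MLDBM or DBM::Deep do for you. The helper names put_record and get_record and the key 'key-0001' are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use Storable qw(freeze thaw);
use File::Temp qw(tempdir);

# Temporary directory standing in for wherever the DBM files would live.
my $dir = tempdir( CLEANUP => 1 );
tie my %namedHash, 'SDBM_File', "$dir/named", O_RDWR | O_CREAT, 0666
    or die "Cannot tie: $!";

# Serialise a nested record and store it under its unique key.
sub put_record {
    my ( $key, $record ) = @_;
    $namedHash{$key} = freeze($record);
}

# Fetch a record back as a nested Perl structure (undef if absent).
sub get_record {
    my ($key) = @_;
    my $frozen = $namedHash{$key};
    return defined $frozen ? thaw($frozen) : undef;
}

put_record( 'key-0001', {
    legacy_system => { goodField => 'I am good!', firstField => 1,  secondField => 3 },
    new_system    => { firstField => 11, secondField => 33, goodField => 'I am good!' },
} );

my $rec = get_record('key-0001');
print "$rec->{new_system}{firstField}\n";    # prints 11
```

SDBM limits each key/value pair to roughly 1 KB, so records with many or long fields would need DB_File or a module like DBM::Deep instead.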
ALG : A custom key-by-key comparison between the anonymous hashes referenced by the legacy_system and new_system keys. Any differences are recorded by inserting a new key, 'differences', whose value is an array of the field names that differ between the legacy and new systems.
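A minimal sketch of that key-by-key comparison follows. It assumes values are compared as strings, and it ignores fields that exist only in the new record, since Fact 5 says the new files' extra fields are ignored. The sub names add_differences and values_differ are hypothetical.

```perl
use strict;
use warnings;

# True if two field values differ; undef on both sides counts as equal.
# Values are compared as strings, so "1" and "1.0" are reported as different.
sub values_differ {
    my ( $x, $y ) = @_;
    return 0 if !defined $x && !defined $y;
    return 1 if !defined $x || !defined $y;
    return $x ne $y;
}

# Add a 'differences' key listing every legacy field whose value differs
# in the new record. Fields present only in the new record are ignored.
sub add_differences {
    my ($entry) = @_;
    my $legacy = $entry->{legacy_system} || {};    # missing side treated as empty
    my $new    = $entry->{new_system}    || {};
    $entry->{differences} =
        [ grep { values_differ( $legacy->{$_}, $new->{$_} ) } sort keys %$legacy ];
    return $entry;
}

my %namedHash = (
    'key-0001' => {
        legacy_system => { goodField => 'I am good!', firstField => 1,  secondField => 3 },
        new_system    => { firstField => 11, secondField => 33, goodField => 'I am good!' },
    },
);

add_differences($_) for values %namedHash;
print join( ', ', @{ $namedHash{'key-0001'}{differences} } ), "\n";   # firstField, secondField
```

If %namedHash is tied through a serialising layer such as MLDBM, changes to nested data are not written back automatically: fetch the record, call add_differences on it, and store it again.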
Hence, for this example, the output of my ALG will be:
$namedHash{ 'unique field value across both records' } = {
    legacy_system => { 'goodField' => 'I am good!', 'firstField' => 1, 'secondField' => 3 },
    new_system    => { 'firstField' => 11, 'secondField' => 33, 'goodField' => 'I am good!' },
    differences   => [ 'firstField', 'secondField' ],
};

What would you have done, or what would you suggest, in this scenario?
Re: Comparing records in file and reporting stats - Scenario 2
by jorgegv (Novice) on May 21, 2009 at 16:08 UTC
Re: Comparing records in file and reporting stats - Scenario 2
by ig (Vicar) on May 22, 2009 at 03:20 UTC