in reply to Reassembling lines and comparing them (was: First Post)
As the data you wish to compare is the accumulated (testtype, result & resultval), and the information you wish to extract is the sampleIDs and subIDs, possibly the best method would be to build a hash of arrays of arrays (HoAoA) as you read your data in. The key would be the accumulated test data, and the array of arrays would contain one array of sampleIDs and one of subIDs. Doing it this way, there is no searching to be done once the file has been read in and the structure built. For your sample data this might look like this:
%data = (
    '17 1 12.3 17 2 9 18 1 17.2' => [ [ 12345, 67890 ], [ '543D3', '775G2' ] ],
    '17 1 12.3 17 2 9'           => [ [ 45678 ], [ '543D3' ] ],
);
The values for your report can then be read out of the nested arrays directly with no further searching, sorting or matching.
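To show what "read out directly" could look like, here is a minimal sketch that walks the example structure and builds one report line per distinct set of results. The report layout and the @report array are my own invention for illustration; adapt them to whatever your report actually needs.

```perl
use strict;
use warnings;

# The example structure from above: accumulated results => [ [sampleIDs], [subIDs] ]
my %data = (
    '17 1 12.3 17 2 9 18 1 17.2' => [ [ 12345, 67890 ], [ '543D3', '775G2' ] ],
    '17 1 12.3 17 2 9'           => [ [ 45678 ],        [ '543D3' ] ],
);

# One report line per distinct set of results (layout is an assumption).
my @report;
for my $results ( sort keys %data ) {
    # Unpack the two inner arrays; no searching or matching needed.
    my( $sampleIDs, $subIDs ) = @{ $data{ $results } };
    push @report, sprintf "%s | samples: %s | subs: %s",
        $results, join( ',', @$sampleIDs ), join( ',', @$subIDs );
}

print "$_\n" for @report;
```

Any key whose sampleID array holds more than one element is a set of results shared by several samples, so scalar @$sampleIDs is enough to pick those out.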
Assuming that your input file is as you have shown it, sorted by sampleID with each set of values following in a consistent order, building this data structure is a simple one-pass, linear affair.
[ [@sampleIDs], [@subIDs] ].
Set $prevID = $sampleID; @sampleIDs = $sampleID; @subIDs = $subID; $results = $rest;
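For what it is worth, here is one way the one-pass build might be sketched in Perl. It is not a line-for-line translation of the pseudocode. I am assuming each input line is whitespace-separated as "sampleID subID testtype result resultval", and that each sampleID has a single subID. The sample lines and the $flush helper are made up for illustration, and a real script would read from a filehandle instead of @lines.

```perl
use strict;
use warnings;

# Hypothetical input, sorted by sampleID (the field layout is an assumption).
my @lines = (
    '12345 543D3 17 1 12.3',
    '12345 543D3 17 2 9',
    '12345 543D3 18 1 17.2',
    '45678 543D3 17 1 12.3',
    '45678 543D3 17 2 9',
    '67890 775G2 17 1 12.3',
    '67890 775G2 17 2 9',
    '67890 775G2 18 1 17.2',
);

my( %data, $prevID, $prevSub, $results );

# Store the finished sample under its accumulated results key.
my $flush = sub {
    return unless defined $prevID;
    push @{ $data{ $results }[0] }, $prevID;     # array of sampleIDs
    push @{ $data{ $results }[1] }, $prevSub;    # array of subIDs
};

for my $line ( @lines ) {
    # Split off the two IDs; everything else stays together as $rest.
    my( $sampleID, $subID, $rest ) = split ' ', $line, 3;

    if( defined $prevID and $sampleID eq $prevID ) {
        $results .= " $rest";    # same sample: extend the key
    }
    else {
        $flush->();              # new sample: store the previous one
        ( $prevID, $prevSub, $results ) = ( $sampleID, $subID, $rest );
    }
}
$flush->();                      # store the last sample too
```

With those eight lines, %data ends up with the same two keys as the example structure earlier in this post. The final $flush->() after the loop is the step that is easiest to forget.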
Hopefully that pseudocode and the other answers will give you enough clues to get you going. If you get stuck, come back with what you have and someone will nudge you along :)
Examine what is said, not who speaks.
The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.