My only experience with Perl is reading the first 11 chapters of Learning Perl 3rd edition last weekend. I'm running ActiveState's win32 5.005_03 version of Perl. I've worked out many exercises from Learning Perl, but I'm having trouble figuring out how to start (and finish) the following:
Take a tab-separated text file with approximately 100,000 lines. Each line has 5 fields: sampleID, subID, testtype, result, resultval. For example:
12345 543D3 17 1 12.3
12345 543D3 17 2 9
12345 543D3 18 1 17.2
45678 543D3 17 1 12.3
45678 543D3 17 2 9
67890 775G2 17 1 12.3
67890 775G2 17 2 9
67890 775G2 18 1 17.2
I would like to transform the file to the following:
12345 543D3 17 1 12.3 17 2 9 18 1 17.2
45678 543D3 17 1 12.3 17 2 9
67890 775G2 17 1 12.3 17 2 9 18 1 17.2
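One way this grouping step might be sketched is with a hash keyed on sampleID, where each value is an array reference that collects the fields seen so far. This is only an illustration under stated assumptions: the sub name merge_by_sample and the in-memory @lines array are made up for the example (the real data would be read from the file), the fields are assumed to be separated by single tabs, and the code sticks to constructs that should work on Perl 5.005_03.

```perl
use strict;

# Sample rows from the post, tab-separated:
# sampleID, subID, testtype, result, resultval.
my @lines = (
    "12345\t543D3\t17\t1\t12.3",
    "12345\t543D3\t17\t2\t9",
    "12345\t543D3\t18\t1\t17.2",
    "45678\t543D3\t17\t1\t12.3",
    "45678\t543D3\t17\t2\t9",
    "67890\t775G2\t17\t1\t12.3",
    "67890\t775G2\t17\t2\t9",
    "67890\t775G2\t18\t1\t17.2",
);

# Hypothetical helper: collapse rows that share a sampleID into one line.
# Returns the merged lines in order of first appearance.
sub merge_by_sample {
    my @rows = @_;
    my (%merged, @order);
    foreach my $row (@rows) {
        chomp $row;
        my ($sample, $sub, @rest) = split /\t/, $row;
        unless (exists $merged{$sample}) {
            push @order, $sample;                 # remember input order
            $merged{$sample} = [ $sample, $sub ]; # start the merged line
        }
        push @{ $merged{$sample} }, @rest;        # append testtype, result, resultval
    }
    return map { join "\t", @{ $merged{$_} } } @order;
}

my @merged = merge_by_sample(@lines);
print "$_\n" foreach @merged;
```

The @order array is there only because a hash does not keep its keys in insertion order; if the output order does not matter, sorting the keys would do as well.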
Then I would like to perform pairwise line comparisons in the transformed file to determine whether all of the test results of two lines are the same. In the example above, all three lines of the transformed file match on every test result that was determined for both lines being compared. Of course, the real transformed file would contain many lines whose test results don't match those of lines 1 - 3.
Example report:
sampleID test results match: 12345, 45678, 67890
sampleID test results match: xxxxx, yyyyy
subID test results match: 543D3
subID test results match: xxxAx
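A rough sketch of the pairwise comparison for the sampleID part of the report might look like the following. It rests on several assumptions that are not spelled out above: a test is identified by the testtype/result pair, two samples "match" when they share at least one test and agree on every test they share (so 45678, which has no test 18, still matches 12345), and resultval is compared as a string. The sub names parse_line, results_match and matching_pairs are made up for the example, and the last row of @merged is a hypothetical non-matching sample added only to show a mismatch. The sketch reports matching pairs rather than the larger groups shown in the example report, and it does not attempt the subID lines.

```perl
use strict;

# Merged lines in the transformed format (tab-separated).
my @merged = (
    "12345\t543D3\t17\t1\t12.3\t17\t2\t9\t18\t1\t17.2",
    "45678\t543D3\t17\t1\t12.3\t17\t2\t9",
    "67890\t775G2\t17\t1\t12.3\t17\t2\t9\t18\t1\t17.2",
    "99999\t775G2\t17\t1\t11.0\t17\t2\t9",   # hypothetical row, differs on test 17/1
);

# Turn one merged line into (sampleID, { "testtype/result" => resultval }).
sub parse_line {
    my ($line) = @_;
    my ($sample, $sub, @fields) = split /\t/, $line;
    my %results;
    while (@fields >= 3) {
        my ($type, $result, $val) = splice(@fields, 0, 3);
        $results{"$type/$result"} = $val;
    }
    return ($sample, \%results);
}

# Assumed rule: match = at least one shared test, and no shared test disagrees.
sub results_match {
    my ($first, $second) = @_;
    my $shared = 0;
    foreach my $key (keys %$first) {
        next unless exists $second->{$key};
        return 0 if $first->{$key} ne $second->{$key};
        $shared++;
    }
    return $shared > 0;
}

# Compare each line against every later line; return "idA, idB" strings.
sub matching_pairs {
    my @parsed = map { [ parse_line($_) ] } @_;
    my @pairs;
    for my $i (0 .. $#parsed - 1) {
        for my $j ($i + 1 .. $#parsed) {
            push @pairs, "$parsed[$i][0], $parsed[$j][0]"
                if results_match($parsed[$i][1], $parsed[$j][1]);
        }
    }
    return @pairs;
}

print "sampleID test results match: $_\n" foreach matching_pairs(@merged);
```

Starting the inner loop at $i + 1 means each pair is compared once rather than twice. The number of comparisons still grows with the square of the number of merged lines, so this may be slow on the full file.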
Where to start: I've learned enough to open a file and read each line (to print it, add it to an array or hash, etc.). I've learned how to use simple regular expressions to write out a new file with all lines that match a specific string. The leap I need to make is how to take several lines from a file that share the same value in the sampleID field and write them out as a single new line, and how to perform pairwise comparisons of one line in a file against all other lines in the file (and then take the second line and compare it against all other lines, etc.).
BTW this is not a homework problem. Where should I start? A particular tutorial or manpage? Any code example would be truly appreciated.
Respectfully,
yungGH
Edited by mirod, 2003-02-13: changed the title