Since I am a Perl newbie, I thought I would seek wisdom before inadvertently turning my original problem into an X-Y problem!
I have a master list with names in the 1st column followed by 2 more columns of numbers, the 1st number smaller than the 2nd. A name can appear more than once in this list, and the list is not sorted in any order. When a name is repeated, the two numbers on each of its rows can differ, but do not have to. Like so:
Alex 3 44
Barry 2 44
James 6 45
Drew 9 43
Alex 124 175
Though it may be obvious, there will only ever be ONE master list, and its file name is the first element in @ARGV when the script is run from bash.
Then I have multiple secondary files (anywhere from one to several; I don't know how many a priori) in the same format as the master list. Their names can be the same as those in the master list or, more commonly, a subset of them. So these files also have 3 columns: a name in the 1st column, followed by 2 columns of numbers, the 1st smaller than the 2nd. For example, the 1st secondary file's contents could be, in no particular alphabetical or numerical order:
James 1 22
Alex 89 120
Alex 134 155
Barry 12 24
While the 2nd secondary file's contents could likewise be:
Alex 154 174
James 29 45
Drew 19 54
Drew 139 154
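One way to hold such a file in memory is a hash of arrays keyed by name, so that a repeated name keeps every one of its ranges. This is only a minimal sketch: `parse_ranges` is a hypothetical helper name, and the lines are passed in directly rather than read from the files named in @ARGV.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "name start end" lines into
# { name => [ [start, end], ... ] } so repeated names keep every range.
sub parse_ranges {
    my (@lines) = @_;
    my %ranges_for;
    for my $line (@lines) {
        my ($name, $start, $end) = split ' ', $line;
        next unless defined $end;    # skip blank or short lines
        push @{ $ranges_for{$name} }, [ $start, $end ];
    }
    return \%ranges_for;
}

# The 1st secondary file from the example, held in memory for illustration.
my $file1 = parse_ranges(
    'James 1 22',
    'Alex 89 120',
    'Alex 134 155',
    'Barry 12 24',
);

# Show what was stored, one name per line.
for my $name (sort keys %$file1) {
    my @pairs = map { join('-', @$_) } @{ $file1->{$name} };
    print "$name: @pairs\n";
}
```

With real files, the same helper could be fed the lines read from each file handle. The master list is probably better kept as an ordered list of rows, since each of its rows needs its own output line.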
My final output needs to contain the following information in grid form:
For each row of the primary file, IF its name is present in a secondary file AND a secondary numerical range for that name is equal to or within the primary's numerical range, indicate the name as present and include the secondary range's numbers.
Else the fields for that file should be indicated as absent, and the range start and end filled with zeroes or just left empty.
Based on the rules above, my output should look as below, with some sort of informative headers for the output columns (I casually made these up):
name 1'start 1'end file#1 #1start #1end file#2 #2start #2end
Alex 3 44 absent 0 0 absent 0 0
Barry 2 44 present 12 24 absent 0 0
James 6 45 present 1 22 present 29 45
Drew 9 43 absent 0 0 absent 0 0
Alex 124 175 present 134 155 present 154 174
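The rules can be sketched as follows, with the example data held in memory rather than read from files. The sketch rests on two assumptions: "within" means the secondary start is not below the primary start and the secondary end is not above the primary end, and when several secondary ranges qualify, the first one found is reported. The helper names `contained_range` and `grid_rows` are hypothetical. Under this strict reading, James's 1 22 from file #1 falls outside 6 45 (1 is less than 6), so the sketch reports that cell as absent, unlike the hand-written grid.

```perl
use strict;
use warnings;

# Master rows stay an ordered list so repeated names keep their own ranges.
my @master = ( [ Alex => 3, 44 ], [ Barry => 2, 44 ], [ James => 6, 45 ],
               [ Drew => 9, 43 ], [ Alex => 124, 175 ] );

# Each secondary file: name => list of [start, end] pairs.
my @secondary = (
    { James => [ [ 1, 22 ] ], Alex => [ [ 89, 120 ], [ 134, 155 ] ], Barry => [ [ 12, 24 ] ] },
    { Alex => [ [ 154, 174 ] ], James => [ [ 29, 45 ] ], Drew => [ [ 19, 54 ], [ 139, 154 ] ] },
);

# Hypothetical helper: return the first secondary range for $name that lies
# inside [$start, $end], or an empty list if there is none.
sub contained_range {
    my ($file, $name, $start, $end) = @_;
    for my $range (@{ $file->{$name} || [] }) {
        return @$range if $range->[0] >= $start && $range->[1] <= $end;
    }
    return;
}

# Build one output line per master row, with three cells per secondary file.
sub grid_rows {
    my ($master, $secondary) = @_;
    my @rows;
    for my $row (@$master) {
        my ($name, $start, $end) = @$row;
        my @cells = ($name, $start, $end);
        for my $file (@$secondary) {
            my @hit = contained_range($file, $name, $start, $end);
            push @cells, @hit ? ('present', @hit) : ('absent', 0, 0);
        }
        push @rows, join ' ', @cells;
    }
    return @rows;
}

print "$_\n" for grid_rows(\@master, \@secondary);
```

Because every range stored for a name is tested against each master row, a repeated name such as Alex cannot be declared absent just because the first range tried happened to belong to its other occurrence.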
Dear Monks - How should I go about doing this? This problem is a little too tricky for me because names can repeat, and each occurrence of a repeated name can have a different numerical range. This means that I might mistakenly match the wrong secondary range to a primary range and conclude that a match does NOT exist, when in reality I have compared ranges that should NOT have been compared, and should instead have looked at the numerical range of the other instance(s) of the name. Does that sort of make sense? Perhaps I am obfuscating by typing more than I should....
Thanks in advance for your advice, have a nice weekend!
In reply to multi column multi file comparison by onlyIDleft