expo has asked for the wisdom of the Perl Monks concerning the following question:
Dataset A
A "Monday"
B "Tuesday"
C "Wednesday"
D "Thursday"

Dataset B
M 252 212 "Bill"
A 325 908 "Jim"
C 426 907 "Mike"
A 423 383 "Sally"
A 993 421 "Jim"
C 737 432 "Mary"
The goal is to merge these so that repeated ids (the first field) in Dataset B are all kept, while rows whose id is not present in Dataset A are excluded. The merged and filtered data would look something like this:
A "Monday" 325 908 "Jim"
A "Monday" 423 383 "Sally"
A "Monday" 993 421 "Jim"
C "Wednesday" 737 432 "Mary"
C "Wednesday" 737 432 "Mary"
Now, I could easily iterate through two arrays side by side and do a pattern match BUT the problem is speed. I have enormous amounts of data that I need to mine through so it needs to be pretty fast.
I started making a hash table, but a hash needs a unique id, which is problematic: the keys have to be unique and I am interested in redundant matches. I then started building a matrix using anonymous arrays, but it got clumsy, and I know there is a more elegant way to do this.
Any ideas or suggestions would be greatly appreciated!!

Expo
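
For what it's worth, the uniqueness worry above may not apply here: in the sample, the ids in Dataset A are unique, and the repeats only occur in Dataset B. A minimal sketch under that assumption is to index Dataset A in a hash and stream Dataset B through it once. The sub name `merge_datasets` and the whitespace-separated field layout are assumptions made for illustration, not something stated in the question. Note that the sketch also emits the `C 426 907 "Mike"` row, since its id appears in Dataset A.

```perl
use strict;
use warnings;

# Hypothetical helper (name and field layout are assumptions):
# index Dataset A by id, then make a single pass over Dataset B.
sub merge_datasets {
    my ($a_lines, $b_lines) = @_;

    # Ids in Dataset A are assumed unique, so a plain hash is enough.
    my %label_for;
    for my $line (@$a_lines) {
        my ($id, $label) = split ' ', $line, 2;
        next unless defined $label;      # skip blank or malformed lines
        $label_for{$id} = $label;
    }

    # Dataset B may repeat ids; every row with a known id is kept.
    my @merged;
    for my $line (@$b_lines) {
        my ($id, $rest) = split ' ', $line, 2;
        next unless defined $rest && exists $label_for{$id};
        push @merged, "$id $label_for{$id} $rest";
    }
    return @merged;
}

my @dataset_a = ('A "Monday"', 'B "Tuesday"', 'C "Wednesday"', 'D "Thursday"');
my @dataset_b = (
    'M 252 212 "Bill"',
    'A 325 908 "Jim"',
    'C 426 907 "Mike"',
    'A 423 383 "Sally"',
    'A 993 421 "Jim"',
    'C 737 432 "Mary"',
);

print "$_\n" for merge_datasets(\@dataset_a, \@dataset_b);
```

Rows come out in Dataset B order rather than grouped by id. Each lookup is a single hash access, so the cost is one pass over each dataset. With very large inputs, Dataset B would more likely be read line by line from a filehandle than held in an array, so that only Dataset A has to fit in memory.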
Replies are listed 'Best First'.

Re: Data Matching Challenge
by GrandFather (Saint) on Feb 01, 2007 at 20:59 UTC
by expo (Initiate) on Feb 02, 2007 at 02:23 UTC

Re: Data Matching Challenge
by Tanktalus (Canon) on Feb 01, 2007 at 20:26 UTC

Re: Data Matching Challenge
by jwkrahn (Abbot) on Feb 02, 2007 at 02:10 UTC

Re: Data Matching Challenge
by roboticus (Chancellor) on Feb 02, 2007 at 03:09 UTC
by expo (Initiate) on Feb 03, 2007 at 14:44 UTC

Re: Data Matching Challenge
by runrig (Abbot) on Feb 01, 2007 at 20:20 UTC