Hey folks,
My database currently has just over 50,000 equipment records. A couple thousand of those are duplicated in one fashion or another, and I need to identify those records. All was going well, and I actually got the records identified using an array of arrays. A little about my solution so far: I extracted all serial number / id field pairs, then removed all non-word characters and some other tokens that I knew would be a problem. What I got was a text file that had the clean serial number and all id fields on the same line, which is exactly what I wanted. Then I got to thinking, "Man, it would be nice if I had the rest of the information so that I don't have to look these up individually." It sounded pretty easy, so I decided to use a hash of arrays of arrays. This is where it got dicey. My code:
for my $row ( @{$serials} ) {
    my $equ    = $$row[$equIndex];
    my $pmf    = $$row[$pmfIndex];
    my $pro    = $$row[$proIndex];
    my $serial = $$row[$serialIndex];
    my $usr    = $$row[$usrIndex];
    my $date   = $$row[$dateIndex];
    my $clean  = $$row[$cleanIndex];
    if ($duplicates{$clean}) {
        push (@{$duplicates{$clean}}, [$equ, $pmf, $pro, $serial, $usr, $date]);
    } else {
        %duplicates = ($clean => [$equ, $pmf, $pro, $serial, $usr, $date]);
    }
}
This is the offending snippet. What this does is create a hash key from the clean serial. The first value is created as an array, which is fine, but when a duplicate comes around I push it on and it adds it to the end of the first array, inside the array. What I want is:
clean1 -> [[equ info] -> [equ info]]
clean2 -> [equ info]
clean3 -> [[equ info] -> [equ info] -> [equ info]]
so I can then print everything with an outer array length greater than 1 to a file and only get the duplicates.
What I am getting is:
clean1 -> [equ info, array]
clean2 -> [equ info]
clean3 -> [equ info, array, array]
I have tried pushing the values into arrays first. I have also tried using two arrays, pushing the values onto one and then pushing that array onto the other. What I do know is that I am making this much harder than it is, but I am stumped.
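For reference, here is a minimal sketch of one way the structure described above could be built. It is not a tested fix against the real data. It rests on two observations about the snippet: the else branch stores a flat array rather than an array holding one inner array, and assigning a list to %duplicates replaces the whole hash each time a new key shows up. Pushing an anonymous array onto @{ $duplicates{$clean} } in every case relies on autovivification to create the outer array, so no if/else is needed. The sample rows and column positions below are hypothetical and stand in for $serials and the $...Index variables.

```perl
use strict;
use warnings;

# Hypothetical sample rows: [equ, pmf, pro, serial, usr, date, clean].
# The real column positions come from the $...Index variables in the post.
my $serials = [
    [ 'E1', 'P1', 'X', 'AB-123', 'alice', '2009-01-01', 'AB123' ],
    [ 'E2', 'P2', 'Y', 'ab 123', 'bob',   '2009-02-01', 'AB123' ],
    [ 'E3', 'P3', 'Z', 'CD-456', 'carol', '2009-03-01', 'CD456' ],
];
my $cleanIndex = 6;    # assumed position of the cleaned serial

my %duplicates;
for my $row ( @{$serials} ) {
    my @info  = @{$row}[ 0 .. 5 ];        # everything except the clean serial
    my $clean = $row->[$cleanIndex];

    # Always push one inner array; the outer array is autovivified
    # the first time a given key is seen.
    push @{ $duplicates{$clean} }, [@info];
}

# Report only the serials that have more than one record.
for my $clean ( sort keys %duplicates ) {
    my @records = @{ $duplicates{$clean} };
    next unless @records > 1;
    print "$clean\n";
    print "    ", join( ', ', @$_ ), "\n" for @records;
}
```

With this layout every value is an array of arrays, even when it holds a single record, so the length check on the outer array is all that is needed to pick out the duplicates.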