Hey folks, my database currently has just over 50,000 equipment records, and a couple thousand of them are duplicated in one fashion or another. I need to identify those records.

All was going well, and I actually got the duplicates identified using an array of arrays. A little about my solution so far: I extracted all serial number / id field pairs, then stripped all non-word characters from the serials, along with some other tokens I knew would be a problem. The result was a text file with the cleaned serial number and all of its id fields on the same line, which is exactly what I wanted.

Then I got to thinking, "Man, it would be nice to have the rest of the information so that I don't have to look each of these up individually." That sounded pretty easy, so I decided to use a hash of arrays of arrays, and this is where it got dicey. My code:
    for my $row ( @{$serials} ) {
        my $equ    = $$row[$equIndex];
        my $pmf    = $$row[$pmfIndex];
        my $pro    = $$row[$proIndex];
        my $serial = $$row[$serialIndex];
        my $usr    = $$row[$usrIndex];
        my $date   = $$row[$dateIndex];
        my $clean  = $$row[$cleanIndex];

        if ( $duplicates{$clean} ) {
            push( @{ $duplicates{$clean} }, [ $equ, $pmf, $pro, $serial, $usr, $date ] );
        }
        else {
            %duplicates = ( $clean => [ $equ, $pmf, $pro, $serial, $usr, $date ] );
        }
    }
This is the offending snippet. It creates a hash key from the cleaned serial. The first value is created as a flat array, which is fine, but when a duplicate comes along, push adds it to the end of that first array, as an array inside the array. What I want is:
    clean1 => [ [equ info], [equ info] ]
    clean2 => [ [equ info] ]
    clean3 => [ [equ info], [equ info], [equ info] ]
That way I can print every entry whose outer array has more than one element to a file and get only the duplicates. What I am getting instead is:
    clean1 => [ equ info, array ]
    clean2 => [ equ info ]
    clean3 => [ equ info, array, array ]
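A minimal reproduction of what the snippet does, using hypothetical keys `AAA`/`BBB` and placeholder fields instead of real records. Two separate things are going on: assigning a list to `%duplicates` in the `else` branch replaces the entire hash rather than adding one key, and the first record is stored as a flat array, so a later `push` of an array ref nests the duplicate inside it.

```perl
use strict;
use warnings;

my %duplicates;

# First sighting of a serial, as in the else-branch above:
# list assignment to the whole hash discards every existing key.
%duplicates = ( AAA => [ 'equ1', 'pmf1' ] );
%duplicates = ( BBB => [ 'equ2', 'pmf2' ] );
print join( ',', sort keys %duplicates ), "\n";    # prints "BBB" (AAA is gone)

# Duplicate sighting, as in the if-branch: the new record is pushed
# as an array ref *inside* the flat first record.
push @{ $duplicates{BBB} }, [ 'equ3', 'pmf3' ];
print scalar @{ $duplicates{BBB} }, "\n";           # prints "3", not "2"
```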
I have tried pushing the values into arrays first: I tried using two arrays, pushing the values onto one and then pushing that array onto the other. I know I am making this much harder than it is, but I am stumped.
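One way to get the `clean => [ [info], [info], ... ]` shape, sketched under the assumption that each row holds the cleaned serial first and the other fields after it (the sample rows and that column order are hypothetical; the original locates columns via `$cleanIndex`, `$equIndex`, and so on). The idea is to always push an array ref and let Perl autovivify the outer array the first time a key is used, so neither the `if`/`else` nor the whole-hash assignment is needed.

```perl
use strict;
use warnings;

# Hypothetical sample rows: [clean serial, equ, pmf, pro, serial, usr, date].
my @rows = (
    [ 'AB1234', 'E1', 'P1', 'R1', 'AB-1234', 'bob',   '2010-01-04' ],
    [ 'XY99',   'E2', 'P2', 'R2', 'XY/99',   'alice', '2010-02-11' ],
    [ 'AB1234', 'E3', 'P3', 'R3', 'AB 1234', 'carol', '2010-03-20' ],
);

my %duplicates;
for my $row (@rows) {
    my ( $clean, @info ) = @$row;

    # push autovivifies $duplicates{$clean} as an empty array ref the
    # first time a serial is seen. Each record goes in as its own
    # array ref, giving clean => [ [info], [info], ... ].
    push @{ $duplicates{$clean} }, [@info];
}

# Keep only serials whose outer array holds more than one record.
my @dup_serials = grep { @{ $duplicates{$_} } > 1 } sort keys %duplicates;

# One tab-separated line per duplicate record.
for my $clean (@dup_serials) {
    for my $record ( @{ $duplicates{$clean} } ) {
        print join( "\t", $clean, @$record ), "\n";
    }
}
```

In the original loop this amounts to replacing the whole `if`/`else` with `push @{ $duplicates{$clean} }, [ $equ, $pmf, $pro, $serial, $usr, $date ];`. To write the duplicates to a file rather than STDOUT, open a handle with `open my $out, '>', $filename or die $!;` (where `$filename` is whatever output path you choose) and print to `$out`.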

In reply to hash of arrays of arrays by tnyflmngs