Greetings and good evening. On Friday morning I posted a question about a concept script I'm working on to replace a KSH script, which led to my code being totally revamped. Now I have an issue with my new code that I am hoping someone will be able to help me out with. Here goes...

First, a Cliff's Notes explanation of what I'm trying to do. I have an input file that can be up to 5 million lines long, with each line I am interested in (and the bulk of the file) formatted thusly:
{9999991234ff00aa},9999991234,1,"Y",0,0,{55760FFC56837F3E}
I need to grab field 2 (the MIN), take its first 6 characters (the NPANXX), count the MINs that start with each NPANXX, and write 2 output files that are both patterned as "NPANXX<space>MIN count", one line per NPANXX record. File 1 is sorted by MIN, file 2 is sorted by count. Here's my code thus far:

#!/usr/bin/perl

my %counts = ();
my %list = ();
my $keys;
my $total;
my $in = "BIGFILE.out.gz";
my $out_min = "npanxx_minsort.out";
my $out_cnt = "npanxx_cntsort.out";
my $debug_out = "Test.debug";

open IN, "/bin/gunzip -c $in |" or die "IN: $!\n";
open OUT_MIN, ">", "$out_min" or die "OUT_MIN: $!\n";
open OUT_CNT, ">", "$out_cnt" or die "OUT_CNT: $!\n";
open DEBUG, ">", "$debug_out" or die "DEBUG: $!\n";

print "Processing $in...\n";
$total = 0;
while (<IN>) {
    chomp($_);
    next unless m/^{.*$/;
    my $min = (split ',')[1];
    my $npanxx = substr($min, 0, 6);
    push (@{$list{$npanxx}}, $min);
    #LogToFile("Test.out", "$_ -> $min & $npanxx\n");
    $counts{$npanxx} += 1;
    $total++;
}

print "$_ $counts{$_}\n" for (sort keys %counts);
print "Total: $total\n";

print "Generating Flat Files...\n";
foreach $key (sort keys %counts) {
    print OUT_MIN "$key $counts{$key}\n";
}
`echo "Total: $total" >> npanxx_minsort.out`;

foreach $key (sort { $counts{$a} <=> $counts{$b} } keys %counts) {
    printf OUT_CNT "%-7s %s\n", $key, $counts{$key};
}
`echo "Total: $total" >> npanxx_cntsort.out`;

sub LogToFile {
    my ($file, $msg) = @_;
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
    $year += 1900;
    $mon += 1;
    if ($sec <= 9) { $sec = "0" . $sec; }
    if ($min <= 9) { $min = "0" . $min; }
    if ($hour <= 9) { $hour = "0" . $hour; }
    if ($mon <= 9) { $mon = "0" . $mon; }
    if ($mday <= 9) { $mday = "0" . $mday; }
    my $stamp = $year . "/" . $mon . "/" . $mday . " " . $hour . ":" . $min . ":" . $sec;
    my $str = "[" . $stamp . "] " . $msg;
    open FILE, ">>", $file;
    print FILE $str;
    close FILE;
}

Everything works awesome - it does EXACTLY what it's supposed to do... with one minor flaw. It turns out the KSH script this will replace de-dupes the captured MINs before counting them. I have been struggling since yesterday to implement that de-duping and can't see the answer in my head. I think it will probably go in as an "unless" next to "$counts{$npanxx} += 1", but I have tried about every permutation of code I can think of there and nothing works. I think the problem is my lack of understanding of that AoH situation.
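For reference, one common way to de-dupe before counting is a "seen" hash keyed on the full MIN, so each distinct MIN is counted once. The following is only a minimal sketch under stated assumptions: count_unique_mins is a hypothetical helper that is not part of the script above, the sample lines after the first are made up for illustration, and it assumes the KSH script treats two records as duplicates when field 2 is identical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): counts each distinct
# MIN only once per NPANXX, using a %seen hash as the de-dupe filter.
sub count_unique_mins {
    my (@lines) = @_;
    my (%seen, %counts);
    my $total = 0;
    for my $line (@lines) {
        next unless $line =~ /^\{/;           # same record filter as the original loop
        my $min = (split /,/, $line)[1];      # field 2 is the MIN
        next unless defined $min;
        next if $seen{$min}++;                # MIN already counted: skip it
        $counts{ substr($min, 0, 6) }++;      # first 6 characters are the NPANXX
        $total++;                             # assumption: total counts unique MINs
    }
    return (\%counts, $total);
}

# Made-up sample data: the first line is from the post, the rest are
# invented to show one duplicate MIN and a second NPANXX.
my @sample = (
    '{9999991234ff00aa},9999991234,1,"Y",0,0,{55760FFC56837F3E}',
    '{9999991234ff00aa},9999991234,1,"Y",0,0,{55760FFC56837F3E}',
    '{9999995678ff00aa},9999995678,1,"Y",0,0,{55760FFC56837F3E}',
    '{8888881234ff00aa},8888881234,1,"Y",0,0,{55760FFC56837F3E}',
);

my ($counts, $total) = count_unique_mins(@sample);
print "$_ $counts->{$_}\n" for sort keys %$counts;
print "Total: $total\n";
```

In the loop from the script above, the equivalent change would be a single "next if $seen{$min}++;" line placed before "$counts{$npanxx} += 1". The trade-off is memory: %seen holds one key per distinct MIN, which for a 5 million line input may be large but is likely manageable.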

I would be very grateful to anyone who can get me to the right place on this one. Once this issue is resolved, I can move onto something else... LOL!!!!

Thanks all, as always!!


In reply to Find If Value Exists In Array Of Hashes by ImJustAFriend
