mielstogo has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to sort by the 'cnt' value in the %sock_total hash, but I can't figure out how to do it when I don't know what the $rip and $pid values are until I'm inside the loop.
I managed to do this before on a HoH where I knew the outer keys, but in this case I'm working my way through all the values and want to show the most active sockets (by count) first.
I've done sorts before, but I'm having trouble figuring out how to do the same with this nested hash:
Example of a previous sort:
    foreach $sym (sort {$pid_PFAULT{$pid}{$b} <=> $pid_PFAULT{$pid}{$a}} (keys %{$pid_PFAULT{$pid}}))
(That one required me to be inside a loop that already knew the $pid value.)
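For reference, here is a minimal, runnable version of that earlier pattern: sorting the keys of one inner hash by their values in descending order. The data in %pid_PFAULT is hypothetical; the point is that putting $b on the left of <=> reverses the order, and a temporary reference to the inner hash keeps the comparison readable.

```perl
use strict;
use warnings;

# Hypothetical sample data: per-pid fault counts keyed by symbol name.
my %pid_PFAULT = (
    1234 => { foo => 5, bar => 12, baz => 1 },
);

my $pid = 1234;
my $h   = $pid_PFAULT{$pid};    # reference to the inner hash for this pid

# $b before $a gives descending numeric order of the values.
my @syms = sort { $h->{$b} <=> $h->{$a} } keys %$h;

print "$_: $h->{$_}\n" for @syms;
```

This only works once $pid is known, which is exactly the limitation described above.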
The HoH I currently need to sort:
# text report by remoteIP:port
sub text_sock_report_byrip() {
    my $pid, my $avg, my $rip, my $type, my $cmd;
    for $rip (keys %sock_total) {
        print "Report on remote host: $rip\n";
        for $pid (keys %{$sock_total{$rip}}) {
            printf("pid: %s\n",$pid);
            for $type (keys %{$sock_total{$rip}{$pid}}) {
                printf("\t%s, ",$type);
                printf("\tcnt: %8d, ",$sock_total{$rip}{$pid}{$type}{'cnt'});
                $avg = $sock_total{$rip}{$pid}{$type}{'tot'} / $sock_total{$rip}{$pid}{$type}{'cnt'};
                printf("\tavg: %12.6f, ",$avg);
                printf("\tmin: %12.6f, ",$sock_total{$rip}{$pid}{$type}{'min'});
                printf("\tmax: %12.6f\n",$sock_total{$rip}{$pid}{$type}{'max'});
            }
        }
    }
}
Sample data, as dumped by Data::Dumper ($VAR1 is a $rip key and $VAR2 is the hash of $pid entries under it; 'cnt' is an integer from 1 to n):
$VAR1 = 'R=10.10.0.111:54998';
$VAR2 = {
          '12913' => {
                       'write' => {
                                    'tot' => '2.3e-05',
                                    'min' => '0.000023',
                                    'max' => '0.000023',
                                    'cnt' => 1
                                  }
                     }
        };
$VAR3 = 'R=10.10.1.110:57354';
$VAR4 = {
          '12913' => {
                       'read' => {
                                   'tot' => '1.5e-05',
                                   .....
etc....

Re: Trouble sorting a nested HOH
by graff (Chancellor) on Nov 28, 2007 at 02:35 UTC
    I'm not sure I understand your goal. You seem to be printing a sequence of structured "paragraphs", like:
        Report on remote host: (some.host)
        pid: (some pid)
            some socket statistics
            some more socket statistics
            ...
        pid: (some other pid)
            still more socket statistics
            had enough socket statistics?
            ...
        Report on remote host: (another.host)
        pid: (yet another pid)
            ...
        ...
    Now, are you supposed to be sorting according to the "cnt" value within each "pid" paragraph? Or are you supposed to sort the pid paragraphs within a given "rip" section according to which pid has the highest "cnt" value? Or are you supposed to sort the "Remote host" sections according to which one has the highest "cnt" value?
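As an illustration of the third reading (ordering whole "Remote host" sections by the highest "cnt" they contain), here is a minimal sketch. The sample data is hypothetical, and the helper max_cnt_for_rip is an invented name, not part of the original code. Each host's maximum is computed once up front so the sort block does not walk the nested hash on every comparison.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical data shaped like %sock_total: $rip -> $pid -> $type -> stats
my %sock_total = (
    'R=10.10.0.111:54998' => { 12913 => { write => { cnt => 1  } } },
    'R=10.10.1.110:57354' => { 12913 => { read  => { cnt => 40 } },
                               12914 => { write => { cnt => 3  } } },
    'R=10.10.2.5:80'      => { 500   => { read  => { cnt => 7  } } },
);

# Hypothetical helper: highest 'cnt' anywhere under one remote host.
sub max_cnt_for_rip {
    my ($rip) = @_;
    my @cnts;
    for my $pid (keys %{ $sock_total{$rip} }) {
        for my $type (keys %{ $sock_total{$rip}{$pid} }) {
            push @cnts, $sock_total{$rip}{$pid}{$type}{cnt};
        }
    }
    return max(@cnts);
}

# Compute each host's maximum once, then sort hosts busiest first
# (host name breaks ties so the order is repeatable).
my %max_cnt = map { $_ => max_cnt_for_rip($_) } keys %sock_total;
my @rips = sort { $max_cnt{$b} <=> $max_cnt{$a} || $a cmp $b } keys %sock_total;

print "$_ (max cnt $max_cnt{$_})\n" for @rips;
```

The existing report loop could then iterate over @rips instead of (keys %sock_total), leaving the inner pid/type loops unchanged.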

    Or maybe you don't really want the output to be structured that way? If you want each "rip/pid/socktype" ordered according to its respective "cnt" value (that is, it's okay that various lines for "rip X" are interleaved with lines for "rip Y" because of their "cnt" rankings), then you just come up with a suitable report line format that keeps all the information together on each line, use your loop to "sprintf()" each report line onto an array, then sort the array before you print it.
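A minimal sketch of that "sprintf each line onto an array, then sort" idea, assuming hypothetical sample data. Rather than re-parsing the count out of each formatted string, it stores [cnt, line] pairs and sorts on the first element, then keeps only the text.

```perl
use strict;
use warnings;

# Hypothetical data shaped like %sock_total: $rip -> $pid -> $type -> stats
my %sock_total = (
    'R=10.10.0.111:54998' => { 12913 => { write => { cnt => 1, tot => 2.3e-05, min => 0.000023, max => 0.000023 } } },
    'R=10.10.1.110:57354' => { 12913 => { read  => { cnt => 4, tot => 6.0e-05, min => 0.000010, max => 0.000020 } } },
);

# Build [cnt, formatted line] pairs so the sort never has to parse text.
my @report;
for my $rip (keys %sock_total) {
    for my $pid (keys %{ $sock_total{$rip} }) {
        for my $type (keys %{ $sock_total{$rip}{$pid} }) {
            my $s    = $sock_total{$rip}{$pid}{$type};
            my $line = sprintf("%-22s %6s %-6s cnt: %8d avg: %12.6f min: %12.6f max: %12.6f",
                $rip, $pid, $type, $s->{cnt}, $s->{tot} / $s->{cnt}, $s->{min}, $s->{max});
            push @report, [ $s->{cnt}, $line ];
        }
    }
}

# Highest cnt first; the line text breaks ties so the output is repeatable.
my @sorted = map { $_->[1] }
             sort { $b->[0] <=> $a->[0] || $a->[1] cmp $b->[1] } @report;

print "$_\n" for @sorted;
```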

    Once you clarify what you're trying to do, the answer should come pretty quickly.

    (updated to fix grammar)

      Originally I just sorted the outer "paragraphs", where remote host IP address was sorted in ascending order. Then once my team saw the report they thought it was great except for the fact that they had to scan down the report to see sockets with high cnt values.
      When analyzing performance, we weight a socket with a high cnt (more socket activity) more heavily than one with cnt=1, since that activity happened only once.
      So I thought there should be a way to sort by cnt without caring about the order of RemoteIP and pid, as long as each cnt value stays correlated with its own RemoteIP and pid.
      RemIP  Pid  Type  Cnt  Avg   Min   Max
      a      b    c     999  0.1   0.05  0.5
      a1     b1   c1    871  0.02  0.02  0.7
      a2     b2   c2    25   0.01  0.03  0.234
        Actually, my original Perl code was a poor example, since it was the text-mode report. The HTML-mode report uses the same Perl hash but loads an HTML table similar to the table example above.

        I want to sort by 'cnt', so perhaps I'm better off just loading an array with the row values from the hash and sorting that. I can sort by RemIP or PID with no problem. I get lost when it comes to 'cnt', since if I sort at the wrong place I'm only ordering the rows within one Type or PID group.
        What I want instead is to bubble the rows with high CNT values up to the top of the report.
        $sock_total{$rip}{$pid}{$type}{'cnt'}
Re: Trouble sorting a nested HOH
by chrism01 (Friar) on Nov 28, 2007 at 06:51 UTC
    If I understand correctly, you effectively have a record for each IP/socket, with various bits of info attached to each one.
    In a similar situation, I converted the whole hash into an array of anonymous hashrefs (one array element per record); then you can easily sort on any (sub)field, e.g.:

    for my $sort_rec ( sort {    $a->{'day_num'} <=> $b->{'day_num'}
                              || $a->{'sti'}     <=> $b->{'sti'}
                              || $a->{'bnum'}    cmp $b->{'bnum'}
                              || $a->{'pos'}     <=> $b->{'pos'}
                            } @offset_recs )
    {
        # ... use $sort_rec->{...} here ...
    }
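Applied to the %sock_total structure from the question, the same idea might look like the sketch below. The sample data is hypothetical, and the record field names (rip, pid, type, cnt, avg, min, max) are chosen for illustration. The triple loop runs once to flatten the HoH into one anonymous hash per rip/pid/type row; after that a single sort puts the busiest sockets first, no matter which host or pid they belong to.

```perl
use strict;
use warnings;

# Hypothetical data shaped like %sock_total: $rip -> $pid -> $type -> stats
my %sock_total = (
    'R=10.10.0.111:54998' => { 12913 => { write => { cnt => 1, tot => 2.3e-05, min => 0.000023, max => 0.000023 } } },
    'R=10.10.1.110:57354' => { 12913 => { read  => { cnt => 9, tot => 1.5e-04, min => 0.000010, max => 0.000030 } },
                               12914 => { write => { cnt => 3, tot => 9.0e-05, min => 0.000020, max => 0.000040 } } },
);

# Flatten: one anonymous hash per rip/pid/type "record".
my @recs;
for my $rip (keys %sock_total) {
    for my $pid (keys %{ $sock_total{$rip} }) {
        for my $type (keys %{ $sock_total{$rip}{$pid} }) {
            my $s = $sock_total{$rip}{$pid}{$type};
            push @recs, {
                rip  => $rip,
                pid  => $pid,
                type => $type,
                cnt  => $s->{cnt},
                avg  => $s->{tot} / $s->{cnt},
                min  => $s->{min},
                max  => $s->{max},
            };
        }
    }
}

# Busiest sockets first; ties broken by host, pid, then type.
my @sorted = sort {
       $b->{cnt}  <=> $a->{cnt}
    || $a->{rip}  cmp $b->{rip}
    || $a->{pid}  <=> $b->{pid}
    || $a->{type} cmp $b->{type}
} @recs;

# A hash slice pulls the columns out in table order for each row.
printf "%-22s %6s %-6s %8d %12.6f %12.6f %12.6f\n",
    @{$_}{qw(rip pid type cnt avg min max)} for @sorted;
```

The same @sorted array could feed either the text report or the HTML table rows.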
    HTH
    Cheers
    Chris
      Exactly the "pointer" I needed! Thanks for the input! An array of anonymous hashrefs will work perfectly with my row oriented presentation.

      This will also suit my -graph option, which uses GD::Graph to plot the data: once the data is in the array of anonymous hashrefs, I can easily clip the outlier averages to make a decent bar graph (and show the outliers on their own graph if need be).
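A small sketch of that clipping step, assuming records already flattened into an array of hashrefs. The records and the $avg_cutoff value are hypothetical; a real cutoff would come from whatever rule you use for outliers. The two parallel arrays at the end (labels and values) are the shape GD::Graph's plot() takes when passed as [\@labels, \@avgs], though the graphing call itself is left out here.

```perl
use strict;
use warnings;

# Hypothetical records in the flattened array-of-hashrefs form.
my @recs = (
    { rip => 'hostA', pid => 1, type => 'read',  cnt => 120, avg => 0.10 },
    { rip => 'hostB', pid => 2, type => 'write', cnt => 80,  avg => 0.02 },
    { rip => 'hostC', pid => 3, type => 'read',  cnt => 5,   avg => 4.50 },
);

# Hypothetical cutoff: averages above this go on a separate outlier graph.
my $avg_cutoff = 1.0;

my @normal   = grep { $_->{avg} <= $avg_cutoff } @recs;
my @outliers = grep { $_->{avg} >  $avg_cutoff } @recs;

# Parallel label/value arrays for the main bar graph.
my @labels = map { "$_->{rip}/$_->{pid}/$_->{type}" } @normal;
my @avgs   = map { $_->{avg} } @normal;

printf "%d plotted, %d outliers\n", scalar @normal, scalar @outliers;
```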

      Thanks!