Generate Hash of hashes by reading a large Input file

pr33 has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to build a complex data structure (a hash of hashes) by reading a large file.

In my data, each zone in the stack has some clusters, and each cluster has some hosts, listed with their status, resource capacity, etc.

I am trying to bundle the data into a nested hash. The first hash is keyed by zone, with the clusters in that zone as its values. The second hash is keyed by cluster name and holds the host status and the CPU/memory capacity.

Below is my input data


List of Zones in this Stack……..

Zone ID                                 ZONE Name
-------------------------------------------------------------
8f-bx-33                                                      SVM-Zone
72-0f-163                                                    K2PHB
11x-223a-44f                                              K2B-Zone1



SVM-Zone

List of HVM Clusters, Hosts Status and its Capacity in this Zone....

Cluster ID        Cluster Name          Cluster Type    Memory OverCommit Ratio  CPU OverCommit Ratio
-----------------------------------------------------------------------------------------------------
6500b1            PO01-Cluster1         HVM             3.0                      4.2
b2732096          PO046-Cluster1        HVM             1.0                      2.25
9ff0d432          PO26-CLUSTER01        HVM             1.0                      3.25


PO01-Cluster1

Host Name                           No. of Running VMs       CS Host Status      CS Resource State
--------------------------------------------------------------------------------------------------
cork.example.com                    37                       Up                  Enabled
soy.example.com                     31                       Up                  Enabled
bot.example.com                     25                       Down                Enabled
bunker.example.com                  28                       Maintenance         Enabled

Total No. of Hosts in this Cluster:  4
No. of HOSTS Up:  3
No. of HOSTS Down:  1


Listing the current capacity in the cluster

Resource Type                 Total Capacity                Available Capacity            Used Capacity                 Used Percentage
---------------------------------------------------------------------------------------------------------------------------------------
CPU:                          3949740 MHz                   592740 MHz                    3357000 MHz                   84.99%
MEMORY:                       10014 GB                      979 GB                        9035 GB                       90.22%
ACTUAL STORAGE:               41279 GB                      24731 GB                      16547 GB                      40.09%
ALLOCATED STORAGE:            81920 GB                      24840 GB                      57079 GB                      69.68%



PO046-Cluster1


Host Name                           No. of Running VMs       CS Host Status      CS Resource State
--------------------------------------------------------------------------------------------------
fort.example.com                    20                       Up                  Enabled
server1.example.com                 20                       Up                  Enabled
bolverk.example.com                 25                       Up                  Enabled
rand.example.com                    0                        Down                Enabled
keystone.example.com                20                       Up                  Enabled

Total No. of Hosts in this Cluster:  5
No. of HOSTS Up:  4
No. of HOSTS Down:  1


Listing the current capacity in the cluster

Resource Type                 Total Capacity                Available Capacity            Used Capacity                 Used Percentage
---------------------------------------------------------------------------------------------------------------------------------------
CPU:                          3949200 MHz                   216325 MHz                    3732875 MHz                   94.52%
MEMORY:                       10077 GB                      1381 GB                       8696 GB                       86.29%
ACTUAL STORAGE:               40960 GB                      21700 GB                      19259 GB                      47.02%
ALLOCATED STORAGE:            81920 GB                      15361 GB                      66558 GB                      81.25%




PO26-CLUSTER01

Host Name                           No. of Running VMs       CS Host Status      CS Resource State
--------------------------------------------------------------------------------------------------
cedar.example.com                   19                       Up                  Enabled
kentucky.example.com                21                       Down                Enabled
rose.example.com                    19                       Up                  Enabled
melt.example.com                    15                       Down                Enabled
henry.example.com                   23                       Up                  Enabled
rant.example.com                    23                       Up                  Enabled
rosalind.example.com                26                       Do                  Enabled

Total No. of Hosts in this Cluster:  7
No. of HOSTS Up:  4
No. of HOSTS Down:  3


Listing the current capacity in the cluster

Resource Type                 Total Capacity                Available Capacity            Used Capacity                 Used Percentage
---------------------------------------------------------------------------------------------------------------------------------------
CPU:                          3949740 MHz                   637740 MHz                    3312000 MHz                   83.85%
MEMORY:                       10077 GB                      977 GB                        9100 GB                       90.3%
ACTUAL STORAGE:               41779 GB                      15963 GB                      25815 GB                      61.79%
ALLOCATED STORAGE:            83558 GB                      9049 GB                       74508 GB                      89.17%






K2PHB

List of HVM Clusters, Hosts Status and its Capacity in this Zone....

Cluster ID        Cluster Name          Cluster Type    Memory OCR  CPU OCR
---------------------------------------------------------------------------
a95630a82bf       PC1-P01-Cluster1      HVM             1.0         2
441fd92c-163e     PC1-P02-Cluster1      HVM             1.0         2.25


PC1-P01-Cluster1

Host Name                           No. of Running VMs       CS Host Status      CS Resource State
--------------------------------------------------------------------------------------------------
pc-lhv01.example.com                22                       Up                  Enabled
pc-lhv02.example.com                20                       Up                  Enabled
pc-lhv03.example.com                25                       Up                  Enabled

Total No. of Hosts in this Cluster:  3
No. of HOSTS Up:  3
No. of HOSTS Down:  0


Listing the current capacity in the cluster

Resource Type                 Total Capacity                Available Capacity            Used Capacity                 Used Percentage
---------------------------------------------------------------------------------------------------------------------------------------
CPU:                          3510400 MHz                   739136 MHz                    2771264 MHz                   78.94%
MEMORY:                       10109 GB                      2773 GB                       7336 GB                       72.56%
ACTUAL STORAGE:               41180 GB                      25535 GB                      15645 GB                      37.99%
ALLOCATED STORAGE:            81920 GB                      33547 GB                      48372 GB                      59.05%




PC1-P02-Cluster1

Host Name                           No. of Running VMs       CS Host Status      CS Resource State
--------------------------------------------------------------------------------------------------
nwk-pci-pod02-lhv08.example.com     1                        Up                  Enabled
nwk-pci-pod02-lhv11.example.com     20                       Up                  Enabled

Total No. of Hosts in this Cluster:  2
No. of HOSTS Up:  2
No. of HOSTS Down:  0



Listing the current capacity in the cluster

Resource Type                 Total Capacity                Available Capacity            Used Capacity                 Used Percentage
---------------------------------------------------------------------------------------------------------------------------------------
CPU:                          3950280 MHz                   1234155 MHz                   2716125 MHz                   68.76%
MEMORY:                       10085 GB                      2937 GB                       7148 GB                       70.87%
ACTUAL STORAGE:               41976 GB                      18896 GB                      23079 GB                      54.98%
ALLOCATED STORAGE:            81920 GB                      24708 GB                      57211 GB                      69.84%



K2B-Zone1

List of KVM Clusters, Hosts Status and its Capacity in this Zone....

Cluster ID        Cluster Name          Cluster Type    Memory OverCommit Ratio  CPU OverCommit Ratio
-----------------------------------------------------------------------------------------------------
08d-b0c9acd8887e  K2B-PD1-Cluster1      HVM             1.0                      4


K2B-PD1-Cluster1

Host Name                           No. of Running VMs       CS Host Status      CS Resource State
--------------------------------------------------------------------------------------------------
k2b-lhv-01.example.com              0                        Up                  Enabled
k2b-lhv-02.example.com              4                        Up                  Enabled
k2b-lhv-03.example.com              0                        Disconnected        Enabled
k2b-lhv-04.example.com              0                        Disconnected        Enabled

Total No. of Hosts in this Cluster:  4
No. of HOSTS Up:  2
No. of HOSTS Down:  2

No. of HOSTS in Disconnected State:  2


Listing the current capacity in the cluster

Resource Type                 Total Capacity                Available Capacity            Used Capacity                 Used Percentage
---------------------------------------------------------------------------------------------------------------------------------------
CPU:                          8073920 MHz                   7584920 MHz                   489000 MHz                    6.06%
MEMORY:                       5801 GB                       4632 GB                       1169 GB                       20.16%
ACTUAL STORAGE:               46206 GB                      44527 GB                      1678 GB                       3.63%
ALLOCATED STORAGE:            81920 GB                      24708 GB                      57211 GB                      69.84%

Below is my Code


#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
#######################

my ($zoneid, $zonename);
my ($clusid, $clusname);
my ($host, $hoststatus);
my ($cpu, $cpuusage);
my ($mem, $memusage);
my ($storage, $storage_used);
my %ClusInfo;
my @zones;
my $clushashref = {};


########################

sub getZoneInfo {
    my $file = shift;
    open my $fh, '<', $file or die "Unable to Open the File $file for reading: $!\n";
    my $on;
    while (my $line = <$fh>) {
        chomp $line;
    if ($line =~ /^Zone/i) {
       $on = 1;
        } elsif ($on) {
          last if $line =~ /^$/;
          $line =~ s/-{2,}//g;
              ($zonename) = (split /\s+/, $line)[1];
          push @zones, $zonename if defined ($zonename);
    }
    }

    close($fh);
#     print Dumper \@zones;
    return  \@zones;

}


sub get_Kvm_Clusters_of_Zones {
    my $file = shift;
    my $ZoneAR = &getZoneInfo($file);
    open my $fh, '<', $file or die "Unable to Open the File $file for reading: $!\n";
    while (my $line = <$fh> ) {
    chomp $line;
    $line =~ s/^\s+|\s+$//g;
    $line =~ s/^\s.*//g;
        next if $line =~ /^$/;
        foreach my $zone (@$ZoneAR) {
        if ($line =~ /(^$zone)/) {
        $zonename= $1;
        } elsif ($line =~ /HVM\s+\w+\.\w+/) {
        ($clusid, $clusname) = (split /\s+/, $line)[0,1];
        $ClusInfo{$zonename}{$clusname} = {};
        } elsif ($line =~ /No HVM Clusters/) {
        $ClusInfo{$zonename}{'No HVM'} = 0;
         }

     }
    }

    close($fh);


#print Dumper \%ClusInfo;

return \%ClusInfo;

}

 sub get_HostInfo_of_Clusters {
    my $file = shift;
    $clushashref = &get_Kvm_Clusters_of_Zones($file);
    open my $fh, '<', $file or die "Unable to Open the File $file for reading: $!\n";
    while (my $line = <$fh>) {
    chomp $line;
    $line =~ s/^\s+|\s+$//g;
    next if $line =~ /^$/;
        foreach my $zone (keys %$clushashref) {
             foreach my $cname (keys %{ $clushashref->{$zone} } ) {
                next if $cname =~ /No HVM/;
            if ($line =~ /^($cname)/) {
            $clusname = $1
            } elsif ($line =~ /example\.com/) {
            ($host, $hoststatus) = (split /\s+/, $line)[0, 2];
            chomp $host;
                $clushashref->{$zone}->{$clusname}->{$host} = $hoststatus;
            chomp $hoststatus;
            } elsif ($line =~ /^CPU/) {
                ($cpu, $cpuusage) = (split /\s+/, $line)[0, -1];
            $cpuusage =~ s/%//g;
            chomp $cpuusage;
            $clushashref->{$zone}->{$clusname}->{$cpu} = $cpuusage if (defined($cpuusage));
            } elsif ($line =~ /^MEMORY/) {
            ($mem, $memusage) = (split /\s+/, $line)[0, -1];
            $memusage =~ s/%//g;
            chomp $memusage;
            $clushashref->{$zone}->{$clusname}->{$mem} = $memusage if (defined($memusage));
           } elsif ($line =~ /^ALLOCATED\s+STORAGE/) {
            ($storage, $storage_used) = (split /\s+/, $line)[1, -1];
            $storage_used =~ s/%//g;
            chomp $storage_used;
            $clushashref->{$zone}->{$clusname}->{$storage} = $storage_used if (defined($storage_used));
           }
          }
          }
      }

       close($fh);

print Dumper \%$clushashref;

#    return $clushashref;

}

#&getZoneInfo('hvm.txt');
#&get_Kvm_Clusters_of_Zones('hvm.txt');
&get_HostInfo_of_Clusters('hvm.txt');

The first two subroutines return the right results. The problem is the last subroutine: its output repeats all the cluster names under every zone, instead of listing only the clusters and hosts that belong to that zone.

Output of &getZoneInfo('hvm.txt'), as expected:

 $VAR1 =  [
          'SVM-Zone',
          'K2PHB',
          'K2B-Zone1'
          ];

------

Output of &get_Kvm_Clusters_of_Zones('hvm.txt'):

$VAR1 = {
          'SVM-Zone' => {
                          'PO26-CLUSTER01' => {},
                          'PO01-Cluster1' => {},
                          'PO046-Cluster1' => {}
                        },
          'K2PHB' => {
                       'PC1-P01-Cluster1' => {},
                       'PC1-P02-Cluster1' => {}
                     },
          'K2B-Zone1' => {
                           'K2B-PD1-Cluster1' => {}
                         }
        };

---------

Output of the third subroutine. I am only showing the output for one zone here. SVM-Zone should have only 3 clusters, but the subroutine returns a hash containing all the clusters under each zone.

          'SVM-Zone' => {
                          'PO26-CLUSTER01' => {
                                                'cedar.example.com' => 'Up',
                                                'kentucky.example.com' => 'Down',
                                                'melt.example.com' => 'Down',
                                                'rant.example.com' => 'Up',
                                                'rose.example.com' => 'Up',
                                                'MEMORY:' => '90.3',
                                                'rosalind.example.com' => 'Do',
                                                'henry.example.com' => 'Up',
                                                'STORAGE:' => '89.17',
                                                'CPU:' => '83.85'
                                              },
                          'PC1-P02-Cluster1' => {
                                                  'MEMORY:' => '70.87',
                                                  'nwk-pci-pod02-lhv08.example.com' => 'Up',
                                                  'nwk-pci-pod02-lhv11.example.com' => 'Up',
                                                  'CPU:' => '68.76',
                                                  'STORAGE:' => '69.84'
                                                },
                          'PO01-Cluster1' => {
                                               'MEMORY:' => '90.22',
                                               'cork.example.com' => 'Up',
                                               'bunker.example.com' => 'Maintenance',
                                               'CPU:' => '84.99',
                                               'soy.example.com' => 'Up',
                                               'bot.example.com' => 'Down',
                                               'STORAGE:' => '69.68'
                                             },
                          'PC1-P01-Cluster1' => {
                                                  'CPU:' => '78.94',
                                                  'pc-lhv01.example.com' => 'Up',
                                                  'STORAGE:' => '59.05',
                                                  'pc-lhv03.example.com' => 'Up',
                                                  'pc-lhv02.example.com' => 'Up',
                                                  'MEMORY:' => '72.56'
                                                },
                          'PO046-Cluster1' => {
                                                'server1.example.com' => 'Up',
                                                'fort.example.com' => 'Up',
                                                'keystone.example.com' => 'Up',
                                                'rand.example.com' => 'Down',
                                                'bolverk.example.com' => 'Up',
                                                'MEMORY:' => '86.29',
                                                'CPU:' => '94.52',
                                                'STORAGE:' => '81.25'
                                              },
                          'K2B-PD1-Cluster1' => {
                                                  'CPU:' => '6.06',
                                                  'k2b-lhv-01.example.com' => 'Up',
                                                  'STORAGE:' => '69.84',
                                                  'k2b-lhv-02.example.com' => 'Up',
                                                  'k2b-lhv-03.example.com' => 'Disconnected',
                                                  'MEMORY:' => '20.16',
                                                  'k2b-lhv-04.example.com' => 'Disconnected'
                                                }

I want to store this in a hash of hashes so I can generate a report such as the one below for each host within a zone.

Zone   =>   Zonename,    Cluster  =>  Cluster_Name,   Host => Hostname,    HostStatus  => Up/Down , CPU  => 50, Memory => 50


Replies are listed 'Best First'.
Re: Generate Hash of hashes by reading a large Input file by haukex (Archbishop) on Apr 12, 2017 at 09:19 UTC
It's good that you are following some of the "best practices" like splitting your code into subroutines, using three-argument opens with proper error messages, and using Data::Dumper. There are still some other points that could be improved:

- Your code really needs to be indented properly. See perltidy for a tool to help you with this.
- You should define your variables in the scope where they are needed; putting them all at the top of the file makes them only a little better than globals. For example, `my ($mem, $memusage)` can be moved into the innermost `elsif` where they are used, and the same goes for most of the other variables.
- You use the older `&foo()` calling style for subroutines. Nowadays it's recommended to call subroutines without the `&`, as in `my $clushashref = get_Kvm_Clusters_of_Zones($file);`
- You use chomp a bit too often. Using it once, on the input line immediately after reading it, is enough; all your chomps afterwards will have no effect.
- Although not the source of your problems, this bit of code jumped out at me: `$line =~ s/^\s+|\s+$//g; $line =~ s/^\s.*//g;`. You're first removing the whitespace from the beginning and end of the line, then the second regex would delete the entire contents of the line if it begins with whitespace, which at this point it does not. In combination, this means the second regex will never do anything, but on its own, the second regex doesn't make much sense to me. If you want to skip lines that begin with whitespace, I think it's easier to just do `next if $line=~/^\s/;`.

Anyway, on to the main issues.

First of all, this looks like a software-generated report. Do you really need to parse the textual representation, or can this software also generate reports in a machine-readable format, like maybe JSON or XML?

Second, you parse the file in multiple passes, and then, on the final pass, you loop over both the zones and the clusters on every line. For a short input file like this one, the speed impact is probably not noticeable, but as you add zones and clusters, you'll notice a huge performance degradation. This multiple looping is not necessary. Also, if you can be sure that the report is always in this order, a single pass should be all you need. Personally I'd use a "state machine" kind of approach. I talk about this and gave some examples in this thread, and as I was typing this, choroba posted an example.

Third, the source of your problem in your current code is that in the innermost loop in `get_HostInfo_of_Clusters`, you always write all information into the `$clushashref`, without skipping those clusters that aren't part of the current zone. You need a conditional statement there to only save the information when appropriate. A quick fix would be to modify the innermost `foreach my $cname` loop in `get_HostInfo_of_Clusters` like so:

    foreach my $cname ( keys %{ $clushashref->{$zone} } ) {
        next if $cname =~ /No HVM/;
        if ( $line =~ /^($cname)/ ) {
            $clusname = $1;
            next;
        }
        next if !$clusname || !$clushashref->{$zone}->{$clusname};
        if ( $line =~ /example\.com/ ) {
        ...

However, I strongly recommend rewriting the code into a single-pass state machine type approach instead of continuing to work with the current code, as I think you will only run into more problems (performance and maintenance) as you continue working with it.
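The cross-zone duplication described in the third point can be reproduced in miniature. The following is only a sketch, not code from the thread: the two-zone hash and the names `fresh_clusters`, `%unguarded`, and `%guarded` are hypothetical stand-ins for the structure returned by `get_Kvm_Clusters_of_Zones`. It shows that assigning through a key that does not exist yet autovivifies it, and that an `exists` check before the assignment avoids this.

```perl
use strict;
use warnings;

# Hypothetical miniature of the zone => cluster hash from the second pass.
sub fresh_clusters {
    return (
        'SVM-Zone' => { 'PO01-Cluster1'    => {} },
        'K2PHB'    => { 'PC1-P01-Cluster1' => {} },
    );
}

my $current = 'PO01-Cluster1';    # cluster heading most recently seen

# Unguarded: assigning through a missing key autovivifies it,
# so the current cluster ends up under every zone.
my %unguarded = fresh_clusters();
for my $zone (keys %unguarded) {
    $unguarded{$zone}{$current}{'cork.example.com'} = 'Up';
}

# Guarded: skip zones that do not own the current cluster.
my %guarded = fresh_clusters();
for my $zone (keys %guarded) {
    next unless exists $guarded{$zone}{$current};
    $guarded{$zone}{$current}{'cork.example.com'} = 'Up';
}

print "unguarded K2PHB: @{[ sort keys %{ $unguarded{K2PHB} } ]}\n";
print "guarded   K2PHB: @{[ sort keys %{ $guarded{K2PHB} } ]}\n";
```

In the unguarded copy, K2PHB ends up with both its own cluster and PO01-Cluster1; in the guarded copy it keeps only PC1-P01-Cluster1.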
Re^2: Generate Hash of hashes by reading a large Input file by pr33 (Scribe) on Apr 12, 2017 at 18:26 UTC
Thank you for your suggestions on improving the code. The files are generated by an external API which I don't have access to. This script would just parse the text file and generate a report on the things we are interested in. I haven't tried your solution yet. I am working on the code choroba has provided, which seems much simpler and better.
Re: Generate Hash of hashes by reading a large Input file by choroba (Cardinal) on Apr 12, 2017 at 08:59 UTC
I'd create a state machine to parse the file. The state ($section in the below code) tells me what the parser expects to find. I also need to store the current zone and cluster to be able to attach the new information to the correct part of the structure. Read more... (2 kB)
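For readers who want to see the shape of such a parser, here is a minimal single-pass sketch. It is not choroba's code, which is not reproduced in this thread. The sub name `parse_report`, the `host` and `usage` sub-keys, and the abbreviated sample text are assumptions made for illustration, and the sketch assumes every table is followed by a blank line, as in the input shown in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Abbreviated, hypothetical sample in the same layout as the report above.
my $report = <<'END';
List of Zones in this Stack

Zone ID        ZONE Name
------------------------
8f-bx-33       SVM-Zone

SVM-Zone

List of HVM Clusters, Hosts Status and its Capacity in this Zone....

Cluster ID   Cluster Name    Cluster Type   Memory OverCommit Ratio  CPU OverCommit Ratio
------------------------------------------------------------------------------------------
6500b1       PO01-Cluster1   HVM            3.0                      4.2

PO01-Cluster1

Host Name           No. of Running VMs   CS Host Status   CS Resource State
----------------------------------------------------------------------------
cork.example.com    37                   Up               Enabled
bot.example.com     25                   Down             Enabled

Resource Type       Total Capacity   Available Capacity   Used Capacity   Used Percentage
------------------------------------------------------------------------------------------
CPU:                3949740 MHz      592740 MHz           3357000 MHz     84.99%
MEMORY:             10014 GB         979 GB               9035 GB         90.22%
END

sub parse_report {
    my ($text) = @_;
    my (%data, %is_zone);
    my ($section, $zone, $cluster) = ('');    # $section is the parser state

    open my $fh, '<', \$text or die "Cannot read report: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;
        if ($line eq '') { $section = ''; next }    # a blank line ends a table
        next if $line =~ /^-+$/;                    # skip separator rows

        # Table headers switch the state; other lines are read according to it.
        if    ($line =~ /^Zone ID/)       { $section = 'zones' }
        elsif ($line =~ /^Cluster ID/)    { $section = 'clusters' }
        elsif ($line =~ /^Host Name/)     { $section = 'hosts' }
        elsif ($line =~ /^Resource Type/) { $section = 'resources' }
        elsif ($section eq 'zones') {
            my (undef, $name) = split ' ', $line;
            $is_zone{$name} = 1;
        }
        elsif ($section eq 'clusters' && defined $zone) {
            my (undef, $name) = split ' ', $line;
            $data{$zone}{$name} = {};
        }
        elsif ($section eq 'hosts' && defined $cluster) {
            my ($host, undef, $status) = split ' ', $line;
            $data{$zone}{$cluster}{host}{$host} = $status;
        }
        elsif ($section eq 'resources' && defined $cluster) {
            if ($line =~ /^(CPU|MEMORY|ALLOCATED STORAGE):.*?([\d.]+)%$/) {
                $data{$zone}{$cluster}{usage}{$1} = $2;
            }
        }
        # Outside any table, a bare zone or cluster name is a heading.
        elsif ($is_zone{$line})                             { $zone = $line }
        elsif (defined $zone && exists $data{$zone}{$line}) { $cluster = $line }
    }
    close $fh;
    return \%data;
}

my $data = parse_report($report);
$Data::Dumper::Sortkeys = 1;
print Dumper($data);
```

Because each line is examined once and attached to the current zone and cluster, no cluster can end up under a zone it does not belong to, and no per-line loop over all zones and clusters is needed.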
Re^2: Generate Hash of hashes by reading a large Input file by pr33 (Scribe) on Apr 20, 2017 at 20:05 UTC
Thank you, choroba. I tried your code, and the execution time has improved a lot when parsing multiple files of the same input format. I was more interested in the total CPU/memory/storage allocation than in the overcommit ratio, so I added one more section to the code:

    .... some lines ....
    .......
    } elsif (/^Resource Type/) {
        $section = 'resources';
    ......
    .....
    elsif ('resources' eq $section) {
        if (my ($resource, $usage) = /^(CPU:|MEMORY:|ALLOCATED STORAGE:)\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+(\S+)$/ ) {
            $usage =~ s/%//g;
            $zone{$current_zone}{cluster}{$current_cluster}{$resource} = { usage => $usage};
        } elsif (/^$/) {
            $section = 0;
        }
    ......
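Once the data is in a nested hash, the per-host report asked for in the original question is a matter of walking the levels in order. The sketch below is an illustration under assumptions: only the `{cluster}{...}{'CPU:'}{usage}` layout follows the snippet above, while the `{host}{...}{status}` layout and the sample values are hypothetical, since the part of the parser that stores hosts is not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data. The host => { name => { status => ... } } layout
# is an assumption; the resource entries follow the snippet above.
my %zone = (
    'SVM-Zone' => {
        cluster => {
            'PO01-Cluster1' => {
                'CPU:'    => { usage => '84.99' },
                'MEMORY:' => { usage => '90.22' },
                host      => {
                    'cork.example.com' => { status => 'Up' },
                    'bot.example.com'  => { status => 'Down' },
                },
            },
        },
    },
);

# One report line per host, carrying its zone, cluster, and cluster usage.
my @report;
for my $z (sort keys %zone) {
    my $clusters = $zone{$z}{cluster};
    for my $c (sort keys %$clusters) {
        my $info = $clusters->{$c};
        for my $h (sort keys %{ $info->{host} }) {
            push @report, sprintf
                'Zone => %s, Cluster => %s, Host => %s, HostStatus => %s, CPU => %s, Memory => %s',
                $z, $c, $h, $info->{host}{$h}{status},
                $info->{'CPU:'}{usage}, $info->{'MEMORY:'}{usage};
        }
    }
}
print "$_\n" for @report;
```

Sorting the keys at each level keeps the report order stable between runs, which plain `keys` does not guarantee.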