in reply to Generate Hash of hashes by reading a large Input file
It's good that you are following some of the "best practices" like splitting your code into subroutines, using three-argument opens with proper error messages, and using Data::Dumper. There are still some other points that could be improved:
Although not the source of your problems, this bit of code jumped out at me: $line =~ s/^\s+|\s+$//g; $line =~ s/^\s.*//g;. The first substitution removes whitespace from the beginning and end of the line; the second would then delete the entire contents of the line if it began with whitespace, which at that point it no longer can. In combination, this means the second regex will never do anything, and even on its own it doesn't make much sense to me. If you want to skip lines that begin with whitespace, I think it's easier to just do next if $line =~ /^\s/;.
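To illustrate, here's a minimal sketch of that skip-then-trim logic on some made-up input (the sample lines are invented for demonstration):

```perl
use strict;
use warnings;

while ( my $line = <DATA> ) {
    chomp $line;
    next if $line =~ /^\s/;   # skip lines that begin with whitespace
    $line =~ s/\s+$//;        # trim trailing whitespace only
    print "kept: '$line'\n";
}

__DATA__
Cluster-A
   indented detail line
Cluster-B
```

This keeps only "Cluster-A" and "Cluster-B", with no second substitution needed.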
Anyway, on to the main issues. First of all, this looks like a software-generated report. Do you really need to parse the textual representation, or can this software also generate reports in a machine-readable format, like maybe JSON or XML?
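If the tool can emit JSON, for example, parsing becomes trivial with the core JSON::PP module. A sketch, with an invented structure just for illustration (the real tool's schema would differ):

```perl
use strict;
use warnings;
use JSON::PP;   # in core since Perl 5.14

# Hypothetical JSON report for demonstration purposes only.
my $json = '{"zone1":{"cluster1":{"hosts":["h1.example.com","h2.example.com"]}}}';
my $data = decode_json($json);

# The hash-of-hashes structure comes for free; no parsing code to maintain.
for my $host ( @{ $data->{zone1}{cluster1}{hosts} } ) {
    print "$host\n";
}
```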
Second, you parse the file in multiple passes, and then, on the final pass, you loop over both the zones and the clusters on every line. For a short input file like this one, the speed impact is probably not noticeable, but as you add zones and clusters, you'll see a significant performance degradation. This repeated looping is not necessary. Also, if you can be sure that the report is always in this order, a single pass should be all you need. Personally, I'd use a "state machine" kind of approach; I talked about this and gave some examples in this thread, and as I was typing this, choroba posted an example.
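A minimal sketch of the single-pass state machine idea, reading top to bottom and tracking which zone and cluster we're currently "inside" (the report format below is invented to illustrate the technique, not your actual input):

```perl
use strict;
use warnings;
use Data::Dumper;

my ( %hoh, $zone, $cluster );

while ( my $line = <DATA> ) {
    chomp $line;
    if ( $line =~ /^Zone:\s*(\S+)/ ) {         # entering a new zone
        ( $zone, $cluster ) = ( $1, undef );
    }
    elsif ( $line =~ /^Cluster:\s*(\S+)/ ) {   # new cluster within the current zone
        $cluster = $1;
    }
    elsif ( $zone && $cluster && $line =~ /^\s+(\S+\.example\.com)/ ) {
        push @{ $hoh{$zone}{$cluster} }, $1;   # host belongs to current zone+cluster
    }
}

print Dumper( \%hoh );

__DATA__
Zone: east
Cluster: c1
   host1.example.com
   host2.example.com
Zone: west
Cluster: c2
   host3.example.com
```

Each line is examined exactly once, and the $zone/$cluster variables are the "state" that tells the code where in the report it currently is.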
Third, the source of the problem in your current code is that in the innermost loop in get_HostInfo_of_Clusters, you always write all information into $clushashref, without skipping clusters that aren't part of the current zone. You need a conditional there so the information is saved only when appropriate. A quick fix would be to modify the innermost foreach my $cname loop in get_HostInfo_of_Clusters like so:
foreach my $cname ( keys %{ $clushashref->{$zone} } ) {
    next if $cname =~ /No HVM/;
    if ( $line =~ /^($cname)/ ) {
        $clusname = $1;
        next;
    }
    next if !$clusname || !$clushashref->{$zone}->{$clusname};
    if ( $line =~ /example\.com/ ) {
        ...
However, I strongly recommend rewriting the code into a single-pass state machine type approach instead of continuing to work with the current code, as I think you will only run into more problems (performance and maintenance) as you continue working with it.
Re^2: Generate Hash of hashes by reading a large Input file
by pr33 (Scribe) on Apr 12, 2017 at 18:26 UTC