It's good that you are following some of the "best practices" like splitting your code into subroutines, using three-argument opens with proper error messages, and using Data::Dumper. There are still some other points that could be improved:

Although not the source of your problems, this bit of code jumped out at me: $line =~ s/^\s+|\s+$//g; $line =~ s/^\s.*//g;. You're first removing the whitespace from the beginning and end of the line, then the second regex would delete the entire contents of the line if it begins with whitespace, which at this point it does not. In combination, this means the second regex will never do anything, but on its own, the second regex doesn't make much sense to me. If you want to skip lines that begin with whitespace, I think it's easier to just do next if $line=~/^\s/;.
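To illustrate the order dependency, here's a minimal sketch (with made-up sample lines, since I don't have your real input) showing how skipping indented lines first makes the leading-whitespace trim redundant:

```perl
use strict;
use warnings;

while ( my $line = <DATA> ) {
    chomp $line;
    next if $line =~ /^\s/;   # skip lines that begin with whitespace first
    $line =~ s/\s+$//;        # after the skip, only trailing whitespace can remain
    print "kept: <$line>\n";
}
__DATA__
Zone One
   indented detail line
Cluster A
```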

Anyway, on to the main issues. First of all, this looks like a software-generated report. Do you really need to parse the textual representation, or can this software also generate reports in a machine-readable format, like maybe JSON or XML?
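If it can, the whole parsing problem goes away. For example, assuming a hypothetical JSON report named report.json with a structure like { "zones": { ... } }, the core JSON::PP module turns it into the nested data structure you want in a couple of lines:

```perl
use strict;
use warnings;
use JSON::PP;   # in the Perl core since 5.14

# slurp the hypothetical machine-readable report
my $json_text = do {
    open my $fh, '<', 'report.json' or die "report.json: $!";
    local $/;   # slurp mode: read the whole file at once
    <$fh>;
};
my $data = decode_json($json_text);   # a hash of hashes, no parsing code needed
```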

Second, you parse the file in multiple passes, and then, on the final pass, you loop over both the zones and the clusters on every line. For a short input file like this one, the speed impact is probably not noticeable, but as you add zones and clusters, you'll notice a significant performance degradation. This multiple looping is not necessary. Also, if you can be sure that the report is always in this order, a single pass should be all you need. Personally I'd use a "state machine" kind of approach; I talked about this and gave some examples in this thread, and as I was typing this, choroba posted an example.
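To give a rough idea of the state machine approach: the current zone and cluster names *are* the state, and each line either updates the state or adds data under it. This is only a sketch against a hypothetical report layout (Zone:/Cluster:/Host: lines), since I don't have your actual file, but the shape carries over:

```perl
use strict;
use warnings;
use Data::Dumper;

my %zones;
my ( $zone, $cluster );   # the current parser "state"
while ( my $line = <DATA> ) {
    chomp $line;
    if ( $line =~ /^Zone:\s*(\S+)/ ) {            # a new zone starts
        ( $zone, $cluster ) = ( $1, undef );      # reset the cluster state
    }
    elsif ( $line =~ /^\s+Cluster:\s*(\S+)/ ) {   # a cluster in the current zone
        die "cluster outside of any zone?" unless defined $zone;
        $cluster = $1;
    }
    elsif ( $line =~ /^\s+Host:\s*(\S+)/ ) {      # a host in the current cluster
        die "host outside of any cluster?" unless defined $cluster;
        push @{ $zones{$zone}{$cluster} }, $1;
    }
}
print Dumper(\%zones);
__DATA__
Zone: east
  Cluster: c1
    Host: h1.example.com
    Host: h2.example.com
Zone: west
  Cluster: c2
    Host: h3.example.com
```

Note how each elsif can sanity-check the state, so a malformed report dies with a useful message instead of silently producing a broken hash.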

Third, the source of your problem in your current code is that in the innermost loop in get_HostInfo_of_Clusters, you always write all information into the $clushashref, without skipping those clusters that aren't part of the current zone. You need a conditional statement there to only save the information when appropriate. A quick fix would be to modify the innermost foreach my $cname loop in get_HostInfo_of_Clusters like so:

foreach my $cname ( keys %{ $clushashref->{$zone} } ) {
    next if $cname =~ /No HVM/;
    if ( $line =~ /^($cname)/ ) {
        $clusname = $1;
        next;
    }
    next if !$clusname || !$clushashref->{$zone}->{$clusname};
    if ( $line =~ /example\.com/ ) {
        ...

However, I strongly recommend rewriting the code into a single-pass state machine type approach instead of continuing to work with the current code, as I think you will only run into more problems (performance and maintenance) as you continue working with it.


In reply to Re: Generate Hash of hashes by reading a large Input file by haukex
in thread Generate Hash of hashes by reading a large Input file by pr33
