in reply to Generate Hash of hashes by reading a large Input file
It's good that you are following some of the "best practices" like splitting your code into subroutines, using three-argument opens with proper error messages, and using Data::Dumper. There are still some other points that could be improved:
Although not the source of your problems, this bit of code jumped out at me: $line =~ s/^\s+|\s+$//g; $line =~ s/^\s.*//g;. The first substitution removes whitespace from the beginning and end of the line; the second would then delete the entire contents of the line if it began with whitespace, which at that point it no longer can. In combination, this means the second regex will never do anything, and even on its own it doesn't make much sense to me. If you want to skip lines that begin with whitespace, I think it's easier to just do next if $line =~ /^\s/;.
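To illustrate, here's a minimal sketch of that skip-then-trim logic on some made-up input (the sample lines are invented for demonstration):

```perl
use strict;
use warnings;

while ( my $line = <DATA> ) {
    chomp $line;
    next if $line =~ /^\s/;   # skip lines that begin with whitespace
    $line =~ s/\s+$//;        # trim trailing whitespace only
    print "kept: '$line'\n";
}

__DATA__
Cluster-A
   indented detail line
Cluster-B
```

This keeps only "Cluster-A" and "Cluster-B", with no second substitution needed.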
Anyway, on to the main issues. First of all, this looks like a software-generated report. Do you really need to parse the textual representation, or can this software also generate reports in a machine-readable format, like maybe JSON or XML?
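If the tool can emit JSON, for example, parsing becomes trivial with the core JSON::PP module. A sketch, with an invented structure just for illustration (the real tool's schema would differ):

```perl
use strict;
use warnings;
use JSON::PP;   # in core since Perl 5.14

# Hypothetical JSON report for demonstration purposes only.
my $json = '{"zone1":{"cluster1":{"hosts":["h1.example.com","h2.example.com"]}}}';
my $data = decode_json($json);

# The hash-of-hashes structure comes for free; no parsing code to maintain.
for my $host ( @{ $data->{zone1}{cluster1}{hosts} } ) {
    print "$host\n";
}
```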
Second, you parse the file in multiple passes, and then, on the final pass, you loop over both the zones and the clusters on every line. For a short input file like this one, the speed impact is probably not noticeable, but as you add zones and clusters, you'll see a significant performance degradation. This repeated looping is not necessary. Also, if you can be sure that the report is always in this order, a single pass should be all you need. Personally, I'd use a "state machine" kind of approach; I talked about this and gave some examples in this thread, and as I was typing this, choroba posted an example.
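A minimal sketch of the single-pass state machine idea, reading top to bottom and tracking which zone and cluster we're currently "inside" (the report format below is invented to illustrate the technique, not your actual input):

```perl
use strict;
use warnings;
use Data::Dumper;

my ( %hoh, $zone, $cluster );

while ( my $line = <DATA> ) {
    chomp $line;
    if ( $line =~ /^Zone:\s*(\S+)/ ) {         # entering a new zone
        ( $zone, $cluster ) = ( $1, undef );
    }
    elsif ( $line =~ /^Cluster:\s*(\S+)/ ) {   # new cluster within the current zone
        $cluster = $1;
    }
    elsif ( $zone && $cluster && $line =~ /^\s+(\S+\.example\.com)/ ) {
        push @{ $hoh{$zone}{$cluster} }, $1;   # host belongs to current zone+cluster
    }
}

print Dumper( \%hoh );

__DATA__
Zone: east
Cluster: c1
   host1.example.com
   host2.example.com
Zone: west
Cluster: c2
   host3.example.com
```

Each line is examined exactly once, and the $zone/$cluster variables are the "state" that tells the code where in the report it currently is.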
Third, the source of the problem in your current code is that in the innermost loop in get_HostInfo_of_Clusters, you always write all information into $clushashref, without skipping clusters that aren't part of the current zone. You need a conditional there so the information is saved only when appropriate. A quick fix would be to modify the innermost foreach my $cname loop in get_HostInfo_of_Clusters like so:
foreach my $cname ( keys %{ $clushashref->{$zone} } ) {
    next if $cname =~ /No HVM/;
    if ( $line =~ /^($cname)/ ) {
        $clusname = $1;
        next;
    }
    next if !$clusname || !$clushashref->{$zone}->{$clusname};
    if ( $line =~ /example\.com/ ) {
        ...
However, I strongly recommend rewriting the code into a single-pass state machine type approach instead of continuing to work with the current code, as I think you will only run into more problems (performance and maintenance) as you continue working with it.
Re^2: Generate Hash of hashes by reading a large Input file
by pr33 (Scribe) on Apr 12, 2017 at 18:26 UTC