gnu@perl has asked for the wisdom of the Perl Monks concerning the following question:

I have a Hash of Hashes created from a configuration file. In my program I first iterate over the hash to check for grammatical correctness as the rules in my program set forth. Upon finding incorrect rules in the has of configuration data I flag the hash entry for deletion. After this loop I delete the invalid entries from the hash.

Next I iterate over the hash again, this time checking that the rules that matched in grammer are correct in content. Things like files named in the config file exist, things named as directories are indeed directories etc. While doing this I again flag invalid entries to be deleted from the hash. When exiting this loop I delete the flagged entries from the hash.

Now I have a hash that contains rules checked for grammer and content. I can now iterate over the hash AGAIN and process the remaining rules.

Ok, here's the deal. Because I am in a loop in the lower level of the HoH I do not want to delete the hash entry until I am out of the processing loop, that's why I flag the entry then delete it afterward. What I would like opinions on is this:

Should I keep each of these functions as seperate functions and keep iterating over the hash each time I need to access the data or should I combine this into a single larger function and operate as follows?

Iterate over the hash and check grammer on each entry, if the grammer is correct, check for valid content in that particular entry. Once each 'rule' has been checked for grammer and validity process that rule ( I have defined actions for each rule type ). After processing the single rule, check the next rule for grammer and validity then process it, etc?

K, fire away, I can take it.

Replies are listed 'Best First'.
Re: Request opinions and ideas
by Tanalis (Curate) on Sep 25, 2002 at 13:59 UTC

    My personal opinion is that the validity (grammar/content) checking and processing should be kept separate, in two different loops.

    Doing it this way means that you can be 100% certain once the first loop has completed that you have clean, valid data, and you're safe to begin processing.

    From a testing perspective, it then becomes very straightforward to check the data for validity by sight at the end of the first loop if necessary.

    Just my two cents .. ignore me if you like :)
    --Foxcub

Re: Request opinions and ideas
by BrowserUk (Patriarch) on Sep 25, 2002 at 14:04 UTC

    As the only way (that I can think of), of "iterating over a %hash" is for my $key (keys %hash) { ... }, or one the essentially similar variations, the rule about not modifying your iterator whilst in a loop doesn't really apply in as much as, the iterator is iterating a list generated by the keys %hash code. So, if you choose to delete an element of the hash using delete $hash{$key}; whilst in the loop, you haven't modified the list in anyway, so I don't see any reason why you should not do this in one pass rather than two.

    Without seeing the underlying logic and code, it's not possible to be certain that its safe to run your grammer and contents validations in the same pass, but I can't see any reasons why you shouldn't.

    The only thing you really need to ensure is that you don't delete a complete lower level hash until all passes have been there and looked. This would be considerably easier in one pass than in 4 I think.

    You might need to add some defensive if exists .... and/or if defined .... checks in there somewhere if you don't already have them.

    I think I would definately try for the 1 pass method.


    Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
Re: Request opinions and ideas
by fglock (Vicar) on Sep 25, 2002 at 14:12 UTC

    I'd only rewrite it for learning purposes.

    Anyway, take a look at Meta::Utils::Hash:

    DESCRIPTION This is a general utility perl library for all kinds of hash routines. + This mainly iterates hashes using the each builtin which is the fast +est to do the job and like the list library uses refernces to avoid d +uplication whereever possible. FUNCTIONS ... filter_multi($$$$) This one is a full filter. This gets: 0. a hash. 1. whether to do +filtering or not. 2. boolean indicating whether filter is negative or + positive. 3. list of modules for filtering data. And does the entire + filtering process in an efficient manner. filter_file_sing_regexp($$$) This routine receives a hash and a regular expression and returns +a hash containing only the elements in the hash which are pointers to + files which contain a the regular expression. This also receives as +the third variable whether to print the matched lines or not (this is + passes along to Meta::Utils::File::File::check_sing_regexp). ...
Re: Request opinions and ideas
by charnos (Friar) on Sep 25, 2002 at 13:59 UTC
    It sounds like you're not just thinking up pseudo-code at this stage, SoPW posts generally have some snippets of code in them to help other monks visualize your current situation and goalJust as a comment, while this post is very detailed in its description of the problem, multidimensional data. structures are somewhat hard to visualize while reading a paragraph. :P
    Also, you may describe the execution of your code somewhat differently than another monk looking at the same code may, and since it is your code you're more likely to skip over obvious steps (not that you did here).

    P.S. I vote separation of functions, generally (and perhaps due to my upbringing in OO), I like to farm out operations whenever possible, to *me* (and that's just my opinion, you could be different) it makes things easier to debug and maintain.
As requested,
by gnu@perl (Pilgrim) on Sep 25, 2002 at 14:20 UTC
    Here is the dumper of the hash I am testing followed by the code. Please bear in mind I just copied this from my gvim session, it is currently 'in progress' and contains, well lets just call them "yet to be optimized" areas.
    $VAR1 = { '0' => { 'file' => 'messages', 'action' => 'remove', 'age' => '20', 'directory' => '/var/ad*' }, '1' => { 'file' => 'messages.*', 'action' => 'compress', 'age' => '20', 'directory' => '/var/adm' } };

    --------------------------------------------------