snape has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I would like to get all the keys in a hash of hashes that fall within a particular range. For example:

If I read the first file into a hash of hashes as

$hash{$col1}{"$col2_"."$col3"} = "col4\tcol5"
where col stands for a column of the file, then I want to find all entries in the hash of hashes whose col2 and col3 fall between two given values. col2 and col3 are numeric and col1 is alphanumeric. I am trying to check the entry for col1 first, i.e. by
if (exists $hash{$col1}){
    if (){ ## Don't know what I should fill here to check the values of col2 and col3 in a particular range
    }
}
Any help in this regard will be highly appreciated. I tried looking for examples but I didn't find any.

Replies are listed 'Best First'.
Re: Search in Hash of Hash for the range of values
by ikegami (Patriarch) on Sep 01, 2010 at 21:42 UTC
    Not if(), but for(). You need to grab each key, split it up, and check whether each number is in the appropriate range.
Re: Search in Hash of Hash for the range of values
by perlpie (Beadle) on Sep 02, 2010 at 00:01 UTC

    First off, I think you may have a typo. You're referencing a $col2_ variable (note the trailing underscore) that I don't think you meant to. I'm also guessing you wanted dollar signs in the right hand side too. Did you mean something like this?

    $hash{$col1}{$col2 . '_' . $col3} = "$col4\t$col5"

    Second, I don't know the context here, but it sounds like this sort of comparison and filtering would have been easier to do back before things had been put in the HoH.

    On the assumption that this isn't an option, how about the following?

    for my $c1 (keys %hash) {
        for my $c2_c3 (keys %{$hash{$c1}}) {
            my ($c2, $c3) = split('_', $c2_c3);
            if (   () # $c2 is ok
                && () # $c3 is ok
            ) {
                # do stuff for good data
            }
            else {
                # do stuff for bad data
            }
        }
    }

    You'll have to fill in your range tests and actions to take for good/bad data.
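
One way the blanks in the loop above might be filled in, as a minimal sketch. The sample data, the helper name keys_in_range, and the choice of inclusive bounds are all assumptions made for illustration; none of them come from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample data, keyed as $hash{$col1}{"${col2}_${col3}"}.
my %hash = (
    chr1 => {
        '100_200' => "geneA\t+",
        '150_900' => "geneB\t-",
        '500_600' => "geneC\t+",
    },
);

# Hypothetical helper: return the inner keys under $c1 whose two numbers
# both lie inside their own inclusive range.
sub keys_in_range {
    my ($h, $c1, $min2, $max2, $min3, $max3) = @_;
    return () unless exists $h->{$c1};
    my @hits;
    for my $key (keys %{ $h->{$c1} }) {
        my ($c2, $c3) = split /_/, $key;
        next unless $c2 >= $min2 && $c2 <= $max2;   # $c2 is ok
        next unless $c3 >= $min3 && $c3 <= $max3;   # $c3 is ok
        push @hits, $key;
    }
    # Sort numerically by col2, then col3, so the result order is stable.
    my @sorted = sort {
        my ($a2, $a3) = split /_/, $a;
        my ($b2, $b3) = split /_/, $b;
        $a2 <=> $b2 || $a3 <=> $b3;
    } @hits;
    return @sorted;
}

my @found = keys_in_range(\%hash, 'chr1', 100, 500, 200, 600);
print "$_ => $hash{chr1}{$_}\n" for @found;
```

With the sample data, the keys 100_200 and 500_600 pass both tests, while 150_900 is rejected because its second number is above 600.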

      Thanks a lot. It worked for me.

      following on from that..but going out on a limb: i think you're trying to create/process some kind of a bookmark in text, which has starting and ending positions in text.
      what you're doing with combining two values into one hash key is probably not a good idea..as it's just made your job a whole lot harder. a HoHoH works better than Ho{H.H}
      also, if my assumption about the problem is in the ballpark, it's probably better to store a starting position plus an offset to the end (length), rather than a starting position and an ending position. you can always quickly calculate the ending position if absolutely necessary at a point in time, but it's easier to handle the thing (i think) with a start pos + length. your range tests then become rather simple.
      the hardest line to type correctly is: stty erase ^H
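
A rough sketch of the HoHoH layout suggested above, storing a start and a length instead of a combined key. The sample rows, the helper name entries_within, and the definition length = end - start are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical records: [col1, start, end, payload].
my @rows = (
    [ 'chr1', 100, 200, "geneA\t+" ],
    [ 'chr1', 150, 900, "geneB\t-" ],
    [ 'chr1', 500, 600, "geneC\t+" ],
);

# Three levels: $hoh{$col1}{$start}{$length} = payload.
my %hoh;
for my $row (@rows) {
    my ($c1, $start, $end, $payload) = @$row;
    $hoh{$c1}{$start}{ $end - $start } = $payload;
}

# Hypothetical helper: list [start, end, payload] for every entry under $c1
# that lies entirely inside the inclusive window [$lo, $hi].
sub entries_within {
    my ($h, $c1, $lo, $hi) = @_;
    my @hits;
    my $by_start = $h->{$c1} or return @hits;
    for my $start (sort { $a <=> $b } keys %$by_start) {
        next if $start < $lo || $start > $hi;
        for my $len (sort { $a <=> $b } keys %{ $by_start->{$start} }) {
            my $end = $start + $len;
            push @hits, [ $start, $end, $by_start->{$start}{$len} ] if $end <= $hi;
        }
    }
    return @hits;
}

printf "%d-%d\t%s\n", @$_ for entries_within(\%hoh, 'chr1', 100, 650);
```

Because the numbers are separate hash levels, no key has to be split apart before it can be compared.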
Re: Search in Hash of Hash for the range of values
by graff (Chancellor) on Sep 02, 2010 at 00:27 UTC
    Let me see if I understand you correctly: after you've loaded values into your HoH, you're going to be looking at some new set of inputs "col1, col2, col3, ..." values, and you want to check for matches in the HoH.

    Now, based on the HoH being keyed as $hash{ $c1 }{ "$c2 _ $c3" } (where $c2 is always less-than-or-equal-to $c3), some new set of column values could give any of the following outcomes:

    - (a) $hash{$col1} doesn't exist, or
    - $hash{$col1} exists, and...
      - (b) "$col2 _ $col3" gives an exact match to an existing key, or
      - (c) both $col2 and $col3 fall within the range of an existing key, or
      - (d) only $col2 or $col3 (not both) fall within the range of an existing key, or
      - (e) $col2 and $col3 define a range between existing keys, or
      - (f) $col2 and $col3 define a range that encompasses one or more existing keys,
            with extra margins at one or both edges.
    The particular solution you want will depend on what is supposed to happen for cases (d - f). It might also depend on whether or not the initial loading of the HoH is supposed to yield non-overlapping ranges. (If the "$c2-$c3" ranges for a given $c1 are not supposed to overlap, you might need to add sanity checks on the input data, and/or conditions on the HoH loading, to make sure you satisfy that constraint.)
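
To make cases (b) through (f) concrete, here is a small sketch that compares one incoming range against one stored key. The helper name classify and the inclusive bounds are assumptions, and case (e) is reduced here to "no overlap with this particular key".

```perl
use strict;
use warnings;

# Hypothetical helper: compare a query range [$q_lo, $q_hi] with one stored
# range [$k_lo, $k_hi]. All bounds are inclusive and lo <= hi for both.
sub classify {
    my ($q_lo, $q_hi, $k_lo, $k_hi) = @_;
    return 'b: exact match'          if $q_lo == $k_lo && $q_hi == $k_hi;
    return 'c: inside the key'       if $q_lo >= $k_lo && $q_hi <= $k_hi;
    return 'f: encompasses the key'  if $q_lo <= $k_lo && $q_hi >= $k_hi;
    return 'e: no overlap'           if $q_hi <  $k_lo || $q_lo >  $k_hi;
    return 'd: partial overlap';     # exactly one end falls inside the key
}

# Hypothetical stored data, using '_' as the separator inside the key.
my %hash = ( chr1 => { '100_200' => "geneA\t+" } );

for my $query ([100, 200], [120, 180], [50, 250], [300, 400], [150, 250]) {
    for my $key (keys %{ $hash{chr1} }) {
        my ($k_lo, $k_hi) = split /_/, $key;
        printf "%d-%d vs %s: %s\n", @$query, $key, classify(@$query, $k_lo, $k_hi);
    }
}
```

Case (a) is handled before any of this, with a plain exists test on $hash{$col1}.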

    But all-in-all, I think it might be better to treat this as a database problem rather than a hash problem, because SQL already gives you the idioms you need to look for matches according to your criteria. You create a table and load it from your initial input, then for subsequent rows of data, you do a query like:

    select * from table where col1 = ? and ( col2 >= ? and col3 <= ? )
    (using placeholders for the column values -- see DBI for details)
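
A minimal sketch of that query run through DBI. It assumes DBD::SQLite is installed and uses an in-memory database; the table name ranges, the column types, and the sample rows are invented for illustration.

```perl
use strict;
use warnings;
use DBI;

# Assumption: the DBD::SQLite driver is available. Nothing is written to disk.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Hypothetical table holding the five columns of the input file.
$dbh->do('CREATE TABLE ranges
          (col1 TEXT, col2 INTEGER, col3 INTEGER, col4 TEXT, col5 TEXT)');

# Load the initial input through a prepared statement with placeholders.
my $ins = $dbh->prepare('INSERT INTO ranges VALUES (?, ?, ?, ?, ?)');
$ins->execute(@$_) for (
    [ 'chr1', 100, 200, 'geneA', '+' ],
    [ 'chr1', 150, 900, 'geneB', '-' ],
    [ 'chr1', 500, 600, 'geneC', '+' ],
);

# The range query: rows for this col1 whose range lies inside 100..650.
my $sel = $dbh->prepare(
    'SELECT col2, col3, col4, col5 FROM ranges
      WHERE col1 = ? AND col2 >= ? AND col3 <= ?
      ORDER BY col2, col3'
);
$sel->execute('chr1', 100, 650);
while (my @row = $sel->fetchrow_array) {
    print join("\t", @row), "\n";
}
```

The placeholders keep the column values out of the SQL text, so the same prepared statement can be executed again for every new row of input.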

    Depending on what needs to be done about cases of partial or total lack of overlap (d - f above), you can add conditions to that query, and/or use additional queries.

    (There might be ways to emulate that sort of SQL facility with hash keys, but it won't be as simple as SQL, I think.)

Re: Search in Hash of Hash for the range of values
by marinersk (Priest) on Sep 02, 2010 at 03:51 UTC
    Dear snape,

    I suspect your problem is that you are trying to find an easy-as-Perl way to do something that has no Perl shortcuts.

    It comes up from time to time here; someone goes through the gyrations of learning to use HoH, which gives them the ability to find data in complex structures with extremely lightweight code -- precisely what Hashes are designed to do.

    Then they (or the people who inherit their code) suddenly discover that, lo and behold, sometimes you have to search the keys, too.

    And they show up at the Monastery Gates looking for that magic shortcut way to get it done, since Perl "always" has those, only to discover that this is NOT what hashes were designed to do.

    So, unfortunately, you're just going to have to search through the keys the old fashioned way. Anathema to Perl, perhaps, but, yes, you'll have to write five lines of Perl instead of one. :-)

    By the way, there are technically at least two "old fashioned" ways to do this, not including going to external tools like SQL as admirably noted above.

    One is as shown above, the looped and/or recursive search through the key tree.

    The other is to rewrite the data storage portion of the script; at the same time it writes data to the hash, it then immediately also writes the key(s) to a reverse-lookup list (hash or array, depending on the need). Then you can search for your keys in the reverse-lookup list, often using those lovely Perl shortcuts we all get so accustomed to having at our fingertips.
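
A sketch of what such a reverse-lookup list could look like. The names %index and store, the sample values, and the inclusive window are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;

my (%hash, %index);   # %index is the hypothetical reverse-lookup list

# Hypothetical helper: store the value, and record the numeric pair once so
# that later searches never have to split a key apart again.
sub store {
    my ($c1, $c2, $c3, $value) = @_;
    my $key = "${c2}_${c3}";
    push @{ $index{$c1} }, [ $c2, $c3 ] unless exists $hash{$c1}{$key};
    $hash{$c1}{$key} = $value;
}

store('chr1', 100, 200, "geneA\t+");
store('chr1', 150, 900, "geneB\t-");
store('chr1', 500, 600, "geneC\t+");

# A single grep over the index finds every pair inside the window.
my ($lo, $hi) = (100, 650);
my @pairs = grep { $_->[0] >= $lo && $_->[1] <= $hi } @{ $index{chr1} || [] };

# Turn each matching pair back into a hash key to fetch the stored value.
my @keys = map { join '_', @$_ } @pairs;
print "$_ => $hash{chr1}{$_}\n" for @keys;
```

The cost is paid at load time: every write goes to two places, and the two structures have to be kept in step if entries are ever deleted.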

    Either way, sorry to say, this is not a new problem, and there is no silver bullet. You have to actually slog through the keys or change the way the data is stored in the first place.

      Wow, it's been awhile since I've been downvoted.

      Not that my feelings are hurt, but wondering what in my post elicited the reaction. LOL. Can't improve if I don't know what's wrong.

      Ah, well, such is the nature of a drive-by shooting. :: giggle ::

        Yeah, I've been seeing that happening today in other threads. To me and also to other folks. I guess somebody is just having a bad day for some unknown reason. I don't see any problem with your post and I retaliated with an up vote! You are right, there isn't a tricky way to do everything, and even when there is, sometimes it would be so obscure looking that it's probably not a good idea to do it!
Re: Search in Hash of Hash for the range of values
by Marshall (Canon) on Sep 02, 2010 at 03:11 UTC
    It appears to me that perlpie's post gives a pretty good example of how to do what the OP asked for. I think it would be helpful if the OP filled in some application details. It is certainly possible that the wrong type of data structure has been selected. What would be better is very application specific, so more detail is needed than just that there are 5 columns in the file.