in reply to Will hash of hashes work?

It's my first post here

Nonsense! *runs*

Seriously, is the example you posted part of the tab-delimited table you mention?


"Half of all adults in the United States say they have registered as an organ donor, although only some have purchased a motorcycle to show that they're really serious about it."

Re^2: Will hash of hashes work?
by Anonymous Monk on Mar 31, 2009 at 14:10 UTC
    Alright folks,

    Here is a small chunk from my table:

    NODE_104    NODE_2541   7682   61   7682   7653    32    61
    NODE_2541   NODE_2313     61  189      1     30   160   189
    NODE_2313   NODE_2855    189   61    160    189     1    30

    col1: query node
    col2: subject node
    col3: length of the query string
    col4: length of the subject string
    col5: query node from (the query character at which the mapping onto the subject node starts)
    col6: query node to (the query character at which the mapping onto the subject node ends)
    col7: subject node from (the subject character at which the mapping onto the query node starts)
    col8: subject node to (the subject character at which the mapping onto the query node ends)

    ------- NODE 104 -------- NODE 2541 ------- NODE 2313 -------- NODE 2855

    RESULTING IN |--------------------------| (merge of all the above nodes)
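Since the thread title asks whether a hash of hashes will work, here is a minimal sketch of loading rows shaped like the sample above into one, keyed first by query node and then by subject node. It only builds a lookup structure; it does not merge any strings. The field names (`q_len`, `s_len`, `q_from`, `q_to`, `s_from`, `s_to`) and the helper `parse_rows` are hypothetical labels chosen here for columns 3-8, not anything from the original post.

```perl
use strict;
use warnings;

# Hypothetical field names for columns 3-8, in the order described above.
my @FIELDS = qw(q_len s_len q_from q_to s_from s_to);

# Parse whitespace- or tab-delimited rows into a hash of hashes:
#   $overlap{QUERY}{SUBJECT} = { q_len => ..., s_len => ..., ... }
sub parse_rows {
    my @lines = @_;
    my %overlap;
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /\S/;            # skip blank lines
        my ($query, $subject, @nums) = split ' ', $line;
        die "Expected 8 columns: $line\n" unless @nums == @FIELDS;
        my %rec;
        @rec{@FIELDS} = @nums;                # hash slice: name each column
        $overlap{$query}{$subject} = \%rec;
    }
    return \%overlap;
}

# The three sample rows from the post, tab-delimited.
my $table = parse_rows(
    "NODE_104\tNODE_2541\t7682\t61\t7682\t7653\t32\t61",
    "NODE_2541\tNODE_2313\t61\t189\t1\t30\t160\t189",
    "NODE_2313\tNODE_2855\t189\t61\t160\t189\t1\t30",
);

for my $q (sort keys %$table) {
    for my $s (sort keys %{ $table->{$q} }) {
        my $r = $table->{$q}{$s};
        print "$q -> $s: query $r->{q_from}-$r->{q_to}, ",
              "subject $r->{s_from}-$r->{s_to}\n";
    }
}
```

With a real file you would pass the lines read from the filehandle instead of the literal strings.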

      I'm not at all clear about what you are trying to do: do you want to insert the SUBJECT NODE into the QUERY NODE or vice versa? And if so, why? It appears that has already been done.

      Based on your column descriptions, you have everything you need to know in the table, and in fact the strings are already "joined". Columns 5-8 tell you where to find the overlaps. For example, in row 1 of your sample, positions 7653-7682 of the QUERY NODE correspond to positions 32-61 in the SUBJECT NODE.
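To make the reading of columns 5-8 concrete, here is a small sketch that turns each from/to pair into a low-high span and compares the overlap length on the query side with the subject side. It assumes positions are 1-based and inclusive, which the post does not state, and it accepts "from" and "to" in either order because row 1 lists the query span as 7682 then 7653. The helper `span` is hypothetical.

```perl
use strict;
use warnings;

# Return (low, high, length) for a span given as "from" and "to" in
# either order. ASSUMPTION: positions are 1-based and inclusive.
sub span {
    my ($from, $to) = @_;
    my ($lo, $hi) = $from <= $to ? ($from, $to) : ($to, $from);
    return ($lo, $hi, $hi - $lo + 1);
}

# Columns 1-2 and 5-8 of each sample row: nodes, query from/to, subject from/to.
my @rows = (
    [qw(NODE_104  NODE_2541 7682 7653  32  61)],
    [qw(NODE_2541 NODE_2313    1   30 160 189)],
    [qw(NODE_2313 NODE_2855  160  189   1  30)],
);

for my $row (@rows) {
    my ($q, $s, $qf, $qt, $sf, $st) = @$row;
    my ($qlo, $qhi, $qlen) = span($qf, $qt);
    my ($slo, $shi, $slen) = span($sf, $st);
    my $ok = $qlen == $slen ? "lengths agree" : "LENGTH MISMATCH";
    print "$q $qlo-$qhi <=> $s $slo-$shi ($qlen chars, $ok)\n";
}
```

Under that assumption every sample row describes a 30-character overlap of equal length on both sides, which is consistent with the strings already being aligned.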

      But I doubt you are asking about something that is already done for you. What exactly are you trying to do? Insert all of the SUBJECT node in the same location where now only a part of it exists?

      Assuming that you are trying to insert the whole string where only a part of it exists, then hashes have nothing to do with this. Instead you need to use substr to get the part before and after the overlapping portion, like this:

        use strict;
        use warnings;

        while (my $line = <DATA>) {
            chomp $line;
            # split ' ' splits on whitespace and ignores leading whitespace
            my ($sQuery, $sSubject, $iLengthQ, $iLengthS, $iEndSinQ, $iBeginSinQ)
                = split(' ', $line);
            #print "<$sQuery> <$sSubject> <$iEndSinQ> <$iBeginSinQ>\n";

            my $sBefore = substr($sQuery, 0, $iBeginSinQ);
            my $sAfter  = substr($sQuery, $iEndSinQ + 1);

            # pipes added around inserted portion to make insertion point
            # a bit clearer in sample output.
            print "$sBefore|$sSubject|$sAfter\n";
        }

        __DATA__
        abc123def 123xxxx 9 7 5 3 0 2
        1234xxxx YYYxxYZ0 8 8 5 4 3 4

      The above only illustrates merging two strings. If you intend to process several QUERY-SUBJECT pairs (as I imagine you do), then each insertion shifts positions within the string, and the offsets will no longer match the positions recorded in your dataset. The easiest way to avoid this problem is to build a graph of QUERY-SUBJECT relations and then traverse it depth first, so that you are guaranteed never to insert a string into a string that has already been modified by an earlier insertion. Building and traversing such a graph is less about hashes and more about navigation and recursion (or loops and stacks if you want a non-recursive solution).
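One possible reading of the graph suggestion, sketched under stated assumptions: store the QUERY to SUBJECT relations as an adjacency list (a hash of arrays), then walk it depth first in post-order so that every SUBJECT comes out before the QUERY it is inserted into. This only computes an order; the actual `substr` splicing is left out. It assumes the relations form a forest with no cycles, and `merge_order` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return nodes in depth-first post-order: every SUBJECT is listed
# before the QUERY it gets inserted into.
# ASSUMPTION: the QUERY -> SUBJECT relations form a forest (no cycles);
# %seen simply stops the walk from revisiting a node.
sub merge_order {
    my @edges = @_;                       # list of [query, subject] pairs
    my (%children, %is_subject);
    for my $e (@edges) {
        my ($q, $s) = @$e;
        push @{ $children{$q} }, $s;      # hash of arrays: adjacency list
        $is_subject{$s} = 1;
    }
    # Roots are queries that never appear as a subject.
    my @roots = grep { !$is_subject{$_} } sort keys %children;

    my (@order, %seen);
    my $visit;
    $visit = sub {
        my ($node) = @_;
        return if $seen{$node}++;
        $visit->($_) for @{ $children{$node} || [] };   # children first
        push @order, $node;                              # then the parent
    };
    $visit->($_) for @roots;
    undef $visit;                         # break the closure's self-reference
    return @order;
}

# The chain from the sample: 104 -> 2541 -> 2313 -> 2855.
my @order = merge_order(
    [ 'NODE_104',  'NODE_2541' ],
    [ 'NODE_2541', 'NODE_2313' ],
    [ 'NODE_2313', 'NODE_2855' ],
);
print "merge order: ", join(', ', @order), "\n";
```

For the sample chain this lists NODE_2855 first and NODE_104 last. A non-recursive version would replace `$visit` with an explicit stack.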

      Best, beth

      Update: added note about expanding the solution to a large number of records.