NODE_104 NODE_2541 7682 61 7682 7653 32 61
NODE_2541 NODE_2313 61 189 1 30 160 189
NODE_2313 NODE_2855 189 61 160 189 1 30
col1: Query NODE
col2: Subject NODE
col3: length of Query string
col4: length of Subj string
col5: Query node from (which char maps on to the subj node "from")
col6: Query node to (which char maps on to the subj node "to")
col7: Subj node from (which char maps on to the query node "from")
col8: Subj node to (which char maps on to the query node "to")
------- NODE 104
-------- NODE 2541
------- NODE 2313
-------- NODE 2855
RESULTING IN |--------------------------| (merge of all the above nodes)
I'm not at all clear about what you are trying to do: do you want to insert the SUBJECT NODE into the QUERY NODE or vice versa? And if so, why? It appears that is already done.
Based on your column descriptions, you have everything you need to know in the table, and in fact the strings are already "joined". Columns 5-8 tell you where to find the overlaps. For example, in row 1 of your sample, positions 7653-7682 of the QUERY NODE correspond to positions 32-61 of the SUBJECT NODE.
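To make the column reading concrete, here is a minimal sketch that pulls the overlap coordinates out of one row. The sub name parse_overlap and the hash keys are my own invention, and I am assuming the order within each from/to pair carries no meaning (row 1 lists 7682 before 7653), which your sample does not confirm.

```perl
use strict;
use warnings;

# Parse one row of the eight-column table and return the overlap
# coordinates, normalised so that "from" <= "to" on both sides.
# ASSUMPTION: the order inside each from/to pair is not significant.
sub parse_overlap {
    my ($line) = @_;
    my ($query, $subject, $len_q, $len_s, @pos) = split ' ', $line;
    my ($q_from, $q_to) = sort { $a <=> $b } @pos[0, 1];
    my ($s_from, $s_to) = sort { $a <=> $b } @pos[2, 3];
    return {
        query   => $query,
        subject => $subject,
        q_from  => $q_from, q_to => $q_to,
        s_from  => $s_from, s_to => $s_to,
    };
}

my $row = parse_overlap('NODE_104 NODE_2541 7682 61 7682 7653 32 61');
printf "%s[%d-%d] overlaps %s[%d-%d]\n",
    @{$row}{qw(query q_from q_to subject s_from s_to)};
# prints: NODE_104[7653-7682] overlaps NODE_2541[32-61]
```

Both ranges in that row span 30 characters, which is what you would expect if the two columns pairs describe the same overlapping stretch.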
But I doubt you are asking about something that is already done for you. What exactly are you trying to do? Insert all of the SUBJECT node in the same location where now only a part of it exists?
Assuming that you are trying to insert the whole string where only a part of it exists, then hashes have nothing to do with this. Instead you need to use substr to get the part before and after the overlapping portion, like this:
    use strict;
    use warnings;

    while (my $line = <DATA>) {
        chomp $line;

        # Only the first six columns are needed; columns 7 and 8 are ignored.
        my ($sQuery, $sSubject, $iLengthQ, $iLengthS, $iEndSinQ, $iBeginSinQ)
            = split(/\s+/, $line);
        #print "<$sQuery> <$sSubject> <$iEndSinQ> <$iBeginSinQ>\n";

        # Offsets in this sample data are 0-based.
        my $sBefore = substr($sQuery, 0, $iBeginSinQ);
        my $sAfter  = substr($sQuery, $iEndSinQ + 1);

        # pipes added around inserted portion to make insertion point
        # a bit clearer in sample output.
        print "$sBefore|$sSubject|$sAfter\n";
    }

    __DATA__
    abc123def 123xxxx 9 7 5 3 0 2
    1234xxxx YYYxxYZ0 8 8 5 4 3 4
The above only illustrates merging two strings. If you intend to process several QUERY-SUBJECT pairs (as I imagine you do), then each insertion shifts the positions within the string, and the offsets will no longer match the positions recorded in your dataset. The easiest way to avoid this problem is to build a graph of QUERY-SUBJECT relations and then traverse it depth first, so that you are guaranteed never to need to insert into a string that has already been modified by insertion. Building and traversing such a graph is less about hashes and more about navigation and recursion (or loops and stacks if you want a non-recursive solution).
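Here is a rough sketch of that depth-first idea, not a finished solution. Everything in it is made up for illustration: the node names A, B, C, the strings, the row layout [query, subject, begin, end] with 0-based offsets into the query, and the sub name expand. It also assumes that the overlaps inside a single query do not overlap each other, and it applies them right to left so that earlier offsets stay valid.

```perl
use strict;
use warnings;

# Hypothetical data: node name => original string.
my %seq = (A => 'abc123def', B => '123xxxx', C => 'xxYY');

# Hypothetical rows: [query, subject, begin, end], where begin..end are
# 0-based offsets in the query that the whole subject replaces.
my @rows = (['A', 'B', 3, 5], ['B', 'C', 5, 6]);

# Graph of QUERY -> SUBJECT relations, keyed by query name.
my %edges;
push @{ $edges{ $_->[0] } }, $_ for @rows;

my (%done, %visiting);

# Return the fully expanded string for a node. Subjects are expanded
# first (depth first), so offsets are only ever applied to a string
# that has not yet been modified at or before that offset.
sub expand {
    my ($node) = @_;
    return $done{$node} if exists $done{$node};
    die "cycle detected at $node\n" if $visiting{$node}++;

    my $str = $seq{$node};
    # Rightmost insertion first, so the remaining offsets do not shift.
    for my $e (sort { $b->[2] <=> $a->[2] } @{ $edges{$node} || [] }) {
        my (undef, $subject, $begin, $end) = @$e;
        substr($str, $begin, $end - $begin + 1, expand($subject));
    }

    delete $visiting{$node};
    return $done{$node} = $str;
}

print expand('A'), "\n";    # prints: abc123xxxxYYdef
```

The %done cache means a subject shared by several queries is only expanded once, and the %visiting check is just a guard in case the relations ever loop back on themselves.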
Best, beth
Update: added note about expanding the solution to a large number of records.