I'm not at all clear about what you are trying to do: do you want to insert the SUBJECT NODE into the QUERY NODE, or vice versa? And if so, why? It appears that is already done.
Based on your column descriptions, everything you need to know is in the table, and in fact the strings are already "joined". Columns 5-8 tell you where to find the overlaps. For example, in row 1 of your sample, positions 7653-7682 of the QUERY NODE correspond to positions 32-61 in the SUBJECT NODE.
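To make the meaning of those position columns concrete, here is a minimal sketch that pulls the overlapping slice out of each string and compares them. The helper name overlap is made up for illustration, and I am assuming zero-based, inclusive offsets as in my toy data further down; if your real positions are one-based, subtract 1 from each offset first.

```perl
use strict;
use warnings;

# Hypothetical helper: return the slice of $str between two inclusive,
# zero-based offsets (an assumption about how the columns are counted).
sub overlap {
    my ($str, $begin, $end) = @_;
    return substr($str, $begin, $end - $begin + 1);
}

# Toy strings, not taken from the real dataset.
my $query   = 'abc123def';
my $subject = '123xxxx';

# Offsets 3-5 of the query should line up with offsets 0-2 of the subject.
my $in_query   = overlap($query,   3, 5);
my $in_subject = overlap($subject, 0, 2);

print "query: $in_query, subject: $in_subject\n";
print "overlaps ", ($in_query eq $in_subject ? "match" : "differ"), "\n";
```

With the toy strings above, both slices come out as 123 and the second line reports that the overlaps match.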
But I doubt you are asking about something that is already done for you. What exactly are you trying to do? Insert all of the SUBJECT node in the same location where now only a part of it exists?
Assuming that you are trying to insert the whole string where only a part of it exists, hashes have nothing to do with this. Instead, you need to use substr to get the parts of the QUERY string before and after the overlapping portion, like this:
use strict;
use warnings;
while (my $line = <DATA>) {
    chomp $line;
    my ($sQuery, $sSubject, $iLengthQ, $iLengthS, $iEndSinQ, $iBeginSinQ)
        = split(/\s+/, $line);
    #print "<$sQuery> <$sSubject> <$iEndSinQ> <$iBeginSinQ>\n";
    my $sBefore = substr($sQuery, 0, $iBeginSinQ);
    my $sAfter  = substr($sQuery, $iEndSinQ+1);
    # pipes added around inserted portion to make insertion point
    # a bit clearer in sample output.
    print "$sBefore|$sSubject|$sAfter\n";
}
__DATA__
abc123def 123xxxx 9 7 5 3 0 2
1234xxxx YYYxxYZ0 8 8 5 4 3 4
The above only illustrates merging two strings. If you intend to process several QUERY-SUBJECT pairs (as I imagine you do), then each insertion shifts the positions within the string, and the offsets will no longer match the positions recorded in your dataset. The easiest way to avoid this problem is to build a graph of QUERY-SUBJECT relations and then traverse it depth first, so that you are guaranteed never to insert a string into one that has already been modified by insertion. Building and traversing such a graph is less about hashes and more about navigation and recursion (or loops and stacks if you want a non-recursive solution).
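Here is a rough sketch of that depth-first idea, not a finished solution. The names %text, %inserts and expand are made up for illustration, the data is a toy example, and the offsets are assumed to be zero-based and inclusive. I am also assuming that the regions replaced within any one QUERY string do not overlap each other; within a single string the replacements are applied from right to left, so the offsets further left stay valid.

```perl
use strict;
use warnings;

# Hypothetical toy data: node name => original string.
my %text = (
    A => 'abc123def',
    B => '123xxxx',
    C => 'xxZZ',
);

# Hypothetical relations: query => list of [subject, begin, end], where
# begin/end are inclusive zero-based offsets in the ORIGINAL query string.
my %inserts = (
    A => [ [ 'B', 3, 5 ] ],
    B => [ [ 'C', 3, 4 ] ],
);

my %done;    # cache of nodes that are already fully expanded

# Expand a node depth first: every subject is fully expanded before it is
# inserted, and the query itself is only ever edited starting from its
# original text.
sub expand {
    my ($node, $seen) = @_;
    return $done{$node} if exists $done{$node};
    die "cycle detected at $node\n" if $seen->{$node};

    my $str = $text{$node};

    # Rightmost region first, so offsets to the left are not shifted.
    for my $rel (sort { $b->[1] <=> $a->[1] } @{ $inserts{$node} || [] }) {
        my ($subject, $begin, $end) = @$rel;
        my $expanded = expand($subject, { %$seen, $node => 1 });
        substr($str, $begin, $end - $begin + 1, $expanded);
    }
    return $done{$node} = $str;
}

print expand('A', {}), "\n";
```

For the toy data this prints abc123xxZZxxdef: C is merged into B first (giving 123xxZZxx), and only then is the expanded B placed into A at A's original offsets. The $seen hash is only there to stop a circular set of relations from recursing forever.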
Best, beth
Update: added note about expanding the solution to a large number of records.