Re: Matching up XML tags in 2 arrays

You could build a hash for each doc, where the key is the node's content, and the value is a reference to a list of ids.

my %text1 = (
   "is"                       => [ "1" ],
   "example that I just made" => [ "2" ],
   "I"                        => [ "3" ],
);

my %text2 = (
   "Here is"                  => [ "7" ],
   "example that I just made" => [ "8" ],
);

foreach my $content (keys %text2) {
   next if not exists $text1{$content};

   my @ids = (
      (map { "text1.$_" } @{$text1{$content}}),
      (map { "text2.$_" } @{$text2{$content}}),
   );

   print("\"$content\" found at ", join(', ', @ids), "\n");
}

Building the hashes is an exercise left to the user, since it depends on the parser you're using.
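
As a rough sketch of that step, assuming your parser can already hand you a list of [ id, content ] pairs (the `@nodes` array and the `build_index` sub are hypothetical names for illustration, not part of any parser's API), the hash could be filled like this:

```perl
use strict;
use warnings;

# Hypothetical parser output: one [ id, content ] pair per tag.
my @nodes = (
   [ 1, "is" ],
   [ 2, "example that I just made" ],
   [ 3, "I" ],
);

# Map each node's content to a reference to the list of ids where it occurs.
sub build_index {
   my %index;
   for my $node (@_) {
      my ($id, $content) = @$node;
      push @{ $index{$content} }, $id;   # autovivifies the array on first use
   }
   return %index;
}

my %text1 = build_index(@nodes);
```

Calling `build_index` once per document gives you `%text1` and `%text2` in the shape the loop above expects.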


Re^2: Matching up XML tags in 2 arrays by bwgoudey (Sexton) on Jan 16, 2007 at 02:02 UTC
Wouldn't that mean that if two tags had the same content, there would be a collision? E.g. `<Tag id=1>He</Tag> Walked Like <Tag id=2>He</Tag> Talked.` Then there would be a collision of some kind between the two tags containing 'He'. What does Perl do in a situation like this? Is the first tag simply replaced by the second?
Re^3: Matching up XML tags in 2 arrays by bobf (Monsignor) on Jan 16, 2007 at 03:16 UTC
That's why ikegami suggested using a hash of arrays. If you push a new tag ID onto the array you will be able to track all occurrences without overwriting existing data. Your example would look something like this: `%text = ( 'He' => [ 1, 2 ], ...` If, on the other hand, you were using a straight hash: `%text = ( 'He' => 1, ...` then yes, the first tag would be replaced by the second. You may find perldsc and our Tutorials section (specifically Data Types and Variables) helpful.
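
To see the difference side by side, here is a small sketch using the two 'He' tags from the question. The `@tags` list and the `%flat` and `%hoa` names are made up for illustration:

```perl
use strict;
use warnings;

# The two tags from the example sentence, as [ id, content ] pairs.
my @tags = ( [ 1, 'He' ], [ 2, 'He' ] );

my (%flat, %hoa);
for my $tag (@tags) {
   my ($id, $content) = @$tag;
   $flat{$content} = $id;            # plain hash: a later id overwrites the earlier one
   push @{ $hoa{$content} }, $id;    # hash of arrays: every id is kept
}

print "flat: $flat{He}\n";           # prints "flat: 2"
print "hoa:  @{ $hoa{He} }\n";       # prints "hoa:  1 2"
```

The plain hash silently keeps only the last id assigned, while the hash of arrays keeps both in the order they were pushed.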