in reply to Matching up XML tags in 2 arrays

You could build a hash for each doc, where the key is the node's content, and the value is a reference to a list of ids.
    my %text1 = (
        "is"                       => [ "1" ],
        "example that I just made" => [ "2" ],
        "I"                        => [ "3" ],
    );

    my %text2 = (
        "Here is"                  => [ "7" ],
        "example that I just made" => [ "8" ],
    );

    # Report every piece of content that occurs in both documents.
    foreach my $content (keys %text2) {
        next if not exists $text1{$content};

        my @ids = (
            (map { "text1.$_" } @{$text1{$content}}),
            (map { "text2.$_" } @{$text2{$content}}),
        );

        print("\"$content\" found at ", join(', ', @ids), "\n");
    }

Building the hashes is an exercise left to the user (since it's dependent on the parser you're using).
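
As a rough sketch of that exercise, assume your parser can hand you each tag as an [ id, content ] pair. The @tags1 list and the index_by_content helper below are made up for illustration and are not part of any parser's API.

```perl
use strict;
use warnings;

# Hypothetical parser output: one [ id, content ] pair per tag.
my @tags1 = (
    [ 1, 'is' ],
    [ 2, 'example that I just made' ],
    [ 3, 'I' ],
);

# Build content => [ ids ]. push autovivifies the array ref the
# first time a given piece of content is seen.
sub index_by_content {
    my @tags = @_;
    my %index;
    for my $tag (@tags) {
        my ($id, $content) = @$tag;
        push @{ $index{$content} }, $id;
    }
    return %index;
}

my %text1 = index_by_content(@tags1);
```

Do the same for the second document to get %text2, then run the loop above.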

Re^2: Matching up XML tags in 2 arrays
by bwgoudey (Sexton) on Jan 16, 2007 at 02:02 UTC
    Wouldn't that mean that if two tags had the same content, there would be a collision?

    E.g. <Tag id=1>He</Tag> Walked Like <Tag id=2>He</Tag> Talked. Then there would be a collision of some kind between the two tags containing 'He'.

    What does Perl do in a situation like this? Is the first tag simply replaced by the second?

      That's why ikegami suggested using a hash of arrays. If you push a new tag ID onto the array you will be able to track all occurrences without overwriting existing data. Your example would look something like this:

      %text = ( 'He' => [ 1, 2 ], ...

      If, on the other hand, you were using a straight hash:

      %text = ( 'He' => 1, ...
      then yes, the first tag would be replaced by the second.
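
      Here is a small sketch showing both behaviours side by side. The @tags list is a stand-in for whatever your parser returns, and the names %flat and %text are only for illustration.

      ```perl
      use strict;
      use warnings;

      # Hypothetical parser output for the 'He' example: [ id, content ].
      my @tags = ( [ 1, 'He' ], [ 2, 'He' ] );

      my (%flat, %text);
      for my $tag (@tags) {
          my ($id, $content) = @$tag;
          $flat{$content} = $id;             # straight hash: later id overwrites earlier
          push @{ $text{$content} }, $id;    # hash of arrays: every id is kept
      }

      print "flat: $flat{He}\n";         # prints "flat: 2"
      print "list: @{ $text{He} }\n";    # prints "list: 1 2"
      ```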

      You may find perldsc and our Tutorials section (specifically Data Types and Variables) helpful.