I am seeking the wisdom of the Monks.
I am trying to parse an HTML page for specific words and link them to their definition in a glossary.
That is easy enough to do using this simple bit of code:
    foreach my $word (keys %glossary) {
        print STDERR "$word ";
        # Works but tries to link inside other links.
        $file =~ s/(\s+)($word)([\.\?\!]?\s+)/"$1<a href=\"".$glossary{$word}{'link'}."\">$2<\/a>$3"/eig;
    }
($file contains the entire text of the page and I forgot a \s or two to deal with whitespace. Oops.)
The problem I'm having is that it is trying to link words that are already part of other links. I don't want that to happen. How can I replace just the words that are not enclosed in <a> tags?
I have an idea that involves replacing the links with placeholders first, then replacing the words in the document, and finally putting the links back, but that means going through the file two more times. I would prefer to do it all in one pass if I could.
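For comparison, here is a rough sketch of what a single-pass version might look like. The idea is one substitution whose alternation matches an existing link or any other tag first and copies it through unchanged, so a glossary word is only considered in plain text. The sub name link_glossary and the sample glossary entries are made up for illustration. It also assumes the glossary keys are stored in lowercase and that the page is simple enough for a regex to cope with (no nested links, comments, or script blocks).

```perl
use strict;
use warnings;

# Hypothetical sample data; the shape mirrors $glossary{$word}{'link'} from the post.
# Assumption: keys are lowercase, because matching below is case-insensitive.
my %glossary = (
    regex => { link => 'glossary.html#regex' },
    hash  => { link => 'glossary.html#hash' },
);

# Link glossary words in $html in one pass, leaving existing
# <a>...</a> elements and the insides of other tags untouched.
sub link_glossary {
    my ($html, $glossary) = @_;

    # Longest words first so a longer term wins over a shorter one.
    my $words = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } keys %$glossary;
    return $html unless length $words;

    # Alternation order matters: group 1 (things to skip) is tried
    # before group 2 (a glossary word), so linked text is never rewritten.
    $html =~ s{
        ( <a\b[^>]*> .*? </a\s*>    # a complete existing link
        | <[^>]*>                   # or any other tag
        )
      | \b($words)\b                # a glossary word in plain text
    }{
        defined $1
            ? $1
            : '<a href="' . $glossary->{lc $2}{link} . '">' . $2 . '</a>'
    }gisxe;

    return $html;
}

my $page = 'A regex is not a <a href="x.html">regex hash</a>. A Hash is.';
print link_glossary($page, \%glossary), "\n";
```

The \b anchors stand in for the \s+ matching in the original loop, so words next to punctuation are still caught. This is still pattern matching and not real HTML parsing, so unusual markup may need a proper parser.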
I would be grateful for any enlightenment I could find.
PerlStalker
In reply to Linking words in html to glossary. by PerlStalker