pike has asked for the wisdom of the Perl Monks concerning the following question:
I have to add XML markup to a text file. E.g. I must mark up dates and numbers. Since dates often contain numbers (as in 4/2/2001), my approach would be to first find all dates, mark them up (or is it markup them?), and then scan the remaining (non-marked-up) text for numbers.
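A minimal sketch of this "dates first, then numbers in whatever is left" idea, without any DOM, might look like the following. It assumes plain Perl regexes can stand in for real date and number recognition. The patterns and the helper names `mark_pass`, `render` and `xml_escape` are hypothetical and only illustrate the approach. The text is kept as a list of segments, and each pass only splits segments that have not been marked yet.

```perl
use strict;
use warnings;

# Hypothetical patterns for illustration only; real date and number
# recognition needs much more care than this.
my $DATE_RE   = qr{\b\d{1,2}/\d{1,2}/\d{4}\b};
my $NUMBER_RE = qr{\b\d+\b};

# Split every still-unmarked segment around the matches of $re.
# A segment is [ $tag, $text ]; $tag is undef for plain text.
sub mark_pass {
    my ($segments, $tag, $re) = @_;
    my @out;
    for my $seg (@$segments) {
        my ($seg_tag, $text) = @$seg;
        if (defined $seg_tag) {    # already marked up by an earlier pass
            push @out, $seg;
            next;
        }
        my $pos = 0;
        while ($text =~ /$re/g) {
            my ($start, $end) = ($-[0], $+[0]);
            push @out, [ undef, substr($text, $pos, $start - $pos) ] if $start > $pos;
            push @out, [ $tag, substr($text, $start, $end - $start) ];
            $pos = $end;
        }
        push @out, [ undef, substr($text, $pos) ] if $pos < length $text;
    }
    return \@out;
}

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Turn the segment list back into a marked-up string.
sub render {
    my ($segments) = @_;
    return join '', map {
        my ($tag, $text) = @$_;
        defined $tag ? "<$tag>" . xml_escape($text) . "</$tag>" : xml_escape($text);
    } @$segments;
}

my $segments = [ [ undef, 'On 4/2/2001 the index rose by 12 points' ] ];
$segments = mark_pass($segments, 'date',   $DATE_RE);      # dates first
$segments = mark_pass($segments, 'number', $NUMBER_RE);    # then numbers in the rest
my $xml = render($segments);
print "$xml\n";
```

With about 20 tags, the same idea becomes a loop over a priority-ordered list of [tag, regex] pairs, calling `mark_pass` once per pair.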
Specifically, I was thinking of using XML::DOM::Node to first create a node containing all the text and then add nodes for dates and numbers as I find them. In the code snippet below, I assume that I have functions findDate and findNumber that return the text before the date/number, the date/number itself, and the text after it (or undef if there is no date/number in the text). So I end up with the following code:
```perl
use XML::DOM;

# createMarkup creates markup for dates and numbers in the given text,
# e.g. 'On Oct. 21, the Dow Jones rose to 10043 points' should become
# '<mytxt>On <date>Oct. 21</date>, the Dow Jones rose to <number>10043</number> points</mytxt>'
# findDate and findNumber are assumed to exist (see above).
sub createMarkup {
    my ($text, $doc) = @_;

    # create the parent node holding all the text
    my $node = $doc->createElement('mytxt');

    # mark up dates
    my $textNode = $node->addText($text);
    markupElement($textNode, $node, \&findDate, 'date');

    # mark up numbers, only in text nodes that are not already inside an element
    foreach my $child ($node->getChildNodes()) {
        next unless $child->isTextNode();
        markupElement($child, $node, \&findNumber, 'number');
    }
    return $node;
}

# Split $textNode around every match found by $rFindFunc, wrapping each
# match in a new <$elemName> element inserted as a sibling under $parent.
sub markupElement {
    my ($textNode, $parent, $rFindFunc, $elemName) = @_;
    my $doc = $parent->getOwnerDocument();
    die "not a text node" unless $textNode->isTextNode();

    my $nextNode = $textNode->getNextSibling();    # undef: append at the end
    my $text     = $textNode->getData();
    while (1) {
        # Test $elem explicitly: a bare "return undef" from the finder would
        # still make a list assignment true and loop forever.
        my ($before, $elem, $after) = $rFindFunc->($text);
        last unless defined $elem;

        $textNode->setData($before);
        my $elemNode = $doc->createElement($elemName);
        $elemNode->addText($elem);
        $parent->insertBefore($elemNode, $nextNode);

        $textNode = $doc->createTextNode($after);
        $parent->insertBefore($textNode, $nextNode);
        $text = $after;
    }
}
```

Is there a more elegant way to do this? And are XML::DOM::Node and its subclasses the right thing to use? Or what should I do? In reality I have about 20 different tags to add to the text, so proposals should not rely on finding just two entities as shown in the example.
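One possibly more compact way to handle many tags, sketched here as an alternative and not as what any of the replies proposed, is to put the recognizers in a priority-ordered table, join them into a single regex alternation, and make one left-to-right pass. At any given position Perl tries the alternatives in order, so a date such as 4/2/2001 wins over the number 4. The rules, their patterns and `create_markup` are hypothetical, and the result is an escaped string rather than a DOM tree.

```perl
use strict;
use warnings;

# Hypothetical recognizers in priority order: where two rules could match
# at the same position, the one listed first wins. Rules must not contain
# capturing groups of their own (use (?:...)). Add further pairs here.
my @RULES = (
    [ date   => qr{\b(?:(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.? \d{1,2}|\d{1,2}/\d{1,2}/\d{4})\b} ],
    [ number => qr{\b\d+(?:\.\d+)?\b} ],
);

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# One capture group per rule, so the matching rule can be identified.
my $combined = join '|', map { my $re = $_->[1]; "($re)" } @RULES;
my $scanner  = qr/$combined/;

sub create_markup {
    my ($text) = @_;
    my ($out, $pos) = ('', 0);
    while ($text =~ /$scanner/g) {
        my ($start, $end) = ($-[0], $+[0]);
        # Exactly one group took part in the match; its index is the rule.
        my ($i) = grep { defined $-[$_ + 1] } 0 .. $#RULES;
        my $tag = $RULES[$i][0];
        $out .= xml_escape(substr($text, $pos, $start - $pos));
        $out .= "<$tag>" . xml_escape(substr($text, $start, $end - $start)) . "</$tag>";
        $pos = $end;
    }
    $out .= xml_escape(substr($text, $pos));
    return "<mytxt>$out</mytxt>";
}

my $xml = create_markup('On Oct. 21, the Dow Jones rose to 10043 points');
print "$xml\n";
```

Adding a tag then means adding one table entry in the right priority position, instead of another pass over the tree.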
pike
Re: Building an XML File from text
by mirod (Canon) on Nov 20, 2001 at 18:33 UTC

Re: Building an XML File from text
by cacharbe (Curate) on Nov 20, 2001 at 18:26 UTC

Re: Building an XML File from text
by atlantageek (Monk) on Nov 20, 2001 at 18:36 UTC
by pike (Monk) on Nov 20, 2001 at 20:25 UTC
by mirod (Canon) on Nov 20, 2001 at 21:04 UTC