in reply to How can I extract text from XML document and after that put the extracted text to original place?

You might want to look at the handy dandy XML::Simple module, in particular its XMLin() and XMLout() functions. XMLin() reads an XML document into a Perl data structure. Loop through that structure, run your spell check on the data within it, then write the final results back out as XML via XMLout().
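
Roughly, that round trip could look like the sketch below. The spell_check() routine and the sample document are made up for illustration; swap in whatever checker you actually use. Note that the walk also touches attribute values, and that XMLout() does not promise to keep elements in their original order.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin XMLout);

# Hypothetical stand-in for a real spell checker: fixes one known typo.
sub spell_check {
    my ($text) = @_;
    $text =~ s/\bteh\b/the/g;
    return $text;
}

# Recursively visit every plain string in the structure returned by XMLin().
sub walk {
    my ($node) = @_;
    if (ref $node eq 'HASH') {
        for my $key (keys %$node) {
            if (ref $node->{$key}) { walk($node->{$key}) }
            else                   { $node->{$key} = spell_check($node->{$key}) }
        }
    }
    elsif (ref $node eq 'ARRAY') {
        for my $item (@$node) {          # $item aliases the array element
            if (ref $item) { walk($item) }
            else           { $item = spell_check($item) }
        }
    }
}

# Sample data-oriented document (an assumption, not from the original question).
my $xml = '<doc><title>teh title</title><para>first</para><para>teh second</para></doc>';

# ForceArray and KeyAttr => [] keep the structure predictable;
# KeepRoot preserves the <doc> root element for the trip back.
my $data = XMLin($xml, ForceArray => 1, KeyAttr => [], KeepRoot => 1);
walk($data);
my $out = XMLout($data, KeepRoot => 1);
print $out;
```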

As for exporting the tags and the text between the tags to two separate files and then putting them back together, just say 'NO'. On a large XML file, this would be extremely slow and you'd be doing much more work than necessary.


      C:\>shutdown -s
      >> Could not shut down computer:
      >> Microsoft is logged in remotely.
    

Re: XML::Simple for looping through an XML structure
by mirod (Canon) on Jan 29, 2003 at 14:56 UTC

    XML::Simple would probably not work here as it is designed for data-oriented XML and would not properly handle XML documents that include <p>some <i>mixed content</i> like this</p>.
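
    A quick way to see the problem is to push that very fragment through XMLin() and back out again. This is only a sketch with default options; the exact shape of the dumped structure may vary between XML::Simple versions, but the position of the <i> element relative to the surrounding text is not recorded anywhere.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin XMLout);
use Data::Dumper;

my $xml  = '<p>some <i>mixed content</i> like this</p>';
my $data = XMLin($xml);

# The <i> child and the loose text end up under separate hash keys,
# so nothing says where the <i> sat inside the sentence.
print Dumper($data);

# Writing it back out therefore cannot reproduce the original markup.
my $round = XMLout($data, RootName => 'p');
print $round;
```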

    As for this method being a problem for very large files: in that case the bottleneck would not be the processing time but more likely the time spent using the spell checker interactively. If that's really a problem (a huge file with very few spelling mistakes) you can always do it chunk by chunk using... say... XML::Twig ;--)
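
    A chunk-by-chunk pass might look something like this. It is a minimal sketch: spell_check() is a made-up placeholder, the choice of <p> as the chunk boundary is an assumption about your document, and the in-memory filehandle is only there so the example is self-contained (on a real file you would flush to STDOUT or to an output file).

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical stand-in for a real spell checker: fixes one known typo.
sub spell_check {
    my ($text) = @_;
    $text =~ s/\bteh\b/the/g;
    return $text;
}

my $xml = '<doc><p>some <i>mixed content</i> with teh typo</p><p>teh end</p></doc>';

# Collect the output in a string instead of a file, for demonstration only.
my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory handle: $!";

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a <p> element has been completely parsed.
        p => sub {
            my ($t, $p) = @_;
            # Fix each text node in place; the inline markup stays where it was.
            for my $text ($p->descendants('#PCDATA')) {
                $text->set_pcdata(spell_check($text->pcdata));
            }
            # Write out everything parsed so far and release it from memory.
            $t->flush($fh);
        },
    },
);

$twig->parse($xml);
$twig->flush($fh);    # write whatever is left, including the closing tags
close $fh;
print $out, "\n";
```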