PerlMonks  

Re^2: How to Truncate Corrupt Document.xml Files?

by socrtwo (Sexton)
on Feb 16, 2012 at 02:11 UTC ([id://954122])


in reply to Re: How to Truncate Corrupt Document.xml Files?
in thread How to Truncate Corrupt Document.xml Files?

I haven't tried that yet. Thanks for the heads-up. I'm looking at streaming SAX parsing now. The Ruby gem Nokogiri may be well suited for this, but I don't know any Ruby at the moment. I do know a little Perl, and there are a lot of SAX modules for it.

Replies are listed 'Best First'.
Re^3: How to Truncate Corrupt Document.xml Files?
by educated_foo (Vicar) on Feb 16, 2012 at 02:28 UTC
    I don't parse much XML (thank God), but XML::Parser (originally written by Larry Wall) has always been pretty straightforward to use -- just define Start() and End() handlers for a start.

      I read that a SAX parser is not well suited to rebuilding the XML document, which is what I want to do, unless I use two parser instances: a SAX parser to analyze the document.xml file, and XML::Parser to add the missing end tags and rebuild document.xml.

      However, is there any real benefit to using SAX here? Can't I just define a start handler with XML::Parser that pushes each non-self-closing tag onto an array, and an end handler that removes tags from the same array? At the end of parsing, the array would hold only the tags the end handler never saw, and these could be appended to the end of the XML file in reverse order (last in, first out).

        "Can't I just define a start handler with XML::Parser that pushes each non-self-closing tag onto an array, and an end handler that removes tags from the same array? At the end of parsing, the array would hold only the tags the end handler never saw, and these could be appended to the end of the XML file in reverse order (last in, first out)."
        That's basically what I was trying to suggest. SAX is one common stream-based parsing interface that people coming from a non-Perl background might know. XML::Parser is another stream-based parser which is, IMHO, easier to use.
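
        The array-as-stack idea discussed above can be sketched roughly as follows. This is only an illustration, and it is written in Python against the standard-library xml.parsers.expat binding rather than Perl's XML::Parser. Both wrap the expat library, and StartElementHandler / EndElementHandler play the role of the Start and End handlers. The function names find_unclosed_tags and close_truncated_xml are made up for this sketch, and it assumes the only damage is truncation at the end of the file.

```python
import xml.parsers.expat


def find_unclosed_tags(xml_text):
    """Return the names of elements still open when parsing stops, outermost first."""
    open_tags = []  # used as a stack: push on start tag, pop on end tag

    def on_start(name, attrs):
        open_tags.append(name)

    def on_end(name):
        open_tags.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError:
        # Expected for a truncated file; the handlers have already
        # recorded every complete tag seen before the error.
        pass
    return open_tags


def close_truncated_xml(xml_text):
    """Append the missing end tags to a document that was cut off at the end."""
    # Assumption: a '<' with no '>' after it is a tag that was cut off
    # part-way through, so drop that fragment before parsing.
    last_open = xml_text.rfind("<")
    if last_open > xml_text.rfind(">"):
        xml_text = xml_text[:last_open]
    open_tags = find_unclosed_tags(xml_text)
    # Close in reverse order: last opened, first closed.
    closing = "".join("</%s>" % name for name in reversed(open_tags))
    return xml_text + closing


if __name__ == "__main__":
    print(close_truncated_xml("<doc><body><p>Hello <b>wor"))
```

        One thing that falls out of using expat: a self-closing tag such as <b/> triggers both the start and the end handler, so the stack does not need a special case for it. The sketch does not handle corruption in the middle of the file, or a cut that lands inside an entity reference or an attribute value containing '>'.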
