anadem has asked for the wisdom of the Perl Monks concerning the following question:

I'm using this code to delete nodes from a file
    use XML::LibXML;

    # $brver (the highest "br" version to keep) is set elsewhere
    my $parser = XML::LibXML->new();
    my $dom    = $parser->parse_file( "OLDFILE" );

    for my $matrix ( $dom->findnodes( q{//version_matrix/vm[@type='br']} ) ) {
        my $version = $matrix->findvalue('./release/@version');
        if ( $version gt $brver ) {    # string comparison
            print "discarding version_matrix for $version\n";
            my $version_matrix_node = $matrix->parentNode;
            $version_matrix_node->unbindNode;
        }
    }

    my $changed = $dom->toString;
    open( my $fh, '>', "NEWFILE" ) or die "Cannot open NEWFILE: $!";
    print { $fh } $changed;
    close $fh;
where the input "OLDFILE" data is like this fragment:
    <supported_version_matrix>
        <version_matrix>
            <app>
                <release version="10.1.e">
                    <fixed>125.2004</fixed>
                </release>
            </app>
            <vm type="br">
                <release version="7.3.0">
                </release>
            </vm>
        </version_matrix>
        <version_matrix>
            <app>
                <release version="10.1.e">
                    <fixed>125.2004</fixed>
                </release>
            </app>
            <vm type="br">
                <release version="7.2.2">
                </release>
            </vm>
        </version_matrix>
        . . .
    </supported_version_matrix>
The output file "NEWFILE" has gaps of a couple of lines in place of the discarded nodes, like this (where the version_matrix for br 7.3.0 was unbound):
    <supported_version_matrix>
        <version_matrix>
            <app>
                <release version="10.1.e">
                    <fixed>125.2002</fixed>
                </release>
            </app>
Each blank line in the gaps contains only a couple of tab characters. I think (but am not 100% sure) that from an XML perspective this doesn't change the structure, but it looks bad. Am I doing something wrong? For example, does the "$matrix" child node have to be freed somehow before its parent can be unbound? Or is there an easy way to avoid the gaps?

thanks for any illumination

Replies are listed 'Best First'.
Re: XML::LibXML unbindNode leaves blank lines in the xml file
by tobyink (Canon) on May 16, 2014 at 10:20 UTC

    From an XML perspective, whitespace is data. The following two XML files are not considered to be equivalent:

    <foo> <bar /> </foo>

    versus

    <foo><bar /></foo>

    Of course XML is really just a base that other markup formats can layer themselves on top of (RSS, Atom, XHTML, RDF/XML, Docbook, SVG, MathML, etc). Some of those formats may consider whitespace to be insignificant, or significant only in certain places. (For example, in XHTML, whitespace is very significant inside <pre> elements; and the whitespace rules elsewhere are... shall we say "complex".) A general purpose XML processor must consider whitespace to be significant because it has no idea whether the particular flavour of XML you're using considers it important.

    So in the examples above, if you remove the <bar> element, a generic XML processor cannot simply remove the whitespace in the first example. It doesn't know whether that whitespace is significant.
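A minimal sketch of this point, assuming XML::LibXML is installed and using the first example document above. It counts the children of <foo> before <bar/> is unbound, then serializes the element afterwards to show that the surrounding whitespace stays in the tree. The variable names are illustrative only.

```perl
use strict;
use warnings;
use XML::LibXML;

# By default the parser keeps whitespace-only text nodes.
my $doc = XML::LibXML->load_xml( string => '<foo> <bar /> </foo>' );
my $foo = $doc->documentElement;

# <foo> has three children: a text node, <bar/>, and another text node.
my @before = $foo->childNodes;
printf "children before unbinding: %d\n", scalar @before;

# Unbinding <bar/> removes only the element itself.
my ($bar) = $foo->findnodes('./bar');
$bar->unbindNode;

# The whitespace that surrounded <bar/> is still inside <foo>.
my $remaining = $foo->toString;
print "after unbinding: [$remaining]\n";
```

The same thing happens in the question: the tabs and newlines around each removed <version_matrix> are separate text nodes, so they survive the unbindNode call and show up as blank-looking lines.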

    And that's why I wrote XML::LibXML::PrettyPrint. It allows you to pass in whitespace-massaging rules to the constructor, and thus reformat XML according to your own set of rules. In your case, configuring it so that <fixed> is a "compact" element, and all others are "block" elements should provide you with the formatting you desire.
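A rough sketch of that configuration, assuming XML::LibXML::PrettyPrint is installed and that its constructor takes the `element` category lists shown in its synopsis. The element names come from the question's sample data; the inline document and the tab indent string are assumptions made for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::PrettyPrint;

# Untidy input standing in for the file after nodes have been unbound.
my $dom = XML::LibXML->load_xml( string => <<'XML' );
<supported_version_matrix>
<version_matrix><app><release version="10.1.e">
<fixed>125.2004</fixed></release></app>


<vm type="br"><release version="7.2.2"> </release></vm>
</version_matrix>
</supported_version_matrix>
XML

# <fixed> is "compact"; every other element in the sample is "block".
my $pp = XML::LibXML::PrettyPrint->new(
    indent_string => "\t",
    element       => {
        compact => [qw/fixed/],
        block   => [qw/supported_version_matrix version_matrix app vm release/],
    },
);
$pp->pretty_print($dom);    # modifies the document in place

my $out = $dom->toString;
print $out;
```

Because the document is modified in place, this can run right after the unbindNode loop and before the result is written to the new file.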

    use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name
Re: XML::LibXML unbindNode leaves blank lines in the xml file
by wjw (Priest) on May 16, 2014 at 03:23 UTC
    Did a search which led to This which is pretty much the same thing anonymous just posted... with a bit more explanation.

    Hope that is helpful...

    ...the majority is always wrong, and always the last to know about it...
    Insanity: Doing the same thing over and over again and expecting different results...
Re: XML::LibXML unbindNode leaves blank lines in the xml file
by Anonymous Monk on May 16, 2014 at 03:19 UTC

    xml perspective this doesn't change the structure, but it looks bad. Am I doing something wrong...

    You're thinking the wrong way :) the whitespace is data (not structure ) ... also if you don't look at it , it won't look bad :P

    Or is there an easy way to avoid it?

    XML::LibXML::PrettyPrint

Re: XML::LibXML unbindNode leaves blank lines in the xml file
by Anonymous Monk on May 21, 2014 at 04:57 UTC

    the answer may be here http://www.perlmonks.org/?node_id=830411 courtesy of ikegami

    my $parser = XML::LibXML->new();
    $parser->keep_blanks(0);
    ...
    my $changed = $dom->toString(1);

    I take no credit for this answer other than having been looking for the same answer for a couple of days before finding it.
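For reference, here is one way the two calls might fit into the original loop. This is only a sketch: the inline XML is a trimmed copy of the question's sample, and the value of $brver is an assumption, since the thread never states it.

```perl
use strict;
use warnings;
use XML::LibXML;

my $brver = '7.2.2';    # assumed cutoff; not given in the thread

my $xml = <<'XML';
<supported_version_matrix>
    <version_matrix>
        <vm type="br">
            <release version="7.3.0"> </release>
        </vm>
    </version_matrix>
    <version_matrix>
        <vm type="br">
            <release version="7.2.2"> </release>
        </vm>
    </version_matrix>
</supported_version_matrix>
XML

my $parser = XML::LibXML->new();
$parser->keep_blanks(0);    # drop whitespace-only text nodes while parsing
my $dom = $parser->parse_string($xml);

for my $matrix ( $dom->findnodes(q{//version_matrix/vm[@type='br']}) ) {
    my $version = $matrix->findvalue('./release/@version');
    # Same string comparison as in the question
    $matrix->parentNode->unbindNode if $version gt $brver;
}

# Format level 1 asks libxml2 to re-indent the output.
my $changed = $dom->toString(1);
print $changed;
```

Note the trade-off: the original indentation is thrown away at parse time and regenerated on output, so untouched parts of the file may also come out formatted differently (for example with libxml2's own indentation instead of tabs).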

      Thank you for this useful suggestion. Works like a charm for me :-)