alan_olsen has asked for the wisdom of the Perl Monks concerning the following question:

I have a chunk of XML. I am using XML::LibXML to parse and mangle the data. If I unlink a node and then write the document to a file, the node is replaced by a blank line. Is there any way to change this behavior so that the extraneous whitespace is not added?

Example:

Before XML removal:

    <item1/>
    <item2/>
    <item3/>

After removing item2:

    <item1/>

    <item3/>

Here is the sample code snippet (and yes, I pull the list using XML::Simple):

    my $manifesto = XMLin($google_manifest);
    foreach my $project (sort keys %{$manifesto->{'project'}}) {
        if ($project) {
            $project = 'a/aosp/' . $project;
            print "Project = $project \n";
            my $query1 = '//project[@name = "' . $project . '"]';
            my ($node) = $doc->findnodes($query1);
            if ($node) {
                $node->unbindNode();
            }
        }
    }
    # toFile() writes the document itself; there is nothing to print
    $doc->toFile($new_manifest);

Ideas?

Re: Removing whitespace from deleted items in XML::LibXML
by ikegami (Patriarch) on Dec 08, 2011 at 22:56 UTC

    the node is replaced by a blank line.

    The node wasn't replaced with anything. You started with

    < i t e m 1 / > LF < i t e m 2 / > LF < i t e m 3 / > LF

    If you delete the second element, you get

    < i t e m 1 / > LF LF < i t e m 3 / > LF

    Nothing is added in its place.
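
    A minimal sketch to make this visible, assuming only XML::LibXML (the helper describe_children is a hypothetical name made up for this illustration): it lists the children of <root> before and after unbinding <item2/>. The newline that ended <item2/>'s line is a separate text node, so it stays behind next to the newline that ended <item1/>'s line; nothing new appears.

```perl
use strict;
use warnings;
use XML::LibXML qw( XML_TEXT_NODE );

# Same shape as the example: one element per line, no indentation
my $doc  = XML::LibXML->load_xml( string => "<root>\n<item1/>\n<item2/>\n<item3/>\n</root>" );
my $root = $doc->documentElement();

# Hypothetical helper: list the children of $parent, elements as <name/>,
# text nodes with each newline shown as LF
sub describe_children {
    my ($parent) = @_;
    my @parts;
    for my $child ( $parent->childNodes() ) {
        if ( $child->nodeType() == XML_TEXT_NODE ) {
            ( my $text = $child->data() ) =~ s/\n/LF/g;
            push @parts, $text;
        }
        else {
            push @parts, '<' . $child->nodeName() . '/>';
        }
    }
    return join ' ', @parts;
}

my $before = describe_children($root);

my ($item2) = $root->findnodes('item2') or die "item2 not found";
$item2->unbindNode();

my $after = describe_children($root);

print "before: $before\n";    # every LF is its own text node
print "after:  $after\n";     # the LF that followed <item2/> is still there
```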

    You want to delete the leading newline of the node that follows, if that node is a text node.

    use strict;
    use warnings;
    use open ':std', ':locale';

    use XML::LibXML qw( XML_TEXT_NODE );

    # Strip the leading newline from the text node that follows $node, if any
    sub remove_newline_that_follows {
        my ($node) = @_;
        my $next_node = $node->nextSibling()
            or return;
        $next_node->nodeType() == XML_TEXT_NODE
            or return;
        my $text = $next_node->data();
        # An empty text node carries no newline; look at the node after it
        $text eq ""
            and return remove_newline_that_follows($next_node);
        $text =~ s/^\n//
            and $next_node->setData($text);
    }

    sub remove_node {
        my ($node) = @_;
        $node->parentNode()->removeChild($node);
    }

    # <root>, then one empty element per line, no indentation
    my $doc = XML::LibXML->load_xml( string => "<root>\n<item1/>\n<item2/>\n<item3/>\n</root>\n" );

    my $root = $doc->documentElement();
    my ($node) = $root->findnodes('//item2')
        or die;
    remove_newline_that_follows($node);
    remove_node($node);
    print $root->toString();

    Update: Tested and fixed the broken XPath (by replacing it). I was using the following (marked as untested):

    my ($next_node) = $node->findnodes( 'following-sibling::*[ position()=1 and text() ]') or return;

    The correct XPath is

    my ($next_node) = $node->findnodes( 'following-sibling::node()[ position()=1 and self::text() ]') or return;

    Of course, what I used instead is much clearer (and surely faster).

    my $next_node = $node->nextSibling() or return;
    $next_node->nodeType() == XML_TEXT_NODE or return;
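
    A small sketch comparing the three checks above on the same example document (assuming only XML::LibXML). The original XPath uses *, which matches elements only, so it can never select the newline text node; the corrected XPath and the nextSibling() test should pick out the same node.

```perl
use strict;
use warnings;
use XML::LibXML qw( XML_TEXT_NODE );

my $doc = XML::LibXML->load_xml( string => "<root>\n<item1/>\n<item2/>\n<item3/>\n</root>" );
my ($node) = $doc->findnodes('//item2') or die "item2 not found";

# Original query: '*' matches elements only; the first following element,
# <item3/>, has no text child, so nothing is selected
my @broken = $node->findnodes('following-sibling::*[ position()=1 and text() ]');

# Corrected query: node() includes text nodes, and self::text() keeps the
# first following sibling only if it is a text node
my @fixed = $node->findnodes('following-sibling::node()[ position()=1 and self::text() ]');

# The DOM-method check used in the final code
my $next_node = $node->nextSibling();
my $is_text   = defined $next_node && $next_node->nodeType() == XML_TEXT_NODE;

printf "broken XPath: %d node(s)\n", scalar @broken;
printf "fixed XPath:  %d node(s)\n", scalar @fixed;
printf "fixed XPath and nextSibling() agree: %s\n",
    ( $is_text && @fixed && $fixed[0]->isSameNode($next_node) ) ? 'yes' : 'no';
```
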
Re: Removing whitespace from deleted items in XML::LibXML
by tobyink (Canon) on Dec 08, 2011 at 23:25 UTC

    As others have said, a blank line is not being inserted. Your data is:

    ELEMENT LINEFEED ELEMENT LINEFEED ELEMENT

    and you're removing the middle element, leaving just:

    ELEMENT LINEFEED LINEFEED ELEMENT

    The two linefeeds in a row give you an empty line.

    If whitespace is not considered meaningful in your XML format, you could simply choose to ignore the issue. But if getting the whitespace right is important to you, then you could take a look at my XML pretty-printing module, XML::LibXML::PrettyPrint.
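
    For comparison, a sketch of a different approach that uses only XML::LibXML itself (this is not tobyink's module): parse with the no_blanks option so whitespace-only text nodes are discarded, then let libxml2 re-indent on output. It assumes whitespace really is insignificant in the format, because the original layout is replaced by libxml2's own indentation.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = "<root>\n<item1/>\n<item2/>\n<item3/>\n</root>\n";

# no_blanks => 1: the parser drops whitespace-only text nodes between elements,
# so unbinding an element cannot leave its line ending behind
my $doc = XML::LibXML->load_xml( string => $xml, no_blanks => 1 );

my ($node) = $doc->findnodes('//item2') or die "item2 not found";
$node->unbindNode();

# toString(1) asks libxml2 to add newlines and indentation again;
# toFile($filename, 1) takes the same format argument
my $out = $doc->toString(1);
print $out;
```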

Re: Removing whitespace from deleted items in XML::LibXML
by choroba (Cardinal) on Dec 08, 2011 at 22:57 UTC

    The node is not replaced by a blank line. The end of line was there even in the original data: it is the end of the line containing the element to be removed. If you want to remove it too, just remove the corresponding text node.
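
    A minimal sketch of this suggestion, applied to a document shaped like the manifest in the question. The project names and the helper remove_with_line_end are hypothetical. The helper removes the following text node only when it is whitespace-only, so real text content is never thrown away.

```perl
use strict;
use warnings;
use XML::LibXML qw( XML_TEXT_NODE );

# Hypothetical manifest in the style of the question (project names are made up)
my $doc = XML::LibXML->load_xml( string =>
    qq{<manifest>\n<project name="a/aosp/one"/>\n<project name="a/aosp/two"/>\n<project name="a/aosp/three"/>\n</manifest>\n} );

# Hypothetical helper: unbind $node and, if the node right after it is a
# whitespace-only text node (the end of its line), unbind that too
sub remove_with_line_end {
    my ($node) = @_;
    my $next = $node->nextSibling();
    if (   defined $next
        && $next->nodeType() == XML_TEXT_NODE
        && $next->data() =~ /\A\s*\z/ )
    {
        $next->unbindNode();
    }
    $node->unbindNode();
}

# Same style of query as in the question
my $project = 'a/aosp/two';
my ($node) = $doc->findnodes( '//project[@name = "' . $project . '"]' )
    or die "project $project not found";
remove_with_line_end($node);

my $out = $doc->documentElement()->toString();
print "$out\n";
```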