What should this print in your opinion?

use XML::Simple;
my $parser = new XML::Simple(
    forcearray => [qw(AREA LOCALE)],
    suppressempty => '',
);
$data = $parser->XMLin(<<'*END*');
<root>
    <empty></empty>
    <whitespace> </whitespace>
    <full>ahoj</full>
    <with_spaces>  cau  </with_spaces>
</root>
*END*
use Data::Dumper;
print Dumper($data);

Considering that NormaliseSpace is 0 by default, which means "whitespace is passed through unaltered (except of course for the normalisation of whitespace in attribute values which is mandated by the XML recommendation)", I think it should print

$VAR1 = {
          'whitespace' => ' ',
          'with_spaces' => '  cau  ',
          'empty' => '',
          'full' => 'ahoj'
        };
XML::Simple disagrees and its result is
$VAR1 = {
          'whitespace' => '',
          'with_spaces' => '  cau  ',
          'empty' => '',
          'full' => 'ahoj'
        };
As you can see the content of the whitespace tag was removed.

Now I can understand why you would not want to end up with

$VAR1 = {
          'whitespace' => ' ',
          'with_spaces' => '  cau  ',
          'content' => [
                         '
        ',
                         '
        ',
                         '
        ',
                         '
        ',
                         '
'
                       ],
          'empty' => '',
          'full' => 'ahoj'
        };
but returning the same for <tag></tag> and <tag> </tag> doesn't seem entirely correct. To me at least.

Jenda
XML sucks. Badly. SOAP on the other hand is the most powerful vacuum pump ever invented.

Re: XML::Simple bug? aka I want the whitespace dude!
by derby (Abbot) on Aug 22, 2005 at 20:25 UTC

    You have to read a bit further in the doc:

    XML::Simple is able to present a simple API because it makes some assumptions on your behalf. These include:

    • You're not interested in text content consisting only of whitespace

    XML::Simple skips text content that consists only of whitespace.

    -derby
Re: XML::Simple bug? aka I want the whitespace dude!
by grantm (Parson) on Aug 23, 2005 at 00:35 UTC

    What you're seeing is definitely by design.

    I originally wrote XML::Simple specifically for reading (and later writing) config files in XML format. It proved to be useful for other simple XML tasks too. However it was never intended to be 'the one and only Perl module you'll ever need for working with XML'.

    Personally, for most tasks that involve reading XML, I tend to use XML::LibXML, and often it requires less code than XML::Simple would have, even for simple things (yay XPath!). For writing XML, I tend to use the Template Toolkit or HTML::Mason. There is more advice in the Perl XML FAQ.

      OK, and would it hurt anything to preserve the whitespace in tags with no children? Of course you would not want to keep the whitespace for

      <foo>
        <bar>x</bar>
        <baz>y</baz>
      </foo>

      since that would make a fairly big difference in the results, but for <foo> </foo>? The only difference is that you get ..., foo => ' ', ... instead of ..., foo => '', ... which would actually make it consistent with the handling of <foo>  whitespace preserved  </foo>. What I have IS basically a config file, but I need to preserve the whitespace, even if it is the only content of an option.

      The whole change necessary in the module would be

      line 925
      << next if($val =~ m{^\s*$}s);  # Skip all whitespace content
      >> next if (($self->{opt}->{suppressempty} or %$attr) and $val =~ m{^\s*$}s);  # Skip all whitespace content

      line 956
      >> if (!$self->{opt}->{suppressempty} and scalar(keys %$attr) > 1 and $attr->{$self->{opt}->{contentkey}} =~ m{^\s*$}s) {
      >>     delete $attr->{$self->{opt}->{contentkey}};
      >> }

      Jenda

        would it hurt anything to preserve the whitespace in case of tags with no children?

        This is going to sound rude and uncaring (which is unfortunate because I try not to be either) but ... Yes, it would hurt.

        You're talking about changing the default behaviour. In a module that's been around as long as this one has, that is certain to break a lot of scripts. Just by way of example, it broke 21 tests in the test suite that ships with the module. It also introduced almost 2000 warning messages during make test.

        Now obviously it would be possible to clean up the warnings and add another option so the default behaviour was not affected, but that's not really going to fly either. XML::Simple already has far too many options. The claim to the name 'Simple' was lost years ago. I regularly reject requests to add 'one simple option' because I don't want to make matters worse.

        The reality is that XML::LibXML is a powerful and flexible module that can do what you want. You might want to put your own thin wrapper around it to simplify the things that you want to do regularly. In the end though, it will be a better solution because it will work the way you expect it to work and it won't be bogged down with options to make it work in the weird and wonderful ways other people expect.
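
For what it's worth, a thin wrapper of the kind suggested here could look roughly like the sketch below. It assumes XML::LibXML is installed and that the config is one level deep (a root element whose children hold only text). The helper name leaf_text is made up for illustration and is not part of XML::LibXML.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not an XML::LibXML function): map each child
# element of the document root to its text content, whitespace included.
sub leaf_text {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my %data;
    for my $child ($doc->documentElement->findnodes('./*')) {
        # Fall back to '' in case an empty element yields undef.
        $data{ $child->nodeName } = $child->textContent // '';
    }
    return \%data;
}

my $data = leaf_text(<<'END_XML');
<root>
    <empty></empty>
    <whitespace> </whitespace>
    <full>ahoj</full>
    <with_spaces>  cau  </with_spaces>
</root>
END_XML

# Print each value in quotes so the whitespace is visible.
printf "%-12s => '%s'\n", $_, $data->{$_} for sort keys %$data;
```

The result is still a plain hash that can be mapped and grepped, and the whitespace-only element keeps its single space because XML::LibXML keeps blank text nodes by default.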

        Sorry if I sound grumpy.

Re: XML::Simple bug? aka I want the whitespace dude!
by idsfa (Vicar) on Aug 22, 2005 at 20:26 UTC

    My reading of the source confirms that whitespace-only node content is discarded. This comment precedes the collapse method:

    # This routine cuts down the noise by discarding any text
    # content consisting of only whitespace ...

    There does not appear to be a way to turn this off.


    The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. -- Cyrus H. Gordon
Re: XML::Simple bug? aka I want the whitespace dude!
by revdiablo (Prior) on Aug 22, 2005 at 20:39 UTC

    Sidestepping the actual question you pose, these are the kind of things that make me avoid XML::Simple for much of anything. Using it to access an XML config file is probably ok -- especially since it's what the wonderful Config::Auto uses -- but beyond that, I'd reach for something else. My favorite something else is currently XML::TreeBuilder. It's a subclass of HTML::TreeBuilder, and shares the same power hidden behind a pretty darned nice interface.

      There is a reason why I like XML::Simple. I don't end up with a heap of crazy objects with loads of impossible to remember methods and properties, but instead I get just a pure datastructure I can map and grep and loop through using the Perl builtin operators and functions. The AFAPITL (Another Fucking API To Learn) problem.

      I don't spend my days massaging XML with Perl. Perl is only about 20% of my work, and only about 20% of my Perl work involves XML. Days, sometimes weeks, pass by without me having to do anything at all with XML. Thank god.

      Jenda

Re: XML::Simple bug? aka I want the whitespace dude!
by pg (Canon) on Aug 22, 2005 at 21:03 UTC

    It is not a bug, as this is the claimed behaviour of XML::Simple as other monks pointed out.

    It is a bug, though, if we are talking about the XML specification. The XML specification clearly defines what an "empty element" is: an empty element has no content, not even whitespace. This implies that the whitespace in a whitespace-only element should still be preserved. (Of course you can choose to strip it, but that is a different story.)
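
To illustrate the distinction, here is a minimal sketch, assuming XML::LibXML is available and left at its default of keeping blank text nodes. It only reports whether each element has any child nodes. The element names a, b and c are arbitrary.

```perl
use strict;
use warnings;
use XML::LibXML;

# <a></a> and <c/> have no content; <b> </b> holds one text node
# consisting of a single space.
my $doc = XML::LibXML->load_xml(
    string => '<root><a></a><b> </b><c/></root>',
);

my %kind;
for my $el ($doc->documentElement->findnodes('./*')) {
    $kind{ $el->nodeName } = $el->hasChildNodes ? 'has content' : 'empty';
}

print "$_: $kind{$_}\n" for sort keys %kind;
```

With a parser that keeps the text node, only b should come out as having content, which is the difference XML::Simple folds away.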