in reply to XML::Simple and <tag> </tag>

From the documentation:
WHERE TO FROM HERE?
XML::Simple is able to present a simple API because it makes some assumptions on your behalf. These include:
  o You're not interested in text content consisting only of whitespace
  o ...
So it seems you need a more sophisticated parser, such as XML::Parser or XML::LibXML.

Arjen

Re: Re: XML::Simple and <tag> </tag>
by gmpassos (Priest) on May 18, 2004 at 20:12 UTC
    XML::Simple is not a parser! XML::Simple uses XML::SAX or XML::Parser if they are installed, and in this order of preference.

    The questions is that by default a content of a node that only have spaces is just ignored, or we will always have content if the XML tree is idented.

    Graciliano M. P.
    "Creativity is the expression of the liberty".

      XML::Simple is not a parser! XML::Simple uses XML::SAX or XML::Parser if they are installed, and in this order of preference.
      Indeed. My bad.
      The questions is that by default a content of a node that only have spaces is just ignored...
      Using the SuppressEmpty option, you can specify how an element containing only whitespace is ignored, not whether it is ignored.
      or we will always have content if the XML tree is idented.
      I don't understand this part of your reply.

      Referring to the question of the OP: "Is there any way to force XML::Simple to keep the spaces?". The answer is "No, there isn't". Elements consisting only of whitespace are ignored, and the only thing you can control is whether they appear in the resulting hash as an empty hash (the default), an empty string, undef, or not at all.
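To make the choices concrete, here is a minimal sketch, assuming XML::Simple and one of its parser back ends are installed. The sample document and the `describe` helper are made up for illustration; only `XMLin` and the `SuppressEmpty` option come from the module.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical sample: one ordinary element and one whitespace-only element.
my $xml = '<doc><title>x</title><tag> </tag></doc>';

my $default  = XMLin($xml);                          # option not given
my $skipped  = XMLin($xml, SuppressEmpty => 1);      # true value
my $as_str   = XMLin($xml, SuppressEmpty => '');     # empty string
my $as_undef = XMLin($xml, SuppressEmpty => undef);  # undef

# Illustration-only helper: report how 'tag' shows up in the result.
sub describe {
    my ($data) = @_;
    return 'absent' unless exists $data->{tag};
    return 'undef'  unless defined $data->{tag};
    return 'empty hash'
        if ref($data->{tag}) eq 'HASH' && !%{ $data->{tag} };
    return "string '$data->{tag}'";
}

printf "%-22s %s\n", $_->[0], describe($_->[1])
    for [ 'default'               => $default ],
        [ 'SuppressEmpty => 1'    => $skipped ],
        [ "SuppressEmpty => ''"   => $as_str ],
        [ 'SuppressEmpty => undef' => $as_undef ];
```

In none of the four cases does the single space inside the tag survive, which is the point of the answer above.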

      Arjen

        s/idented/indented/

        Consider XML like this:
        <document id="23">
          <title>Title</title>
          <subhead>Subhead</subhead>
        </document>
        In the above, if we preserve whitespace, we'd get something like the following:
        %xml = (
          document => {
            id      => 23,
            title   => 'Title',
            subhead => 'Subhead',
            content => "\n  \n  \n",
          }
        );
        Because the 'document' element's 'untagged' content is the whitespace around the other tags.
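The same point can be checked without any XML module. The following is a rough sketch only: it uses regexes on a small, well-behaved sample (two-space indentation is an assumption) and is not a substitute for a real parser.

```perl
use strict;
use warnings;

# Sample document, indented with two spaces (an assumption for this sketch).
my $xml = <<'XML';
<document id="23">
  <title>Title</title>
  <subhead>Subhead</subhead>
</document>
XML

# Everything between <document ...> and </document>.
my ($inner) = $xml =~ m{<document[^>]*>(.*)</document>}s;

# Remove the child elements; what remains is the 'untagged' content.
(my $untagged = $inner) =~ s{<(\w+)[^>]*>.*?</\1>}{}gs;

# Show the leftover text with newlines made visible.
(my $visible = $untagged) =~ s/\n/\\n/g;
print "untagged content: \"$visible\"\n";
```

What is left is nothing but the newlines and indentation between the child tags, which is exactly the text a whitespace-preserving mode would have to store somewhere.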
        "Get real! This is a discussion group, not a helpdesk. You post something, we discuss its implications. If the discussion happens to answer a question you've asked, that's incidental." -- nobull@mail.com in clpm

        Then I guess I have to tweak XML::Simple :-)
        Here is a patch that adds another option for suppressempty that instructs XML::Simple to keep the whitespace IFF there are no subtags.

        --- XML/Simple.pm~ Tue Oct 22 20:05:02 2002
        +++ XML/Simple.pm Wed May 19 16:22:56 2004
        @@ -690,6 +690,7 @@

           # Add any nested elements

        +  my $keepspaces = (scalar(@_) == 2 and $self->{opt}->{suppressempty} eq '0'); # only if there are no subtags
           my($key, $val);
           while(@_) {
             $key = shift;
        @@ -700,7 +701,7 @@
                 next if(!defined($val) and $self->{opt}->{suppressempty});
             }
             elsif($key eq '0') {
        -      next if($val =~ m{^\s*$}s);  # Skip all whitespace content
        +      next if (!$keepspaces and $val =~ m{^\s*$}s);  # Skip all whitespace content,
               if(!%$attr and !@_) {        # Short circuit text in tag with no attr
                 return($self->{opt}->{forcecontent} ?
                   { $self->{opt}->{contentkey} => $val } : $val
        @@ -1426,7 +1427,7 @@

         When used with C<XMLin()>, any attributes in the XML will be ignored.

        -=item suppressempty => 1 | '' | undef (B<in>)
        +=item suppressempty => 1 | '' | undef | 0 (B<in>)

         This option controls what C<XMLin()> should do with empty elements (no
         attributes and no content).  The default behaviour is to represent them as
        @@ -1433,8 +1434,10 @@
         empty hashes.  Setting this option to a true value (eg: 1) will cause empty
         elements to be skipped altogether.  Setting the option to 'undef' or the empty
         string will cause empty elements to be represented as the undefined value or
        -the empty string respectively.  The latter two alternatives are a little
        -easier to test for in your code than a hash with no keys.
        +the empty string respectively.  The latter two alternatives are a little easier
        +to test for in your code than a hash with no keys. Setting the option to 0
        +will cause elements containing only whitespace to stay intact in the resulting
        +structure.

         =item cache => [ cache scheme(s) ] (B<in>)

        That is, if you set suppressempty => 0, then

        <tag> </tag>
        will produce
        $data = { tag => ' ', }
        while
        <tag>
          <subtag>hello</subtag>
          <other>world</other>
        </tag>
        will produce
        $data = {
          tag => {
            subtag => 'hello',
            other  => 'world',
          }
        }
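For anyone who wants to try the rule without patching the module, here is a simplified stand-alone model of it. It is not XML::Simple's real code: `collapse_model` is a made-up name, the children are passed as a flat list of (key, value) pairs where the key '0' marks text, subtag values are plain strings rather than nested trees, and a `defined` check is added so that an unset option does not warn.

```perl
use strict;
use warnings;

# Simplified model of the patched rule (hypothetical helper, not the
# module's internals). $children is a flat list of (key, value) pairs;
# key '0' means text content, any other key is a subtag name.
sub collapse_model {
    my ($children, $suppressempty) = @_;
    my @items = @$children;

    # Keep whitespace only when the element holds a single text node
    # (no subtags) and the option is the string/number 0.
    my $keepspaces = (@items == 2
        and defined $suppressempty
        and $suppressempty eq '0');

    my %out;
    while (@items) {
        my ($key, $val) = splice(@items, 0, 2);
        if ($key eq '0') {
            next if !$keepspaces and $val =~ m{^\s*$}s;  # skip whitespace
            return $val if !%out and !@items;            # text-only element
            $out{content} = $val;
        }
        else {
            $out{$key} = $val;
        }
    }
    return \%out;
}

# <tag> </tag> with suppressempty => 0
my $only_space = collapse_model([ '0', ' ' ], 0);

# <tag> <subtag>hello</subtag> <other>world</other> </tag>
my $with_subtags = collapse_model(
    [ '0', ' ', subtag => 'hello', '0', ' ', other => 'world', '0', ' ' ], 0);

# <tag> </tag> without the option: whitespace is dropped as before
my $unset = collapse_model([ '0', ' ' ], undef);

print "only_space: '$only_space'\n";
print "with_subtags keys: @{[ sort keys %$with_subtags ]}\n";
print "unset keys: ", scalar(keys %$unset), "\n";
```

The count check on the pair list is what restricts the new behaviour to elements without subtags, so indented documents still collapse as they did before.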

        I'll submit the patch to Grant McLean in a few days if there are no problems with it.

        Jenda
        Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
           -- Rick Osborne

        Edit by castaway: Closed small tag in signature