in thread XML::Compile template initialized from XML document

I accept the possibility. I'm new to both library families (XML::Compile and XML::LibXML), and they are large and complicated. Let me show you the code I use to compile the schema and parse the XML. It produces a structure I find difficult to use, so maybe you can help me get to this easier model.

I'm trying to post the smallest amount of code that is still helpful. The getSASchema subroutine successfully downloads the XSD and compiles it. The saQuery subroutine successfully pulls the XML I want and runs the reader I get from the schema object's "compile" method. As you can see, the subroutine returns the reference that the reader gives back when it is called with the XML text.

    ...
    sub getSASchema {
        my ($config, $lwp) = @_;
        my $saSchemaUrl = "https://" . $config->{saserver} . ":" . $config->{saport}
                        . "/serverautomation/SA-REST.xsd";
        my $sareq = HTTP::Request->new( GET => $saSchemaUrl );
        $sareq->authorization_basic($config->{besuser}, $config->{bespassword});
        my $xsd = $lwp->request($sareq);
        my $schema = XML::Compile::Schema->new($xsd->{_content});
        return $schema;
    }

    ## Handle querying the Server Automation API.
    sub saQuery {
        my ($config, $lwp, $schema) = @_;
        my $saPlanUrl = "https://" . $config->{saserver} . ":" . $config->{saport}
                      . "/serverautomation" . $config->{saplanurl};
        my $sareq = HTTP::Request->new( GET => $saPlanUrl );
        $sareq->authorization_basic($config->{besuser}, $config->{bespassword});
        my $xml = $lwp->request($sareq);
        my $planreader = $schema->compile( READER => "{http://iemfsa.tivoli.ibm.com/REST}sa-rest");
        my $xmltxt = $xml->{_content};
        my $tree = $planreader->($xmltxt);
        return $tree;
    }
    ...

    # Down in my main. $config is a hash containing config
    # data parsed from a file, and $lwp is an instance of
    # LWP::UserAgent
    my $saSchema = getSASchema($config, $lwp);

    ## Fetch the "raw" automation plan from the BigFix SA server
    my $doc = saQuery($config, $lwp, $saSchema);
    my $basenode = $doc->{_};
    print Dumper($doc);
    print Dumper($basenode);

As you can see in my main code, I then use Dumper to inspect the result. Here's what I get:

    $VAR1 = {
              '_MIXED_ELEMENT_MODE' => 'ATTRIBUTES',
              '_' => bless( do{\(my $o = 85686224)}, 'XML::LibXML::Element' )
            };
    $VAR1 = bless( do{\(my $o = 85686224)}, 'XML::LibXML::Element' );

That is not a hash of hashes; it is an XML::LibXML object wrapped in a hash. This is how the XML::Compile documentation tells me to make a reader, but I'm not sure this reader is the same reader you refer to as XML::Compile::Translate::Reader. Maybe you can show me a fragment to replace my reader in saQuery with one that you think will produce what I want? And please remember that after modifying my structure I need to be able to use my XML::Compile::Schema object to construct schema-compliant output XML.
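For the round trip asked about here, the usual XML::Compile pattern is to compile a READER and a WRITER from the same schema object. The sketch below assumes a tiny, invented element-only schema (namespace http://example.com/demo, element plan with name, step and an id attribute) standing in for SA-REST.xsd. It reads an instance into a hash, modifies it, and writes it back through the writer, which validates the data against the schema by default.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::Compile::Schema;

# Hypothetical element-only schema; not the real SA-REST.xsd.
my $xsd = <<'XSD';
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://example.com/demo"
        elementFormDefault="qualified">
  <element name="plan">
    <complexType>
      <sequence>
        <element name="name" type="string"/>
        <element name="step" type="int" maxOccurs="unbounded"/>
      </sequence>
      <attribute name="id" type="string"/>
    </complexType>
  </element>
</schema>
XSD

my $type   = '{http://example.com/demo}plan';
my $schema = XML::Compile::Schema->new($xsd);

# A READER turns XML into Perl data; a WRITER turns Perl data into XML.
my $reader = $schema->compile(READER => $type);
my $writer = $schema->compile(WRITER => $type);

# Read an instance document into a plain hash.
my $data = $reader->(
    '<plan xmlns="http://example.com/demo" id="p1">'
  . '<name>Patch</name><step>1</step><step>2</step></plan>'
);

# Modify it like any other Perl structure.
push @{ $data->{step} }, 3;

# Write it back into a fresh document.
my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $writer->($doc, $data);
$doc->setDocumentElement($root);
my $out = $doc->toString(1);
print $out;
```

The writer picks its own namespace prefixes, so the output is compared by reading it back rather than by string.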

I really appreciate your help. I'm learning a lot.

Re^3: XML::Compile template initialized from XML document
by markov (Scribe) on May 12, 2016 at 07:08 UTC

    Be sure that you create the ::Schema (better the ::Cache) only once in your program, and reuse it. The same for your compiled handlers: compilation is expensive, (re)use is cheap.
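A minimal sketch of "compile once, reuse" with plain XML::Compile::Schema: compiled readers are memoized per type, so a loop pays the compilation cost only once. The schema, namespace and the reader_for helper are invented for illustration. XML::Compile::Cache, which markov recommends, keeps this kind of bookkeeping for you.

```perl
use strict;
use warnings;
use XML::Compile::Schema;

# Hypothetical one-element schema standing in for the real one.
my $xsd = <<'XSD';
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://example.com/demo"
        elementFormDefault="qualified">
  <element name="plan">
    <complexType>
      <sequence><element name="name" type="string"/></sequence>
    </complexType>
  </element>
</schema>
XSD

# Build the schema object once for the whole program.
my $schema = XML::Compile::Schema->new($xsd);

# Hypothetical helper: compile a reader on first use, return the same one afterwards.
my %compiled;
sub reader_for {
    my ($type) = @_;
    return $compiled{$type} //= $schema->compile(READER => $type);
}

my $type = '{http://example.com/demo}plan';
my @names;
for my $n (1 .. 3) {
    my $xml = qq{<plan xmlns="http://example.com/demo"><name>plan $n</name></plan>};
    push @names, reader_for($type)->($xml)->{name};
}
print "@names\n";
```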

    Probably you want to use $xmltxt = $msg->decoded_content rather than reading the response object's internal {_content} field.
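A small sketch of that advice: check the status, then take the body through the public decoded_content method of HTTP::Response instead of the internal {_content} field. The response_text helper is hypothetical, and the two responses are built by hand so the example runs without a server; in saQuery they would come from $lwp->request($sareq).

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: return the body as Perl characters, or die with the status.
sub response_text {
    my ($res) = @_;
    die 'request failed: ' . $res->status_line . "\n" unless $res->is_success;
    # decoded_content undoes any Content-Encoding and decodes the charset.
    return $res->decoded_content;
}

# Hand-built stand-ins for real LWP responses.
my $ok = HTTP::Response->new(
    200, 'OK', [ 'Content-Type' => 'text/xml; charset=UTF-8' ], '<plan/>'
);
my $fail = HTTP::Response->new(401, 'Unauthorized');

my $xmltxt = response_text($ok);
print "$xmltxt\n";

my $error = eval { response_text($fail); 1 } ? '' : $@;
print $error;
```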

    Apparently, your schema uses mixed="true". When that is not a mistake (which it often is), then you have to do some of the work manually. XHTML is an example of mixed XML: it does not translate into a predictable tree of Perl data. Tune the handling of mixed elements to suit your needs via the mixed_elements parameter.
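Doing things "partially manually" in the default ATTRIBUTES mode means working with the XML::LibXML::Element that the reader leaves in the _ key (as in the Dumper output above). This sketch parses an invented document directly to get such an element, then pulls values out with XPath; the namespace and element names are assumptions, and the prefix d is an arbitrary local choice.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = '<plan xmlns="http://example.com/demo" id="p1">'
        . '<name>Patch</name><step>1</step><step>2</step></plan>';

# Stand-in for $doc->{_}: an XML::LibXML::Element, just like the reader returns.
my $element = XML::LibXML->load_xml(string => $xml)->documentElement;

# XPath needs a registered prefix to match namespaced elements.
my $xpc = XML::LibXML::XPathContext->new($element);
$xpc->registerNs(d => 'http://example.com/demo');

# Build the Perl structure by hand from the node.
my %plan = (
    id   => $element->getAttribute('id'),
    name => $xpc->findvalue('d:name'),
    step => [ map { $_->textContent } $xpc->findnodes('d:step') ],
);
print "$plan{name}: @{ $plan{step} }\n";
```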

      I do create the Schema only once. Since I have only the one Schema, use it repeatedly while looping over the contents of a file, and then discard it, I saw no advantage in using Cache. This is a "utility" script that consumes a web service, not a web service itself.

      The schema comes from a commercial application. It looks to me like setting "mixed" to "true" on their complex types is an oversight. Again, I'm only passingly familiar with the full XML Schema definition. It looks to me like "mixed" should only be true when an element can contain both content (text/data) AND other elements (tags) mixed together. Is that right? The actual XML documents I have seen from this interface do not have that. Right now I actually pull the schema from the web service. Would I be better off grabbing a copy and setting "mixed" to "false" for my work?
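One way to check that claim against real documents before flipping the flag: look for any element that has child elements and also non-whitespace text of its own, which is what mixed content means. The uses_mixed_content helper and both sample documents are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: 1 if some element has both child elements and
# non-whitespace text directly inside it, otherwise 0.
sub uses_mixed_content {
    my ($xml) = @_;
    my $dom = XML::LibXML->load_xml(string => $xml);
    return $dom->exists('//*[* and text()[normalize-space(.) != ""]]') ? 1 : 0;
}

# XHTML-like content really is mixed; indentation whitespace alone is not.
print uses_mixed_content('<p>Some <b>bold</b> text</p>'), "\n";
print uses_mixed_content("<plan>\n  <name>Patch</name>\n</plan>"), "\n";
```

Running it over a sample of saved responses gives some evidence, not proof, that mixed="true" is unused in practice.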

      Following up. Are you suggesting that the reader is producing the tree of objects instead of a hash of hashes because the schema is mixed? In other words, the library knows it can't render a mixed schema as a hash of hashes and that is why it doesn't?

      I'm not going to wait for your answer. Because I know the schema doesn't have to have mixed elements, I'm going to make a local file copy of the schema, change it so mixed is "false" and see what I get. But I would welcome anyone's input on this while I do that work. Thanks!

        Now replying to myself! I made a local copy of the schema and changed all the mixed="true" to "false". This revealed one other problem: an element that needed a minOccurs="0". I added that, and now my XML parses into a lovely and very Perl-ish hash of hashes and arrays!
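Instead of maintaining a hand-edited copy, the same change could be applied to the downloaded XSD at startup. This sketch uses XML::LibXML to flip every mixed="true" to "false"; the unmix_schema helper and the sample XSD are hypothetical, and the minOccurs fix would still have to target the specific element involved.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: parse an XSD and set every mixed="true" to "false".
sub unmix_schema {
    my ($xsd_text) = @_;
    my $dom = XML::LibXML->load_xml(string => $xsd_text);
    $_->setAttribute(mixed => 'false') for $dom->findnodes('//*[@mixed="true"]');
    return $dom;
}

my $xsd = <<'XSD';
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://example.com/demo">
  <complexType name="planType" mixed="true">
    <sequence><element name="name" type="string"/></sequence>
  </complexType>
</schema>
XSD

my $dom = unmix_schema($xsd);
# The patched DOM could then be passed to XML::Compile::Schema->new($dom).
print $dom->toString;
```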

        The good news is that I am able to contact the developers and suggest the schema be changed along these lines. In the meantime I can proceed with my local version of the schema.

        Thank you to all the monks who helped! markov in particular gave me the clue that got me to a solution.

        In some cases, mixed elements are necessary: for instance, when an XHTML document is included inside a structured XML message. But often, mixed elements are abused when the schema author wanted "free format XML", because it was too hard to specify or out of laziness.

        So, the mixed_elements option lets you choose in what way XML::Compile should help you deal with these tricks: for instance, parsing the mixed element more in the style of XML::Simple. That usually works out.

        The effect of your modification is probably the same as specifying mixed_elements => 'STRUCTURAL' when compiling a reader.
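A sketch comparing the default mode with 'STRUCTURAL' on an invented schema that carries the same kind of mixed="true" as the one in this thread. Following markov's description, 'STRUCTURAL' should read the content as if mixed were false, while the default leaves an XML::LibXML node in _.

```perl
use strict;
use warnings;
use XML::Compile::Schema;

# Hypothetical schema with a (probably unintended) mixed="true".
my $xsd = <<'XSD';
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://example.com/demo"
        elementFormDefault="qualified">
  <element name="plan">
    <complexType mixed="true">
      <sequence>
        <element name="name" type="string"/>
        <element name="step" type="int" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
  </element>
</schema>
XSD

my $schema = XML::Compile::Schema->new($xsd);
my $type   = '{http://example.com/demo}plan';
my $xml    = '<plan xmlns="http://example.com/demo">'
           . '<name>Patch</name><step>1</step><step>2</step></plan>';

# Default (ATTRIBUTES): the element comes back as an XML::LibXML node in '_'.
my $default = $schema->compile(READER => $type)->($xml);

# STRUCTURAL: the mixed element is read into ordinary Perl data.
my $structural = $schema->compile(
    READER         => $type,
    mixed_elements => 'STRUCTURAL',
)->($xml);
```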

        minOccurs errors are common. Sometimes I see confusion between nillable and optional; there is a flag for that as well: interpret_nillable_as_optional.
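A sketch of that flag on an invented schema where note is nillable but not optional, and the instance simply omits it. The expectation, hedged, is that the plain reader rejects the document while the reader compiled with interpret_nillable_as_optional accepts it.

```perl
use strict;
use warnings;
use XML::Compile::Schema;

# Hypothetical schema: <note> is nillable, but minOccurs defaults to 1.
my $xsd = <<'XSD';
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://example.com/demo"
        elementFormDefault="qualified">
  <element name="plan">
    <complexType>
      <sequence>
        <element name="name" type="string"/>
        <element name="note" type="string" nillable="true"/>
      </sequence>
    </complexType>
  </element>
</schema>
XSD

my $schema = XML::Compile::Schema->new($xsd);
my $type   = '{http://example.com/demo}plan';

# The instance leaves out <note> entirely.
my $xml = '<plan xmlns="http://example.com/demo"><name>Patch</name></plan>';

my $strict  = $schema->compile(READER => $type);
my $lenient = $schema->compile(READER => $type, interpret_nillable_as_optional => 1);

# The strict reader should die on the missing element; the lenient one should not.
my $strict_ok = eval { $strict->($xml); 1 } ? 1 : 0;
my $data      = $lenient->($xml);
```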