http://qs1969.pair.com?node_id=441290

Grundle has asked for the wisdom of the Perl Monks concerning the following question:

I am currently using XML::Simple to parse a relatively simple XML file of the following form:

    <meta fpi="1234567890">
      <isbn>1-234-56789-0</isbn>
      <edition>First</edition>
      <authorgroup>
        <author>
          <firstname>John</firstname>
          <surname>Smith</surname>
          <authorblurb url="http://www.someurl.com/etc/nothing.php"/>
        </author>
      </authorgroup>
      <pagenums>384</pagenums>
      <pubdate>October 2001</pubdate>
      <subjectset>
        <subject>some.lame.subject</subject>
        <subject>another.lame.subject</subject>
      </subjectset>
      <publisher>
        <publishername>Publisher Inc.</publishername>
        <imprintname>Publisher Inc.</imprintname>
      </publisher>
      ...
    </meta>

The problem is that when I am done parsing and separating this data, I call XMLout($data), but the new data looks like this:

    <imeta edition="First" fpi="0123456790" isbn="0-123-456789-0" msrp="39.95" pagenums="384" pubdate="October 2001">
      <authorgroup name="author" firstname="John" surname="Smith">
        <authorblurb url="http://www.someurl.com/etc/nothing.php" />
      </authorgroup>
      ..

Notice how previous tags are now attribute identifiers. Is there a way for XML::Simple to preserve the exact XML structure of my original file when I call XMLout? I don't think it is too much to ask to have the same tags on output. Am I missing an option in XMLout?

Replies are listed 'Best First'.
Re: XML::Simple "transforming data"
by merlyn (Sage) on Mar 21, 2005 at 20:42 UTC
    The point of XML::Simple is to have a way to specify a Perl data structure with XML, and to serialize that data structure into some valid XML representation. It's not meant for very specific structure control in either direction, and if you find yourself trying to tweak things to get it to be precisely as you wish, you've probably gone beyond its abilities.

    If you want to manage precise XML transformations, look into XML::LibXML or the higher-level xsh scripting language. I did a column on that a while back.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Ironically, I am using XML::LibXML later on in the same program. I opted to use XML::Simple because XML::LibXML was choking on a very large file. My solution was to break it up into smaller sections so that the majority of the data could be parsed. XML::Simple has no problem reading in all the data I feed it, but the output structure is not what I wanted.

      My ultimate conclusion was to skip a data structure altogether for breaking up the file and just do a straight regex across the whole file. This preserves my structure exactly and allows me to specify which tags to separate across. Since I am working with a set standard (docbook) there should be no surprising or malformed tags (unless I am fed a corrupted file).

      What you said about being beyond XML::Simple's capabilities led me in this direction. Thanks for letting me know that it was time to go down another path.

          sub splitNode2 {
              my ($name, $root, $tag, $filename) = @_;

              # Read the whole file using a lexical handle and three-argument open.
              open(my $fh, '<', $filename)
                  or die "Cannot open file [$filename] for reading: $!\n";
              my $content = join('', <$fh>);
              close($fh);

              # Non-greedy match of each <tag ...>...</tag> block. The \b stops
              # a tag such as <author from also matching <authorgroup.
              my @matches = $content =~ /(<\Q$tag\E\b.*?>.*?<\/\Q$tag\E>)/sg;

              if (@matches < 1) {
                  writeFile($name, "", $content);    # writes the string to a file
                  return 0;
              }

              # Write each matched block to its own numbered file.
              my $count = 1;
              foreach my $match (@matches) {
                  writeFile($name . "_" . $count, "", $match);
                  $count++;
              }
              return scalar @matches;
          }
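
One thing worth checking with a pattern of this shape: if the tag name is not followed by a word boundary, `<author` will also match `<authorgroup>`. Below is a minimal sketch on made-up input; the sample string and the variable names are illustrative and not taken from the original program.

```perl
use strict;
use warnings;

# Hypothetical input: an <author> element nested in an <authorgroup>.
my $content = '<authorgroup><author><surname>Smith</surname></author></authorgroup>';
my $tag     = 'author';

# Without a boundary, the opening part also matches "<authorgroup>".
my @loose = $content =~ /(<$tag.*?>.*?<\/$tag>)/sg;

# With \b (and \Q...\E to quote the tag name), only <author ...> opens a match.
my @strict = $content =~ /(<\Q$tag\E\b[^>]*>.*?<\/\Q$tag\E>)/sg;

print "loose:  $loose[0]\n";
print "strict: $strict[0]\n";
```

The loose match starts at `<authorgroup>` and stops at the first `</author>`, so the fragment it captures is not well-formed on its own. The stricter pattern captures just the `<author>` element.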
Re: XML::Simple "transforming data"
by Thelonious (Scribe) on Mar 21, 2005 at 21:50 UTC
    XML::Simple's interface is pretty odd in that it outputs something different than it takes in by default. (Maybe it could be even a little more simple - and more Perlish...?) But there's an option:

    
    ForceArray => 1 # in - important
        This option should be set to '1' to force nested elements to be
        represented as arrays even when there is only one.
    
    ...that you can use like so:

        use XML::Simple;

        my $xml = XMLin(join('', <DATA>), ForceArray => 1);
        print XMLout($xml);

        __END__
        <meta fpi="1234567890">
          <isbn>1-234-56789-0</isbn>
          <edition>First</edition>
          <authorgroup>
            <author>
              <firstname>John</firstname>
              <surname>Smith</surname>
              <authorblurb url="http://www.someurl.com/etc/nothing.php"/>
            </author>
          </authorgroup>
          <pagenums>384</pagenums>
          <pubdate>October 2001</pubdate>
          <subjectset>
            <subject>some.lame.subject</subject>
            <subject>another.lame.subject</subject>
          </subjectset>
          <publisher>
            <publishername>Publisher Inc.</publishername>
            <imprintname>Publisher Inc.</imprintname>
          </publisher>
        </meta>

    ...I think that it outputs something very similar to what you're looking for...

    hth

      I think it'll do exactly what he wants if you add "RootName=>'meta'" to the call to XMLout().
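
A minimal sketch of the two suggestions combined, assuming XML::Simple and one of its supported parsers are installed. The input is a trimmed-down copy of the sample from the question, and the variable names are only for illustration.

```perl
use strict;
use warnings;
use XML::Simple;

# Trimmed-down version of the sample document from the question.
my $source = <<'XML';
<meta fpi="1234567890">
  <isbn>1-234-56789-0</isbn>
  <authorgroup>
    <author>
      <firstname>John</firstname>
      <surname>Smith</surname>
      <authorblurb url="http://www.someurl.com/etc/nothing.php"/>
    </author>
  </authorgroup>
</meta>
XML

# ForceArray => 1 stores every child element as an array reference,
# so XMLout writes it back as an element rather than an attribute.
my $data = XMLin($source, ForceArray => 1);

# RootName replaces the default <opt> root element with the original name.
my $out = XMLout($data, RootName => 'meta');
print $out;
```

The real attributes (`fpi`, `url`) stay attributes and the elements stay elements. Child elements may still come back in a different order than in the source, since the data passes through Perl hashes.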


      ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
      =~y~b-v~a-z~s; print
You need to force a reference, otherwise XML::Simple folds simple elements into attributes
by inq123 (Sexton) on Mar 21, 2005 at 23:40 UTC
    The solution is actually just posted above: use ForceArray => 1 when you call XMLin(). The mechanism is as my title suggests: as long as you force a reference out of the value, that value will be treated as an element instead of an attribute when XMLout writes it out.
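
To make the mechanism concrete, here is a small sketch (assuming XML::Simple is installed) that feeds XMLout the same value twice, once as a plain scalar and once wrapped in an array reference. The hash contents are made up for the illustration.

```perl
use strict;
use warnings;
use XML::Simple;

# The same value, stored as a plain scalar and as a one-element array ref.
my $as_scalar = { isbn => '1-234-56789-0' };
my $as_array  = { isbn => ['1-234-56789-0'] };

# A scalar hash value is written as an attribute of the enclosing element.
my $attr_xml = XMLout($as_scalar, RootName => 'meta');

# A reference (here an array ref) is written as a nested element.
my $elem_xml = XMLout($as_array, RootName => 'meta');

print $attr_xml;
print $elem_xml;
```

The first call should produce `isbn` as an attribute on `<meta>`, the second an `<isbn>` child element. ForceArray => 1 on XMLin simply makes every element arrive in the second form.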

    Here's the set of options that I found the best (for my usage, and probably for general usage) for XML::Simple:

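        my $obj = XMLin($filename,
            ForceArray   => 1,
            ForceContent => 1,
            KeepRoot     => 1,
            ContentKey   => '_c',
            KeyAttr      => []);

        # Write the result next to the input file, with "-1" appended to the name.
        open(my $out, '>', "$filename-1")
            or die "Cannot open [$filename-1] for writing: $!\n";
        print {$out} XMLout($obj, KeepRoot => 1, ContentKey => '_c', KeyAttr => []);
        close($out);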
    Hope it helps.
Re: XML::Simple "transforming data"
by duct_tape (Hermit) on Mar 21, 2005 at 20:31 UTC

    Look at the NoAttr option for XMLout. I believe that will do what you'd like. Note that it still may not preserve the exact same structure as your original data.
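
A quick sketch of what NoAttr changes, assuming XML::Simple is installed. The data structure is made up for the illustration: `fpi` stands in for a genuine attribute and `isbn` for a genuine element.

```perl
use strict;
use warnings;
use XML::Simple;

# fpi is a scalar (normally an attribute); isbn is an array ref (an element).
my $data = { fpi => '1234567890', isbn => ['1-234-56789-0'] };

# Default behaviour: the scalar becomes an attribute.
my $default = XMLout($data, RootName => 'meta');

# NoAttr => 1: every value is written as a nested element, including fpi.
my $no_attr = XMLout($data, RootName => 'meta', NoAttr => 1);

print $default;
print $no_attr;
```

With NoAttr set, `fpi` comes out as an `<fpi>` element, so attributes that were really attributes in the source are not preserved as such.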

      I have considered this option, but the problem is that it will strip non-valid attributes, which is good, but the valid attributes will be taken out as well, and I would like to keep them in.

      Maybe I wasn't clear in my original description, but what I mean by "preserving the exact structure" includes valid attributes inside tags. This problem would be trivial if I could just flip NoAttr and move on.

      Thanks for the suggestion.
Re: XML::Simple "transforming data"
by Cody Pendant (Prior) on Mar 22, 2005 at 05:21 UTC
    I just wanted to say, this is such a frequently asked question. I know because I've asked it myself. It's not in the PerlMonks Q&A section (in fact there's no XML section there), but it is in the XML::Simple FAQ under "XML::Simple turns nested elements into attributes".


    ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss')
    =~y~b-v~a-z~s; print