naChoZ has asked for the wisdom of the Perl Monks concerning the following question:

I've been trying to write some code to read and write some xml and having no luck. I'm attempting to use XML::XPath, but I'm open to others if anyone has any suggestions. (I tried XML::Simple, but it completely butchers the document in the output.)

Here's some sample xml:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE ADI SYSTEM "ADI.DTD">
<ADI>
  <Metadata>
    <AMS Asset_Class="package" Asset_Name="xxxxxxxxxxxxxxxxxxxxxxxx_package" />
    <App_Data App="MOD" Name="Metadata_Spec_Version" Value="xxxxxxxxxxxxxxx"/>
    <App_Data App="MOD" Name="Metadata_Tier" Value="xxxxxxxxxxxxxxxxxxxxxxx"/>
  </Metadata>
  <Asset>
    <Metadata>
      <AMS Asset_Class="title" Asset_Name="xxxxxxxxxxxxxxxxxxxxxxxx_title" />
      <App_Data App="MOD" Name="Type" Value="title" />
      <App_Data App="MOD" Name="Year" Value="1999" />
    </Metadata>
    <Asset>
      <Metadata>
        <AMS Asset_Class="poster" Asset_Name="xxxxxxxxxxxxxxxxxxxxxxxx_poster" />
        <App_Data App="MOD" Name="Content_CheckSum" Value="3167abc898dba878794754dc4afca0fd"/>
        <App_Data App="MOD" Name="Content_FileSize" Value="230454"/>
        <App_Data App="MOD" Name="Image_Aspect_Ratio" Value="320x240"/>
        <App_Data App="MOD" Name="Type" Value="poster"/>
      </Metadata>
      <Content Value="xxxxxxxxxxxxxxxx.bmp" />
    </Asset>
    <Asset>
      <Metadata>
        <AMS Asset_Class="movie" Asset_Name="xxxxxxxxxxxxxxxxxxxxxxxx_movie" />
        <App_Data App="MOD" Name="Setting1" Value="Y"/>
        <App_Data App="MOD" Name="Languages" Value="en"/>
      </Metadata>
      <Content Value="xxxxxxxxxxxxxxxxxxxxxxxxxx.mpg"/>
    </Asset>
  </Asset>
</ADI>

So, for some example code:

#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;
use Data::Dumper;

my $xpath = XML::XPath->new(filename => 'TEST.XML');
my $nodeset = $xpath->find('/ADI/Asset/Asset/Metadata[AMS[@Asset_Class="poster"]]');
ddump('ref nodeset', __LINE__, ref $nodeset); # this shows 'XML::XPath::NodeSet'

After this, I've tried iterating with $nodeset->get_nodelist and then iterating those nodes with ->getChildNodes, using the ->string_value and ->getName methods, but I can never seem to get at what I want. I've tried a variety of XPath strings, notably .//AMS[@Asset_Class="poster"], which in an XPath tester does get me to where I expect. I just can't seem to turn that into something I can code against.

Any help appreciated. Tearing my hair out for way too long on this. Even tried AI to at least look for pointers. No, it didn't work at all. #facepalm

(ddump() is just my little wrapper around Data::Dumper, it's only slightly more advanced than print Dumper().)

--
Andy

Replies are listed 'Best First'.
Re: XML Parsing and Xpath confusion
by ikegami (Patriarch) on Jul 03, 2024 at 14:09 UTC

    Think of a [] in an XPath as similar to a WHERE clause in an SQL statement, which is to say a filter.

    /ADI/Asset/Asset/Metadata suggests that you want some asset metadata nodes. I'm going to assume that's correct for now.

    But we only want metadata nodes where AMS/@Asset_Class equals poster. This is a filter.

    /ADI/Asset/Asset/Metadata[ AMS/@Asset_Class = "poster" ]

    Since this is a nested collection of assets, maybe you want to look at any depth.

    //Asset/Metadata[ AMS/@Asset_Class = "poster" ]

    With the above, the asset name would be found at relative path AMS/@Asset_Name. App_Data would return rows of metadata.
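
    To connect this back to XML::XPath, here is a minimal sketch of evaluating those relative paths against each matched Metadata node. The inline document is a trimmed stand-in for the sample (the DOCTYPE is left out and the "demo_" names are made up), and the @report array is just a hypothetical place to collect output. Note that attribute names are case-sensitive, so the sample's attribute is Asset_Name.

```perl
use strict;
use warnings;
use XML::XPath;

# Trimmed stand-in for the sample document (hypothetical "demo_" values).
my $xml = q{<ADI>
  <Metadata><AMS Asset_Class="package" Asset_Name="demo_package"/></Metadata>
  <Asset>
    <Metadata><AMS Asset_Class="title" Asset_Name="demo_title"/></Metadata>
    <Asset>
      <Metadata>
        <AMS Asset_Class="poster" Asset_Name="demo_poster"/>
        <App_Data App="MOD" Name="Content_FileSize" Value="230454"/>
        <App_Data App="MOD" Name="Type" Value="poster"/>
      </Metadata>
      <Content Value="demo.bmp"/>
    </Asset>
  </Asset>
</ADI>};

my $xp = XML::XPath->new(xml => $xml);

my @report;
# In list context findnodes() returns the matching nodes themselves.
for my $meta ($xp->findnodes('//Asset/Metadata[ AMS/@Asset_Class = "poster" ]')) {
    # The second argument is the context node for the relative path.
    my $name = $xp->findvalue('AMS/@Asset_Name', $meta);
    push @report, "asset: $name";

    # Each App_Data child is one "row" of metadata.
    for my $row ($xp->findnodes('App_Data', $meta)) {
        my $key   = $xp->findvalue('@Name',  $row);
        my $value = $xp->findvalue('@Value', $row);
        push @report, "  $key = $value";
    }
}

print "$_\n" for @report;
```

    Passing the node as the context argument avoids walking getChildNodes by hand; the filter stays in the XPath and the Perl side only reads values.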


    Is the asset metadata really what you want, though? If you wanted the whole asset instead of just its metadata (e.g. if you wanted to access the asset's content), you should use one of the following instead:

    /ADI/Asset/Asset[ Metadata/AMS/@Asset_Class = "poster" ]
    //Asset[ Metadata/AMS/@Asset_Class = "poster" ]

    With these, the asset name would be found at relative path Metadata/AMS/@Asset_Name. Metadata/App_Data would return rows of metadata. Content/@Value would return the poster image.
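
    The whole-asset variant can be sketched with XML::LibXML, where nodes carry their own findnodes and findvalue methods. This is only an illustration under assumptions: the inline document is a cut-down stand-in for the sample (no DOCTYPE, invented "demo_" values), and %poster is a hypothetical hash for the result.

```perl
use strict;
use warnings;
use XML::LibXML;

# Cut-down stand-in for the sample document (hypothetical "demo_" values).
my $xml = q{<ADI>
  <Asset>
    <Metadata><AMS Asset_Class="title" Asset_Name="demo_title"/></Metadata>
    <Asset>
      <Metadata>
        <AMS Asset_Class="poster" Asset_Name="demo_poster"/>
        <App_Data App="MOD" Name="Image_Aspect_Ratio" Value="320x240"/>
        <App_Data App="MOD" Name="Type" Value="poster"/>
      </Metadata>
      <Content Value="demo.bmp"/>
    </Asset>
    <Asset>
      <Metadata><AMS Asset_Class="movie" Asset_Name="demo_movie"/></Metadata>
      <Content Value="demo.mpg"/>
    </Asset>
  </Asset>
</ADI>};

my $doc = XML::LibXML->load_xml(string => $xml);

my %poster;
for my $asset ($doc->findnodes('//Asset[ Metadata/AMS/@Asset_Class = "poster" ]')) {
    # Relative paths are evaluated with the Asset element as context.
    $poster{name}  = $asset->findvalue('Metadata/AMS/@Asset_Name');
    $poster{image} = $asset->findvalue('Content/@Value');

    # Collect the App_Data rows as Name => Value pairs.
    for my $row ($asset->findnodes('Metadata/App_Data')) {
        $poster{data}{ $row->getAttribute('Name') } = $row->getAttribute('Value');
    }
}

print "$poster{name} -> $poster{image}\n";
print "  $_ = $poster{data}{$_}\n" for sort keys %{ $poster{data} };
```

    Selecting the Asset rather than its Metadata keeps Content reachable from the same node, which is the point of the second pair of expressions above.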


    As a side note, an example of multiple [] in an XPath,

    //Asset[ AMS/@Asset_Class = "package" and AMS/@Asset_Name = "some_package" ]/Asset[ Metadata/AMS/@Asset_Class = "poster" ]

    This would return the poster asset for a specific package.

      Thanks ikegami. I was able to use those tips with XML::LibXML to get what I wanted.

      Now I need to figure out why XML::LibXML changed a completely different part of the xml document. There's a description string elsewhere in the document and it converted an entity (&apos;) to a single quote. I already set expand_entities to 0, so I'm not sure wtf is up with that... ugh... I mean it's technically correct and probably won't break anything, but I'd rather it not touch anything I didn't ask it to touch.

      --
      Andy
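
      A possible explanation, offered as a sketch rather than a diagnosis: &apos; is one of the five entities predefined by XML itself, not one declared in the DTD, and as far as I understand it the parser resolves those regardless of expand_entities. On output the serializer only escapes what the position requires, and an apostrophe in element text does not need escaping. The tiny document below is made up purely to reproduce the round trip.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical one-element document containing the predefined entity.
my $xml = q{<Doc><Description>It&apos;s a demo</Description></Doc>};

# expand_entities => 0 mirrors the setting mentioned in the post.
my $doc = XML::LibXML->load_xml(string => $xml, expand_entities => 0);

# Serialize just the root element to compare against the input.
my $out = $doc->documentElement->toString;

print "in : $xml\n";
print "out: $out\n";
```

      If the byte-for-byte form matters, that probably has to be handled outside the parser, since both spellings are the same text as far as XML is concerned.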

Re: XML Parsing and Xpath confusion
by sectokia (Friar) on Jul 04, 2024 at 02:00 UTC

    I really like XML::Twig. Not sure what you are trying to do exactly but a basic example would be:

    use strict;
    use warnings;
    use XML::Twig;

    my $t = XML::Twig->new(
        twig_handlers => {
            AMS => sub {
                print $_->{att}{Asset_Class}." has name ".$_->{att}{Asset_Name}."\n";
            }
        })->parsefile('test.xml');

    Output:

    package has name xxxxxxxxxxxxxxxxxxxxxxxx_package
    title has name xxxxxxxxxxxxxxxxxxxxxxxx_title
    poster has name xxxxxxxxxxxxxxxxxxxxxxxx_poster
    movie has name xxxxxxxxxxxxxxxxxxxxxxxx_movie

      The original tried to filter for just the posters, and it attempted to get the Metadata element. You get a different element, and you don't limit to just getting the element relating to the poster.

      (XML::Twig is also insanely slower.)
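
      For comparison, a hedged sketch of how the XML::Twig version could be narrowed to the poster's Metadata element: handle Metadata instead of AMS, and skip any element whose AMS child is not a poster. The inline document and the @found array are illustrative assumptions, not part of the original posts.

```perl
use strict;
use warnings;
use XML::Twig;

# Cut-down stand-in for the sample document (hypothetical "demo_" values).
my $xml = q{<ADI>
  <Metadata><AMS Asset_Class="package" Asset_Name="demo_package"/></Metadata>
  <Asset>
    <Metadata><AMS Asset_Class="title" Asset_Name="demo_title"/></Metadata>
    <Asset>
      <Metadata>
        <AMS Asset_Class="poster" Asset_Name="demo_poster"/>
        <App_Data App="MOD" Name="Type" Value="poster"/>
      </Metadata>
      <Content Value="demo.bmp"/>
    </Asset>
  </Asset>
</ADI>};

my @found;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once each Metadata element has been completely parsed,
        # so its AMS and App_Data children are all available.
        Metadata => sub {
            my ($t, $meta) = @_;
            my $ams = $meta->first_child('AMS');
            return unless $ams && ($ams->att('Asset_Class') // '') eq 'poster';

            push @found, $ams->att('Asset_Name');
            push @found, map { $_->att('Name') . '=' . $_->att('Value') }
                         $meta->children('App_Data');
        },
    },
);
$twig->parse($xml);

print "$_\n" for @found;
```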