Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse an XML file (sample input and output below). I have written some sample code, shown below, but I am not sure how to proceed. Please advise.

Input XML file:-

    <build>
      <name>lpass</name>
      <build_id>M8960AAAAANAAL1004</build_id>
      <windows_root_path cmm_root_path_var="PASS_BUILD_ROOT">\\TEST\M9650AAAAANAAL1004\</windows_root_path>
      <linux_root_path cmm_root_path_var="PASS_BUILD_ROOT"/>
      <image_dir>lpass_proc</image_dir>
      <fat_file>
    <build>
      <name>rpm</name>
      <build_id>M8960AAAAANAAR1004</build_id>
      <windows_root_path cmm_root_path_var="PM_BUILD_ROOT">\\TEST\M9650AAAAANAAR1004\</windows_root_path>
      <linux_root_path cmm_root_path_var="PM_BUILD_ROOT"/>
      <image_dir>rpm_proc</image_dir>
      <download_file>
        <file_name>rpm.mbn</file_name>
        <file_path>rpm_proc/</file_path>
      </download_file>
    </build>
    <build>
      <name>wcnss</name>
      <build_id>M8960AAAAANAAW1004</build_id>
      <windows_root_path cmm_root_path_var="NSS_BUILD_ROOT">\\TEST\M9650AAAAANAAW1004\</windows_root_path>
      <linux_root_path cmm_root_path_var="NSS_BUILD_ROOT"/>
      <image_dir>wcnss_proc</image_dir>

OUTPUT should be:-

    PASS_BUILD_ROOT \\TEST\M9650AAAAANAAL1004
    PM_BUILD_ROOT \\TEST\M9650AAAAANAAR1004
    NSS_BUILD_ROOT \\TEST\M9650AAAAANAAW1004

CURRENT CODE:-

    use strict;
    use warnings;
    use Getopt::Long;
    use File::Glob ();
    my %options=();
    GetOptions (\%options,'loc=s');
    #print "\n$options{loc}";
    my $contents_xml = glob(($options{loc} . '\\contents.xml'));
    #print "\n$contents_xml";

Re: Parsing an xml file
by davido (Cardinal) on Jul 08, 2011 at 04:46 UTC

    Assuming you've got well-formed XML, you can use XML::Twig or XML::Simple. If they don't make manipulation simple enough, you can look on CPAN for an XML module that more specifically fits your needs.

    But the XML you posted fails to parse with either of the modules I mentioned, as well as with the W3Schools XML validator, and that makes it sort of tricky to provide a good working example for you.

    If I were to use XML::Simple, I would probably start by using the XMLin() function to get a reference to a data structure. I might use Data::Dumper to see what the structure looks like, and use that dump as the basis for determining where in the structure the data I need is hiding. XML::Simple does provide some basic options that alter the shape of the data structure, so read the docs and maybe you can get the structure to be fairly easy to manipulate further.
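A minimal sketch of that approach, assuming the input has first been repaired into well-formed XML. The `<builds>` wrapper element and the closing tags in the sample are assumptions made for illustration and are not part of the posted data; only two builds are shown to keep it short.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);
use Data::Dumper;

# Assumption: a repaired, well-formed excerpt of the posted data.
# The <builds> root element is hypothetical.
my $xml = <<'XML';
<builds>
  <build>
    <name>lpass</name>
    <windows_root_path cmm_root_path_var="PASS_BUILD_ROOT">\\TEST\M9650AAAAANAAL1004\</windows_root_path>
    <linux_root_path cmm_root_path_var="PASS_BUILD_ROOT"/>
  </build>
  <build>
    <name>rpm</name>
    <windows_root_path cmm_root_path_var="PM_BUILD_ROOT">\\TEST\M9650AAAAANAAR1004\</windows_root_path>
    <linux_root_path cmm_root_path_var="PM_BUILD_ROOT"/>
  </build>
</builds>
XML

# ForceArray keeps <build> an array even when only one is present;
# KeyAttr => [] switches off folding on name/key/id.
my $data = XMLin($xml, ForceArray => ['build'], KeyAttr => []);

# Look at the structure first, then pick out what is needed.
print Dumper($data);

my @pairs;
for my $build (@{ $data->{build} }) {
    my $root = $build->{windows_root_path};
    # With attributes present, the element text is stored under 'content'.
    (my $path = $root->{content}) =~ s/\\\z//;    # drop the trailing backslash
    push @pairs, "$root->{cmm_root_path_var} $path";
}
print "$_\n" for @pairs;
```

The Dumper output is what tells you that the text of an element with attributes ends up under the `content` key; the loop afterwards just reads that structure.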


    Dave

Re: Parsing an xml file
by Khen1950fx (Canon) on Jul 08, 2011 at 07:21 UTC
    What you were trying to do wasn't clear to me, so I did it as I see it. First, your XML isn't valid: the <build> elements are messed up. Second, I usually avoid glob when I work with XML; it is better to use findnodes or something similar. Here's the script that I ran:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::LibXML;

    my @files = '/root/Desktop/your.xml';
    my $parser = XML::LibXML->new();
    foreach my $file (@files) {
        my $doc = $parser->parse_file($file);
        my $cmm_root_path_var = $doc->findnodes("//Test/*");
        print $doc->to_literal(), "\n";
    }
    And I ran your xml through xmltidy. It's still not valid, but works for this test:
    <?xml version="1.0" encoding="utf-8"?>
    <build>
      <name>lpass</name>
      <build_id>M8960AAAAANAAL1004</build_id>
      <windows_root_path cmm_root_path_var="PASS_BUILD_ROOT">\\TEST\M9650AAAAANAAL1004\</windows_root_path>
      <linux_root_path cmm_root_path_var="PASS_BUILD_ROOT" />
      <image_dir>lpass_proc</image_dir>
      <name>rpm</name>
      <build_id>M8960AAAAANAAR1004</build_id>
      <windows_root_path cmm_root_path_var="PM_BUILD_ROOT">\\TEST\M9650AAAAANAAR1004\</windows_root_path>
      <linux_root_path cmm_root_path_var="PM_BUILD_ROOT" />
      <image_dir>rpm_proc</image_dir>
      <download_file>
        <file_name>rpm.mbn</file_name>
        <file_path>rpm_proc/</file_path>
      </download_file>
      <name>wcnss</name>
      <build_id>M8960AAAAANAAW1004</build_id>
      <windows_root_path cmm_root_path_var="NSS_BUILD_ROOT">\\TEST\M9650AAAAANAAW1004\</windows_root_path>
      <linux_root_path cmm_root_path_var="NSS_BUILD_ROOT" />
      <image_dir>wcnss_proc</image_dir>
    </build>
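For the output the original question asks for, findnodes can select the windows_root_path elements directly. The following is only a sketch under assumptions: it parses a shortened copy of the tidied XML from an inline string rather than a file, and it strips the trailing backslash to match the requested output.

```perl
use strict;
use warnings;
use XML::LibXML;

# Assumption: a shortened copy of the tidied XML, held in a string
# so the example is self-contained.
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<build>
  <name>lpass</name>
  <windows_root_path cmm_root_path_var="PASS_BUILD_ROOT">\\TEST\M9650AAAAANAAL1004\</windows_root_path>
  <linux_root_path cmm_root_path_var="PASS_BUILD_ROOT" />
  <name>rpm</name>
  <windows_root_path cmm_root_path_var="PM_BUILD_ROOT">\\TEST\M9650AAAAANAAR1004\</windows_root_path>
  <linux_root_path cmm_root_path_var="PM_BUILD_ROOT" />
</build>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

my @pairs;
# Select every windows_root_path element that carries the attribute.
for my $node ($doc->findnodes('//windows_root_path[@cmm_root_path_var]')) {
    my $var  = $node->getAttribute('cmm_root_path_var');
    my $path = $node->textContent;
    $path =~ s/\\\z//;    # drop the trailing backslash
    push @pairs, "$var $path";
}
print "$_\n" for @pairs;
```

To read from a file instead, `load_xml(location => $file)` takes the place of the `string` argument.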
Re: Parsing an xml file
by i5513 (Pilgrim) on Jul 08, 2011 at 05:18 UTC

    Read about XPath and search for it on CPAN; for example, you can use XML-XPath.

    You might also like to learn xmlstarlet, which is a command-line XML editing tool.
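As a rough sketch of what the XML::XPath route could look like, assuming well-formed input (the one-element sample string below is made up for illustration from the posted data):

```perl
use strict;
use warnings;
use XML::XPath;

# Assumption: a tiny well-formed sample built from the posted data.
my $xml = <<'XML';
<build>
  <windows_root_path cmm_root_path_var="NSS_BUILD_ROOT">\\TEST\M9650AAAAANAAW1004\</windows_root_path>
  <linux_root_path cmm_root_path_var="NSS_BUILD_ROOT"/>
</build>
XML

my $xp = XML::XPath->new(xml => $xml);

my @pairs;
# The XPath expression picks the elements; the loop reads attribute and text.
for my $node ($xp->findnodes('//windows_root_path')) {
    my $var  = $node->getAttribute('cmm_root_path_var');
    my $path = $node->string_value;
    $path =~ s/\\\z//;    # drop the trailing backslash
    push @pairs, "$var $path";
}
print "$_\n" for @pairs;
```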

Re: Parsing an xml file
by NetWallah (Canon) on Jul 08, 2011 at 05:46 UTC
    The code below does the job with a VERY CRUDE parser - this may be enough to get you started, but I certainly would not recommend it if the "XML" you expect to parse varies in syntax, even slightly.
    use strict;
    use warnings;
    use Getopt::Long;
    use File::Glob ();

    my %options = ();
    GetOptions(\%options, 'loc=s');
    my $contents_xml = glob(($options{loc} . '\\contents.xml'));
    open my $f, "<", $contents_xml or die "Cannot open $contents_xml:$!";
    while (<$f>) {
        m/=\s*"([^"]+)"[^\/]*?>([^<]+)/ and print "$1 $2\n";
    }
    close $f;
    Again - this code makes many assumptions on the syntax AND formatting of the incoming "XML".
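One way to narrow those assumptions a little, still without a real parser, is to match the windows_root_path element by name and loop with /g so that several elements on one line are all found. This is a sketch only: extract_roots is a hypothetical helper name, the sample text is an excerpt of the posted data, and the pattern still assumes the attribute is written exactly this way.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect [variable, path]
# pairs from raw text, matching only <windows_root_path> elements.
sub extract_roots {
    my ($text) = @_;
    my @pairs;
    while ($text =~ m{<windows_root_path\s+cmm_root_path_var\s*=\s*"([^"]+)"\s*>([^<]*)</windows_root_path>}g) {
        my ($var, $path) = ($1, $2);
        $path =~ s/\\\z//;    # drop the trailing backslash
        push @pairs, [$var, $path];
    }
    return @pairs;
}

# Excerpt of the posted (not well-formed) input.
my $sample = <<'XML';
<build> <name>lpass</name>
<windows_root_path cmm_root_path_var="PASS_BUILD_ROOT">\\TEST\M9650AAAAANAAL1004\</windows_root_path>
<linux_root_path cmm_root_path_var="PASS_BUILD_ROOT"/>
<fat_file>
<build> <name>rpm</name>
<windows_root_path cmm_root_path_var="PM_BUILD_ROOT">\\TEST\M9650AAAAANAAR1004\</windows_root_path>
XML

print "$_->[0] $_->[1]\n" for extract_roots($sample);
```

Because it never builds a document tree, this tolerates the unclosed <fat_file> and <build> tags, which is the only reason to prefer it over a proper XML module.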

                "XML is like violence: if it doesn't solve your problem, use more."