raju_400 has asked for the wisdom of the Perl Monks concerning the following question:

I have to parse an XML string which is basically the output of a tool cmd. I am firing the cmd as follows:
use XML::Simple;
..
my @outputInfo = qx[$cmd];
my $ref = XMLin(@outputInfo);

The execution shows: Options must be name=>value pairs (odd number supplied)

The output starts with a string such as "Starting webservices call...", followed by <> XML tags on the next line. Is that the reason XMLin could not treat this as a valid XML doc? Or should we only use scalar vars in XMLin (I have used an array)?

Replies are listed 'Best First'.
Re: XML parsing
by moritz (Cardinal) on Sep 17, 2008 at 07:45 UTC
    If you use an array, everything but the first item will be interpreted as options. So don't do that, use a string instead.

    And you have to remove everything that's not XML first.
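
To illustrate why the array version trips the "odd number" check, here is a minimal sketch. The sub `describe_args` and the sample lines are made up for illustration; it only mimics the "first argument is the source, the rest are option pairs" convention and is not XML::Simple's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for XMLin's argument handling: the first argument
# is the XML source, everything after it is treated as name => value options.
sub describe_args {
    my ($source, @options) = @_;
    return @options % 2
        ? "odd number of options (" . scalar(@options) . ")"
        : "ok";
}

# Made-up sample of what qx[] returns in list context: one element per line.
my @lines = (
    "Starting webservices call...\n",
    qq{<?xml version="1.0"?>\n},
    "<doc/>\n",
    "\n",
);

# The array flattens into four arguments: one source plus three "options".
print describe_args(@lines), "\n";            # odd number of options (3)

# A single string is one argument: one source, no options.
print describe_args(join('', @lines)), "\n";  # ok
```

With a different number of output lines the leftover count could be even, in which case the failure would presumably surface differently (the lines would be read as option names), so the error message alone is not a reliable diagnostic.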

Re: XML parsing
by pjotrik (Friar) on Sep 17, 2008 at 08:02 UTC
    As you can read in the XML::Simple documentation, XMLin expects its first argument to be the XML (as a string, filename, or filehandle); the remaining arguments should be option name => value pairs.

    Read the output of your command into a single string instead of an array and feed this string to XMLin.

    Update: Missed the last sentence... The input must of course be valid XML, so remove any leading non-XML header.

Re: XML parsing
by mscharrer (Hermit) on Sep 17, 2008 at 09:34 UTC
    If the output is XML only, try the stringification operator "" or join:
    my $ref = XMLin("@outputInfo");

      This would add spaces into the data. If you do not want the data split into lines, do not split it. It's much better to assign all the data into a single scalar right away than to split it and then merge it back.

      my $data = qx[$cmd];
        Yes, you are of course right. I was only looking at the XMLin line, not at the qx one. However, the added spaces shouldn't do any harm in normal XML files.

        I had the "@array" version in mind because I used XML::Simple just this week and had to read the input file manually, line by line, to fix a known broken tag before calling XMLin.
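
For anyone wondering where the extra spaces come from: interpolating an array into a string joins its elements with `$"`, which is a space by default. A small sketch with made-up sample lines (not the real command output) shows the difference from `join ''`:

```perl
use strict;
use warnings;

# Made-up sample lines, each still carrying its newline, as qx[] in list
# context would return them.
my @lines = ("<a>\n", "<b/>\n", "</a>\n");

# Interpolation joins the elements with $" (a single space by default),
# so every line after the first gains a leading space.
my $interpolated = "@lines";

# join with an empty separator reproduces the original text exactly.
my $joined = join '', @lines;

print length($interpolated) - length($joined), "\n";   # 2
```

Assigning `qx[$cmd]` to a scalar in the first place avoids the split-and-rejoin step altogether.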

Re: XML parsing
by wol (Hermit) on Sep 18, 2008 at 15:49 UTC
    So, to pull together the responses so far:
    use XML::Simple;
    ..
    my @outputInfo = qx[$cmd];
    # Discard that pesky first line, without needing to know
    # what the line ending chars are
    shift @outputInfo;
    # Join array to make a single string, with line endings
    # being turned into line endings (but not necessarily
    # exactly the same line endings as the command may have
    # used, depending on your OS, and/or the internals of the
    # command)
    my $ref = XMLin(join("\n", @outputInfo));
      Thanks to all of you. I really like the solution of reading the cmd output into a scalar variable, and also the one of reading it into an array and converting it with join.

      I carefully analysed the output of the cmd; it contains the following:

      first line: 'Starting webservices call'

      second line: blank

      third line: <?xml version="1.0"?>

      followed by the entire xml doc.

      I wonder how reliable it would be to discard the first two lines, in case the output changes in future. Can I use .. or ... to extract everything between "<?xml version" and the last tag? I need some expert advice regarding reliability.

      Many thanks

      Sudip
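
For reference, the range (flip-flop) operator asked about here could be applied to the line array roughly as follows. This is only a sketch: the sample output, the trailing "Done." line, and the root tag name `result` are all assumptions, and the approach requires knowing the closing root tag in advance.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the lines returned by qx[$cmd].
my @outputInfo = (
    "Starting webservices call...\n",
    "\n",
    qq{<?xml version="1.0"?>\n},
    "<result>\n",
    "  <status>ok</status>\n",
    "</result>\n",
    "Done.\n",
);

# In scalar context, .. is a flip-flop: it turns true on the line matching
# the XML declaration and stays true through the line matching the closing
# root tag (the tag name "result" is an assumption for this sample).
my @xml_lines = grep { /^<\?xml/ .. m{^</result>} } @outputInfo;

my $xml = join '', @xml_lines;
print $xml;
```

The flip-flop copes with any number of leading lines, but it works line by line, so it would miss a declaration that does not start at the beginning of a line.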

        use XML::Simple;
        ..
        my $outputInfo = qx[$cmd];
        # Remove everything before the XML declaration
        $outputInfo =~ s/^.*?(?=<\?xml)//s;
        my $ref = XMLin($outputInfo);
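
To see what that substitution does without running the real command, here is a small self-contained sketch. The helper name `strip_preamble` and the sample text are made up for illustration; `\A` is used in place of `^`, which is equivalent here because the `/m` flag is not set.

```perl
use strict;
use warnings;

# Hypothetical helper: drop everything before the first "<?xml".
# The /s flag lets "." match newlines, so a multi-line preamble is removed
# in one go; the lookahead keeps the declaration itself in the result.
sub strip_preamble {
    my ($text) = @_;
    $text =~ s/\A.*?(?=<\?xml)//s;
    return $text;
}

# Made-up sample of the command output held in a single scalar.
my $sample = qq{Starting webservices call...\n\n<?xml version="1.0"?>\n<doc/>\n};

print strip_preamble($sample);

# If no declaration is present the substitution does not match and the
# text comes back unchanged, so the parser would still see the preamble.
print strip_preamble("no declaration here\n");

# In the real script the cleaned string would then go to the parser:
# my $ref = XMLin(strip_preamble($outputInfo));
```

This does not depend on how many lines precede the declaration, which addresses the reliability worry, but it does assume the tool always emits an `<?xml ...?>` declaration.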