Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to assign a few values from an XML input to variables. I get a large amount of information from an XML feed and need only a little of it. The XML below is assigned to the variable $xml. How do I pull out three pieces of data, namely LastTradePrice, NetChange, and LastTradeDate? All I need is the values between these tags assigned to variables. Thanks in advance; the XML is pasted below.
<?xml version="1.0"?> <!DOCTYPE quotelist SYSTEM> <!-- Writing XML object quotelist --> <?version 1.0?> <?date Mon Nov 18 13:21:30 CST 2002?> <quotelist> <quotes type='java.util.ArrayList'> <size type='int'>6</size> <element type='com.stock.ptl.info.content.quotes.IndexQuote'> <Description type='java.lang.String'>NAS/NMS COMPSITE</Description +> <Symbol type='java.lang.String'>^IXIC.NaE</Symbol> <NetChange type='float'>9.06</NetChange> <Currency type='java.lang.String'>US dollar</Currency> <SecurityType type='java.lang.String'>INDX</SecurityType> <PercentChange type='float'>0.64</PercentChange> <LastTradePrice type='float'>1420.2</LastTradePrice> <LastTradeDate type='java.util.Date'>2002-11-18 07:01:00 CST</Last +TradeDate> <LastTradeTime type='java.lang.String'>19:01</LastTradeTime> <PreviousClose type='float'>1411.14</PreviousClose> <MarketSystem type='java.lang.String'>NASDAQ</MarketSystem> <Volume type='int'>0</Volume> <BackgroundSymbol type='java.lang.String'></BackgroundSymbol> <Week52High type='float'>0.0</Week52High> <Week52Low type='float'>0.0</Week52Low> <DayHigh type='float'>0.0</DayHigh> <DayLow type='float'>0.0</DayLow> <PreviousCloseDate type='null'></PreviousCloseDate> <OpenPrice type='float'>0.0</OpenPrice> <PrevYearHigh type='float'>0.0</PrevYearHigh> <PrevYearLow type='float'>0.0</PrevYearLow> <LifeHigh type='float'>0.0</LifeHigh> <LifeLow type='float'>0.0</LifeLow> <PriorYearClose type='float'>0.0</PriorYearClose> <LifeLowDate type='null'></LifeLowDate> <LifeHighDate type='null'></LifeHighDate> </element> <element type='com.stock.ptl.info.content.quotes.IndexQuote'> <Description type='java.lang.String'>DJ INDUSTRIAL</Description> <Symbol type='java.lang.String'>^DJI3.NaE</Symbol> <NetChange type='float'>0.0</NetChange> <Currency type='java.lang.String'>US dollar</Currency> <SecurityType type='java.lang.String'>INDX</SecurityType> <PercentChange type='float'>0.0</PercentChange> <LastTradePrice type='float'>7850.29</LastTradePrice> <LastTradeDate 
type='java.util.Date'>2002-10-11 10:07:00 CDT</Last +TradeDate> <LastTradeTime type='java.lang.String'>22:07</LastTradeTime> <PreviousClose type='float'>7850.29</PreviousClose> <MarketSystem type='java.lang.String'>Reuters</MarketSystem> <Volume type='int'>398600</Volume> <BackgroundSymbol type='java.lang.String'></BackgroundSymbol> <Week52High type='float'>0.0</Week52High> <Week52Low type='float'>0.0</Week52Low> <DayHigh type='float'>0.0</DayHigh> <DayLow type='float'>0.0</DayLow> <PreviousCloseDate type='null'></PreviousCloseDate> <OpenPrice type='float'>0.0</OpenPrice> <PrevYearHigh type='float'>0.0</PrevYearHigh> <PrevYearLow type='float'>0.0</PrevYearLow> <LifeHigh type='float'>0.0</LifeHigh> <LifeLow type='float'>0.0</LifeLow> <PriorYearClose type='float'>0.0</PriorYearClose> <LifeLowDate type='null'></LifeLowDate> <LifeHighDate type='null'></LifeHighDate> </element> <element type='com.stock.ptl.info.content.quotes.IndexQuote'> <Description type='java.lang.String'>S&amp;P 500 INDEX</Descriptio +n> <Symbol type='java.lang.String'>^SPX.NaE</Symbol> <NetChange type='float'>0.69</NetChange> <Currency type='java.lang.String'>US dollar</Currency> <SecurityType type='java.lang.String'>INDX</SecurityType> <PercentChange type='float'>0.08</PercentChange> <LastTradePrice type='float'>910.52</LastTradePrice> <LastTradeDate type='java.util.Date'>2002-11-18 07:01:00 CST</Last +TradeDate> <LastTradeTime type='java.lang.String'>19:01</LastTradeTime> <PreviousClose type='float'>909.83</PreviousClose> <MarketSystem type='java.lang.String'>Chicago Board Options Exchan +ge</MarketSystem> <Volume type='int'>0</Volume> <BackgroundSymbol type='java.lang.String'></BackgroundSymbol> <Week52High type='float'>0.0</Week52High> <Week52Low type='float'>0.0</Week52Low> <DayHigh type='float'>0.0</DayHigh> <DayLow type='float'>0.0</DayLow> <PreviousCloseDate type='null'></PreviousCloseDate> <OpenPrice type='float'>0.0</OpenPrice> <PrevYearHigh type='float'>0.0</PrevYearHigh> <PrevYearLow 
type='float'>0.0</PrevYearLow> <LifeHigh type='float'>0.0</LifeHigh> <LifeLow type='float'>0.0</LifeLow> <PriorYearClose type='float'>0.0</PriorYearClose> <LifeLowDate type='null'></LifeLowDate> <LifeHighDate type='null'></LifeHighDate> </element> <element type='com.stock.ptl.info.content.quotes.IndexQuote'> <Description type='java.lang.String'>RUSSELL 2000 IND</Description +> <Symbol type='java.lang.String'>^RUT.NaE</Symbol> <NetChange type='float'>1.48</NetChange> <Currency type='java.lang.String'>US dollar</Currency> <SecurityType type='java.lang.String'>INDX</SecurityType> <PercentChange type='float'>0.38</PercentChange> <LastTradePrice type='float'>387.4</LastTradePrice> <LastTradeDate type='java.util.Date'>2002-11-18 07:01:00 CST</Last +TradeDate> <LastTradeTime type='java.lang.String'>19:01</LastTradeTime> <PreviousClose type='float'>385.92</PreviousClose> <MarketSystem type='java.lang.String'>Chicago Board Options Exchan +ge</MarketSystem> <Volume type='int'>0</Volume> <BackgroundSymbol type='java.lang.String'></BackgroundSymbol> <Week52High type='float'>0.0</Week52High> <Week52Low type='float'>0.0</Week52Low> <DayHigh type='float'>0.0</DayHigh> <DayLow type='float'>0.0</DayLow> <PreviousCloseDate type='null'></PreviousCloseDate> <OpenPrice type='float'>0.0</OpenPrice> <PrevYearHigh type='float'>0.0</PrevYearHigh> <PrevYearLow type='float'>0.0</PrevYearLow> <LifeHigh type='float'>0.0</LifeHigh> <LifeLow type='float'>0.0</LifeLow> <PriorYearClose type='float'>0.0</PriorYearClose> <LifeLowDate type='null'></LifeLowDate> <LifeHighDate type='null'></LifeHighDate> </element> <element type='com.stock.ptl.info.content.quotes.IndexQuote'> <Description type='java.lang.String'>CI-EU/AUS/F EAST</Description +> <Symbol type='java.lang.String'>^CEAF.NaE</Symbol> <NetChange type='float'>1.24</NetChange> <Currency type='java.lang.String'>Not allocated</Currency> <SecurityType type='java.lang.String'>INDX</SecurityType> <PercentChange type='float'>0.2</PercentChange> 
<LastTradePrice type='float'>620.239</LastTradePrice> <LastTradeDate type='java.util.Date'>2002-11-18 06:24:00 CST</Last +TradeDate> <LastTradeTime type='java.lang.String'>18:24</LastTradeTime> <PreviousClose type='float'>618.999</PreviousClose> <MarketSystem type='java.lang.String'>World Indices</MarketSystem> <Volume type='int'>0</Volume> <BackgroundSymbol type='java.lang.String'></BackgroundSymbol> <Week52High type='float'>0.0</Week52High> <Week52Low type='float'>0.0</Week52Low> <DayHigh type='float'>0.0</DayHigh> <DayLow type='float'>0.0</DayLow> <PreviousCloseDate type='null'></PreviousCloseDate> <OpenPrice type='float'>0.0</OpenPrice> <PrevYearHigh type='float'>0.0</PrevYearHigh> <PrevYearLow type='float'>0.0</PrevYearLow> <LifeHigh type='float'>0.0</LifeHigh> <LifeLow type='float'>0.0</LifeLow> <PriorYearClose type='float'>0.0</PriorYearClose> <LifeLowDate type='null'></LifeLowDate> <LifeHighDate type='null'></LifeHighDate> </element> </quotes> <readCount type='int'>6</readCount> <passedCount type='int'>0</passedCount> <paramCount type='int'>6</paramCount> </quotelist>

Replies are listed 'Best First'.
Re: Parsing XML???
by gjb (Vicar) on Nov 18, 2002 at 21:24 UTC

    First of all, you'll want to change your XML file since it is not valid: the second line lacks a DTD file name. The fourth and fifth lines are not valid either. See the XML 1.0 spec for details. For simplicity's sake I changed the document declaration to:

    <?xml version="1.0"?>
    leaving out the other lines.

    The following code will parse the file and print the data you wish to extract.

    #!perl
    use strict;
    use warnings;
    use XML::Parser;

    my $parser = new XML::Parser(Handlers => {'Start' => \&startHandler,
                                              'Char'  => \&charHandler,
                                              'End'   => \&endHandler});
    my $mode = undef;
    $parser->parsefile('data.xml');

    sub startHandler {
        my $self = shift();
        my $element = shift();
        my %attributes = @_;
        if ($element eq 'LastTradePrice') {
            $mode = 'LastTradePrice';
        } elsif ($element eq 'NetChange') {
            $mode = 'NetChange';
        } elsif ($element eq 'LastTradeDate') {
            $mode = 'LastTradeDate';
        }
    }

    sub charHandler {
        my $self = shift();
        my $str = shift();
        if (defined $mode) {
            print "$mode: $str\n";
        }
    }

    sub endHandler {
        my $self = shift();
        $mode = undef;
    }
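Since the question asks for the values in variables rather than printed, here is a minimal sketch of the same handler approach that collects one hash per <element>. The inline sample document, the %wanted set, and the @quotes array are assumptions made for illustration; the real feed would be passed to parse() instead. The Char handler appends rather than assigns because expat may hand over a single text node in several pieces.

```perl
#!perl
use strict;
use warnings;
use XML::Parser;

# Small stand-in for the feed (an assumption for this sketch).
my $xml = <<'XML';
<?xml version="1.0"?>
<quotelist>
  <quotes>
    <element>
      <Symbol type='java.lang.String'>^SPX.NaE</Symbol>
      <NetChange type='float'>0.69</NetChange>
      <LastTradePrice type='float'>910.52</LastTradePrice>
      <LastTradeDate type='java.util.Date'>2002-11-18 07:01:00 CST</LastTradeDate>
    </element>
  </quotes>
</quotelist>
XML

my %wanted = map { $_ => 1 } qw(LastTradePrice NetChange LastTradeDate);
my @quotes;     # one hash reference per <element>
my $current;    # name of the wanted element being read, if any

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element) = @_;
            push @quotes, {} if $element eq 'element';
            if ($wanted{$element} && @quotes) {
                $current = $element;
                $quotes[-1]{$current} = '';
            }
        },
        # Text can arrive in several chunks, so append instead of assigning.
        Char => sub {
            my ($expat, $text) = @_;
            $quotes[-1]{$current} .= $text if defined $current;
        },
        End => sub { undef $current },
    },
);
$parser->parse($xml);

# Each quote is now an ordinary hash of the three fields.
for my $q (@quotes) {
    print join(', ', map { "$_: $q->{$_}" } sort keys %$q), "\n";
}
```

After parsing, $quotes[0]{NetChange} and friends can be copied into whatever variables the rest of the script expects.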

    Hope this helps, -gjb-

Re: Parsing XML???
by artist (Parson) on Nov 18, 2002 at 20:11 UTC
    Hi

    Depending on your requirements, you can write a simple Perl program or use XML modules. To begin with, have a look at XML::Simple.

    Artist

      Hello,
      I probably should have stated this up front, but the server this has to run on doesn't have XML::Simple, so I have to use XML::Parser. I'd love to just install XML::Simple, but I don't have access to the Perl library directory to add this module; I can only use what they have. I've been looking at the perldoc for XML::Parser all day and trying things, but I can't get it to work. This should be simple, but I can't figure out why it isn't working.

      Thanks
        Well, rather than reinventing the wheel, since XML::Simple is written totally in perl, you can just include it by adjusting your @INC to include it, rather than installing it into the normal search path.
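A minimal sketch of the @INC adjustment, assuming a copy of XML/Simple.pm has been placed under a directory you can write to. The path /home/you/perl5lib is hypothetical; substitute your own. The eval around require is only there so the script can report a missing module instead of dying.

```perl
#!perl
use strict;
use warnings;

# Hypothetical location: a writable directory that contains XML/Simple.pm.
use lib '/home/you/perl5lib';

# Try to load the module at run time so a missing copy is not fatal.
my $have_simple = eval { require XML::Simple; 1 };

print $have_simple
    ? "XML::Simple loaded from $INC{'XML/Simple.pm'}\n"
    : "XML::Simple not found; fall back to XML::Parser\n";
```

`use lib` puts the directory at the front of @INC, so a private copy there is found before any system-wide one.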

        Failing that, you should be able to parse similarly to XML::Simple using XML::Parser's neat Tree style.
        use XML::Parser;
        $p1 = new XML::Parser(Style=> 'Tree');
        my $xml = $p1->parsefile('foo.xml');
        use Data::Dumper;
        print Dumper($xml);

        Granted, though, I haven't checked this code yet.. ;-)
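To go from the dumped tree to actual values: the Tree style returns a two-item array [tag, content], where content is an array holding an attribute hash first, followed by (tag, content) pairs for child elements and (0, "text") pairs for character data. The walker below is a sketch under that layout; the collect() helper, the %wanted set, and the inline sample document are assumptions for illustration.

```perl
#!perl
use strict;
use warnings;
use XML::Parser;

# Small stand-in for the feed (an assumption for this sketch).
my $xml = <<'XML';
<quotelist>
  <quotes>
    <element>
      <Symbol type='java.lang.String'>^IXIC.NaE</Symbol>
      <NetChange type='float'>9.06</NetChange>
      <LastTradePrice type='float'>1420.2</LastTradePrice>
      <LastTradeDate type='java.util.Date'>2002-11-18 07:01:00 CST</LastTradeDate>
    </element>
  </quotes>
</quotelist>
XML

my %wanted = map { $_ => 1 } qw(LastTradePrice NetChange LastTradeDate);

# Recursively walk one [tag, content] pair of the Tree structure.
sub collect {
    my ($tag, $content, $quotes) = @_;
    my ($attrs, @kids) = @$content;          # attribute hash comes first
    push @$quotes, {} if $tag eq 'element';  # start a new record
    my $text = '';
    while (my ($kid_tag, $kid) = splice @kids, 0, 2) {
        if ($kid_tag eq '0') { $text .= $kid }                      # character data
        else                 { collect($kid_tag, $kid, $quotes) }   # child element
    }
    $quotes->[-1]{$tag} = $text if $wanted{$tag} && @$quotes;
}

my $tree = XML::Parser->new(Style => 'Tree')->parse($xml);
my @quotes;
collect(@$tree, \@quotes);

for my $q (@quotes) {
    print "$q->{LastTradePrice} $q->{NetChange} $q->{LastTradeDate}\n";
}
```

Like XML::Simple, this reads the whole document into memory first, which is fine for a feed of this size.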
Re: Parsing XML???
by mirod (Canon) on Nov 18, 2002 at 23:19 UTC

    Of course I had to come up with an XML::Twig version!

    Oh, and yes, you can use it even if it is not installed by the admin of the machine, see Using Modules in a local directory, or have a look at the FAQ under "module" (perldoc -q module):

    #!/usr/bin/perl -w
    use strict;
    use XML::Twig;

    my $t= XML::Twig->new(
        twig_roots => {
            element => sub {
                my %info;
                $info{last_trade_price} = $_->field( 'LastTradePrice');
                $info{net_change}       = $_->field( 'NetChange' );
                $info{last_trade_date}  = $_->field( 'LastTradeDate' );
                print join( ' - ', %info), "\n"; # or do something useful
            },
        }
    )->parse( $xml);
Re: Parsing XML???
by pg (Canon) on Nov 18, 2002 at 20:35 UTC
    Besides the modules already mentioned by fellow monks, you can also find other modules for SAX and DOM.

    Broadly, there are two types of modules. SAX-like modules let you handle XML content on the fly but do not keep history in memory (you can do that yourself, but the module does not do it for you). DOM-like modules read the whole XML document and store it in a data structure for you to work with. Choose whichever type fits your needs.
Re: Parsing XML???
by dingus (Friar) on Nov 19, 2002 at 09:37 UTC
    The quick and dirty regex way.

    DO NOT DO THIS WITHOUT A VERY GOOD REASON
    XML::Simple and/or XML::Twig are much much better because they make no assumptions about the order of the elements nor about the possibility that the element may change (e.g. the type='' part could at some point be a different type).

    for (split(m!</element>!, $xml)) {
        next unless m!<NetChange type='float'>([^<]*)<.*<LastTradePrice type='float'>([^<]*)<.*<LastTradeDate type='java.util.Date'>([^<]*)<!s;
        print "Net: $1, Price $2, Date $3\n";
    }
    __output__
    Net: 9.06, Price 1420.2, Date 2002-11-18 07:01:00 CST
    Net: 0.0, Price 7850.29, Date 2002-10-11 10:07:00 CDT
    Net: 0.69, Price 910.52, Date 2002-11-18 07:01:00 CST
    Net: 1.48, Price 387.4, Date 2002-11-18 07:01:00 CST
    Net: 1.24, Price 620.239, Date 2002-11-18 06:24:00 CST
    Astute perl programmers will realise that it would be possible to deal with changes to the tag names and to their order, which are the limitations I mentioned above. But when perfectly good modules such as XML::Simple and XML::Twig exist then you should use them instead because the complex regex will be far less easy to maintain.
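For the curious, one way such a regex could tolerate a different element order and changed attributes is to match each field separately inside every chunk. This is only a sketch, still a regex hack, and it does not decode entities such as &amp;. The sample data, the @fields list, and the @quotes array are assumptions for illustration.

```perl
#!perl
use strict;
use warnings;

# Two sample chunks with different field order and attributes (assumed data).
my $xml = <<'XML';
<element type='x'>
<LastTradePrice type='float'>1420.2</LastTradePrice>
<NetChange type='float'>9.06</NetChange>
<LastTradeDate type='java.util.Date'>2002-11-18 07:01:00 CST</LastTradeDate>
</element>
<element type='x'>
<NetChange type='double'>0.0</NetChange>
<LastTradePrice>7850.29</LastTradePrice>
<LastTradeDate type='java.util.Date'>2002-10-11 10:07:00 CDT</LastTradeDate>
</element>
XML

my @fields = qw(NetChange LastTradePrice LastTradeDate);
my @quotes;

for my $chunk (split m!</element>!, $xml) {
    my %quote;
    for my $field (@fields) {
        # \b keeps the name from matching a longer tag; [^>]* skips any attributes.
        $quote{$field} = $1 if $chunk =~ m!<$field\b[^>]*>([^<]*)</$field>!;
    }
    next unless keys(%quote) == @fields;   # skip chunks missing a field
    push @quotes, \%quote;
    print "Net: $quote{NetChange}, Price $quote{LastTradePrice}, Date $quote{LastTradeDate}\n";
}
```

It is already longer and more fragile than the module-based versions, which is rather the point made above.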

    Dingus


    Enter any 47-digit prime number to continue.
Re: Parsing XML???
by nandeya (Monk) on Nov 19, 2002 at 19:48 UTC
    Here is another quick way to skin the cat using mirod's XML::Twig (I threw Symbol in there too).
    #!perl -w
    use strict;
    use XML::Twig;

    my $xmlfile = "D:/JUNK/quotelist.xml";
    my ($symbol, $lasttradeprice, $netchange, $lasttradedate);

    my $twig= new XML::Twig(
        TwigHandlers => {
            'Symbol'         => \&get_symbol,
            'LastTradePrice' => \&get_lasttradeprice,
            'NetChange'      => \&get_netchange,
            'LastTradeDate'  => \&get_lasttradedate,
        });
    $twig->parsefile($xmlfile);

    sub get_symbol {
        my( $twig, $v)= @_;
        $symbol = $v->text;
    }

    sub get_lasttradeprice {
        my( $twig, $v)= @_;
        $lasttradeprice = $v->text;
    }

    sub get_netchange {
        my( $twig, $v)= @_;
        $netchange = $v->text;
    }

    # LastTradeDate is the last of the four fields in each element,
    # so print the collected values once it has been seen.
    sub get_lasttradedate {
        my( $twig, $v)= @_;
        $lasttradedate = $v->text;
        print_em();
    }

    sub print_em {
        print "$symbol $lasttradeprice $netchange $lasttradedate\n";
    }

    nandeya