First of all, I'll let you in on a dirty little secret. 99.99% of the time, you can quite easily parse XML files with regular expressions. This is because 99.99% of the time you deal with only one external party sending you XML files, and they don't write it by hand; they wrote a program to generate it.

And the thing is, they don't modify the program once it's in production, or they do so only rarely, or not deeply enough for it to matter to you. This means that once you have figured out what the file looks like by empirical observation, you can write a few short patterns to pull out what you need.
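
To make that concrete, here's a minimal sketch, assuming the producer always writes <name> immediately before <age>, with no attributes, comments, CDATA sections or entities inside them. The sample data and the helper name name_age_pairs are made up for illustration. If the producer's output ever changes shape, a pattern like this silently misses records, which is exactly the risk you accept with this approach.

```perl
use strict;
use warnings;

# Hypothetical sample of machine-generated output. ASSUMPTION: <name> always
# comes right before <age>, with no attributes, CDATA, comments or entities.
my $xml = <<'XML';
<people>
  <person><name>Alice</name><age>56</age></person>
  <person><name>Bill</name><age>28</age></person>
</people>
XML

# name_age_pairs is a hypothetical helper: returns [name, age] pairs in
# document order, found with a single global regex match.
sub name_age_pairs {
    my ($text) = @_;
    my @pairs;
    while ($text =~ m{<name>([^<]*)</name>\s*<age>(\d+)</age>}g) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

print "$_->[0] is $_->[1] years old.\n" for name_age_pairs($xml);
```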

You only really need to parse XML files properly when you have written the spec and many different people are sending you their data based on it. But I digress.

When you say you want the contents of the NAME and AGE elements, you probably have more context lying around in the file, such as a PERSON element that encloses them; otherwise you might get confused by <tree><age>437</age><name>Sequoia</name></tree> elements. To disambiguate, you want the NAME element within the PERSON element, along with the AGE element of that same PERSON element.
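
Staying in regex land for a moment, a sketch of that idea: first carve the document into <person>...</person> chunks, then look for name and age inside each chunk, so the tree's age never gets attributed to a person. The helper people is hypothetical, and it assumes <person> elements are not nested and carry no attributes.

```perl
use strict;
use warnings;

# people is a hypothetical helper. ASSUMPTION: <person> elements are not
# nested and have no attributes, so a non-greedy match finds each one.
sub people {
    my ($text) = @_;
    my @found;
    while ($text =~ m{<person>(.*?)</person>}gs) {
        my $body = $1;
        # Search only inside this person's chunk, in whatever order they appear.
        my ($name) = $body =~ m{<name>([^<]*)</name>};
        my ($age)  = $body =~ m{<age>([^<]*)</age>};
        push @found, { NAME => $name, AGE => $age }
            if defined $name && defined $age;
    }
    return @found;
}

my $xml = '<root><tree><age>437</age><name>Sequoia</name></tree>'
        . '<person><age>56</age><name>Alice</name></person></root>';
print "$_->{NAME} is $_->{AGE} years old.\n" for people($xml);
```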

Furthermore, you don't know whether you'll see the NAME element first or the AGE element first. That is, you might have <person><age>56</age><name>Alice</name></person> or <name>Bill</name><age>28</age>. So what you do is keep track of each one you find in a hash, and each time you find another element, you check whether you now have both of them; if so, you do something with them.
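
Stripped of any XML machinery, that bookkeeping looks roughly like this. It's only a sketch: pair_up is a hypothetical helper, and its input is a made-up list of [element, text] events standing in for what a parser would report for the children of each PERSON, in document order.

```perl
use strict;
use warnings;

# pair_up is a hypothetical helper. Input: [element, text] events, as a
# parser might report them for the children of PERSON (made-up format).
sub pair_up {
    my @events = @_;
    my (%seen, @people);
    for my $event (@events) {
        my ($element, $text) = @$event;
        $seen{$element} = $text;
        # Once both fields are present, emit a copy and start afresh.
        if (exists $seen{NAME} && exists $seen{AGE}) {
            push @people, { %seen };
            %seen = ();
        }
    }
    return @people;
}

my @people = pair_up(
    [ AGE  => 56 ], [ NAME => 'Alice' ],   # AGE arrives first
    [ NAME => 'Bill' ], [ AGE => 28 ],     # NAME arrives first
);
print "$_->{NAME} is $_->{AGE} years old.\n" for @people;
```

Because the check runs after every element, the order within a PERSON doesn't matter; the record is emitted as soon as the second field shows up.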

The following code uses XML::Twig to implement the above algorithm. I haven't tested whether it compiles, but such minor details will be cleaned up by the Chatterbox crew if you care to ask them :)

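use strict;
use warnings;
use XML::Twig;

# Both handlers close over the same %seen hash, which collects the
# NAME and AGE of the PERSON currently being parsed.
my $twig = do {
    my %seen;
    XML::Twig->new(
        twig_handlers => {
            # Fires for each NAME element whose parent is a PERSON.
            'PERSON/NAME' => sub {
                my ($t, $e) = @_;
                $seen{NAME} = $e->text;
                check(\%seen);
            },
            # Fires for each AGE element whose parent is a PERSON.
            'PERSON/AGE' => sub {
                my ($t, $e) = @_;
                $seen{AGE} = $e->text;
                check(\%seen);
            },
        },
    );
};

# Once both fields have been seen, report them and reset for the next PERSON.
sub check {
    my $person = shift;
    return unless keys %$person == 2;
    print "$person->{NAME} is $person->{AGE} years old.\n";
    %$person = ();
}

for my $file (@ARGV) {
    $twig->parsefile($file);
}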

• another intruder with the mooring in the heart of the Perl


In reply to Re: I want to find a group of pattern in a xml file by grinder
in thread I want to find a group of pattern in a xml file by cybär