comment on

Hello XML fans, it's time to do some Prolog-like search and query on a small XML database. What is shown below is an adjacency map. It is an XML document which shows which cities are next to which other cities. The utility of such a document/data structure can be imagined to be if a person had an inter-city travel ticket and wanted to look up which cities were next to his. Which cities were two away, etc, etc.

While you could use normal nested Perl data structures to deal with this, XML is becoming en vogue and as a result we have to be just as fashionable. Actually, this isn't true, we can always use Gisle Aas' Data::XMLDumper to convert XML to-and-for Perl nested data structures. But for the purpose of this tutorial, we will act like that module doesn't exist.

So without further adieu, I present the XML document detailing the (far too windy) part of the world I currently live in (and will be escaping from as soon as Christmas is here):

<border_list>

<pair><city>mountain view</city><city>sunnyvale</city></pair>
<pair><city>mountain view</city><city>palo alto</city></pair>
<pair><city>menlo park</city><city>palo alto</city></pair>
<pair><city>atherton</city><city>menlo park</city></pair>
<pair><city>atherton</city><city>redwood city</city></pair>
<pair><city>san carlos</city><city>redwood city</city></pair>
<pair><city>san carlos</city><city>belmont</city></pair>
<pair><city>hillesdale</city><city>belmont</city></pair>
<pair><city>hillesdale</city><city>san mateo</city></pair>

</border_list>
[download]

Ok, so now what

So, now that I have shown the data, it is time to grok it, munge it, eat it for breakfast as a meal replacement and basically put it at it's knees to do our bidding.

Program One: find all cities next to menlo park

Ok, here is a program to grok this XML-base for all cities next menlo park:

use XML::Twig;

my $t = XML::Twig->new(PrettyPrint => 'record');

$t->parsefile('adj.xml');
my $root = $t->root;

# @pair has all the pairs of adjacent cities in it
my @pair = $root->children;

# target city we are looking for
my $city = 'menlo park';

# this routine takes a search text and a list of XML elements and
# searches them for the text
sub candidate_generator {
    my ($search_text, @data) = @_;
    grep { grep { $_->text eq $search_text } $_->children  } @data;
}

# take the entire XML-base and search for records which have our
# target city in them
my @adj = candidate_generator($city,@pair);

# print them out in a human-readable form
map { $_->print } @adj;
[download]

and here is the pretty output:

  <pair>
    <city>menlo park</city>
    <city>palo alto</city>
  </pair>
  <pair>
    <city>atherton</city>
    <city>menlo park</city>
  </pair>
[download]

all done

The program was documented, so it should make sense, but let's take a closer look at candidate_generator().

# this routine takes a search text and a list of XML elements and
# searches them for the text
sub candidate_generator {
    my ($search_text, @data) = @_;
    grep { grep { $_->text eq $search_text } $_->children  } @data;
}
[download]

It consists of two nested greps and hence can be a little confusing. Depending on the way you think you might want to think about the outer grep and then the inner grep or vice versa. It is only fitting that I discuss both methods of program comprehension.

Let's do top-down first. The outer grep is basically saying: take all the XML records and only return the ones which satisfy the inner search criteria. The inner search criteria takes each individual XML record and looks at each of it's children, where each child is a city and examines its text for equality with the text to be searched for, or concretely speaking menlo park.

Ok, now bottom up. The innermost expression is $_->text eq $search_text and what this does is take an XML element and get its text and compare it to a normal Perl string. So if $elt was an XML::Twig::Elt representing

<city>boise</city>
[download]

then $elt->text would be boise. Now we work out a bit more. And a bit more out is grep { YADAYADA } $_->children So here we take advantage of the fact that the XML is structured so that neighboring cities are both children of the pair element, e..g:

<pair><city>mountain view</city><city>sunnyvale</city></pair>
[download]

and we are just checking to see if either child is the text we are looking for. And now we finally make it to the outer grep and the first sentence in the top-down description says what that is doing.

th-th-th-that's the first post, folks

Anyway, that was the first in a series of 3 posts. The next two will do slightly more advanced searching and in the process introduce a call or two more from the XML::Twig API.

In reply to Adjacency List Processing in XML::Twig by princepawn

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


The stupid question is the question not asked
	PerlMonks