Yeah, I guess if XML::Simple has to load the entire data structure into memory, it isn't quite what I'm looking for. It just takes too long.
An example might help. Suppose I have the following XML input file:
<?xml version="1.0"?>
<library>
  <book>
    <title>Dreamcatcher</title>
    <author>Stephen King</author>
    <genre>Horror</genre>
    <pages>899</pages>
    <price>23.99</price>
    <rating>5</rating>
    <publication_date>11/27/2001</publication_date>
  </book>
  <book>
    <title>Mystic River</title>
    <author>Dennis Lehane</author>
    <genre>Thriller</genre>
    <pages>390</pages>
    <price>17.49</price>
    <rating>4</rating>
    <publication_date>07/22/2003</publication_date>
  </book>
  <book>
    <title>The Lord Of The Rings</title>
    <author>J. R. R. Tolkien</author>
    <genre>Fantasy</genre>
    <pages>3489</pages>
    <price>10.99</price>
    <rating>5</rating>
    <publication_date>10/12/2005</publication_date>
  </book>
</library>
Suppose I only want to import books published on or after January 1, 2002. If I apply that filter during the initial import, the resulting structure should look like this:
$VAR1 = {
  'book' => [
    {
      'publication_date' => '07/22/2003',
      'price' => '17.49',
      'author' => 'Dennis Lehane',
      'title' => 'Mystic River',
      'rating' => '4',
      'pages' => '390',
      'genre' => 'Thriller'
    },
    {
      'publication_date' => '10/12/2005',
      'price' => '10.99',
      'author' => 'J. R. R. Tolkien',
      'title' => 'The Lord Of The Rings',
      'rating' => '5',
      'pages' => '3489',
      'genre' => 'Fantasy'
    }
  ]
};
The import should skip entirely any entry that doesn't meet the criterion (in this case, publication_date must be on or after 1/1/2002). Can DOM- or SAX-based parsing do this?
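Whichever parser ends up doing the work, the filter itself has one catch. The dates are in MM/DD/YYYY form, so they can't be compared as plain strings. Below is a minimal core-Perl sketch of the predicate. It assumes every publication_date uses month/day/year order. `date_key` and `published_on_or_after` are hypothetical helper names, not part of any XML module. The records are hard-coded from the sample file rather than parsed, so a streaming parser would instead call the predicate once per `<book>` element.

```perl
use strict;
use warnings;

# Convert an MM/DD/YYYY date string into a fixed-width YYYYMMDD key.
# Assumption: dates always use month/day/year order, as in the sample XML;
# one- or two-digit months and days are both accepted.
sub date_key {
    my ($date) = @_;
    my ($m, $d, $y) = $date =~ m{^(\d{1,2})/(\d{1,2})/(\d{4})$}
        or die "Unrecognized date: $date\n";
    return sprintf '%04d%02d%02d', $y, $m, $d;
}

# Hypothetical helper: true if $date falls on or after $cutoff.
# The keys are zero-padded digits, so a numeric comparison orders them by date.
sub published_on_or_after {
    my ($date, $cutoff) = @_;
    return date_key($date) >= date_key($cutoff);
}

# The three <book> records from the sample file, reduced to title and date.
my @books = (
    { title => 'Dreamcatcher',          publication_date => '11/27/2001' },
    { title => 'Mystic River',          publication_date => '07/22/2003' },
    { title => 'The Lord Of The Rings', publication_date => '10/12/2005' },
);

# Keep only books published on or after January 1, 2002.
my @kept = grep { published_on_or_after($_->{publication_date}, '1/1/2002') } @books;
print "$_->{title}\n" for @kept;
```

Run as-is, this prints Mystic River and The Lord Of The Rings. Comparing the raw strings with `ge` would get both borderline books wrong. '07/22/2003' ge '1/1/2002' is false because '0' sorts before '1', and '11/27/2001' ge '1/1/2002' is true because '1' sorts after '/'.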