in reply to Data mining

Looks good to me. (Don't forget the closing </sites> tag.)

A small consideration, however, is this:

<site>
  <name>Ralinga</name>
  <url>http://www.ralinga.lt/</url>
  ...
</site>
versus this:
<site name="Ralinga" url="http://www.ralinga.lt/"> ... </site>
It really is only a matter of personal choice, though. I find it very helpful to shove my XML through XML::Simple (with and without forcearray on) and then shove the resulting data structure through Data::Dumper to see the differences.
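
Here is a minimal sketch of that comparison, in case it helps. The <sites> root and the two sample documents are my own stand-ins for your file, and I pass KeyAttr => [] (an assumption on my part, not something you need) to switch off XML::Simple's default folding on 'name', 'key' and 'id' so the raw shape stays visible.

```perl
use strict;
use warnings;
use XML::Simple;    # CPAN module; provides XMLin()
use Data::Dumper;   # core module

$Data::Dumper::Sortkeys = 1;   # stable key order makes the dumps easy to compare
$Data::Dumper::Indent   = 1;

# Hypothetical documents: a <sites> root holding a single <site>.
my $element_form = <<'XML';
<sites>
  <site>
    <name>Ralinga</name>
    <url>http://www.ralinga.lt/</url>
  </site>
</sites>
XML

my $attribute_form = <<'XML';
<sites>
  <site name="Ralinga" url="http://www.ralinga.lt/" />
</sites>
XML

# Parse one document with ForceArray on or off.
# KeyAttr => [] disables the default folding on 'name', 'key' and 'id'.
sub parse {
    my ($xml, $force_array) = @_;
    return XMLin($xml, ForceArray => $force_array, KeyAttr => []);
}

for my $force_array (0, 1) {
    print "ForceArray => $force_array, element form:\n",
          Dumper(parse($element_form, $force_array));
    print "ForceArray => $force_array, attribute form:\n",
          Dumper(parse($attribute_form, $force_array));
}
```

With ForceArray off, both forms should collapse to the same hash. With it on, child elements such as <name> turn into array refs while attributes stay plain scalars, which is the main practical difference between the two styles.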

UPDATE:
After some reflection, I am curious why you didn't break this line up:

<extract type="info">TD class=kain width=500</extract>

# which would yield a data structure similar to
'extract' => [
  {
    'content' => 'TD class=kain width=500',
    'type' => 'info'
  }
]
Wrapping the content in a simple tag would do the job of breaking up the atoms for you:
<extract type="info"><TD class="kain" width="500"/></extract>

# yields something like:
'extract' => [
  {
    'TD' => {
      'class' => 'kain',
      'width' => '500',
    },
    'type' => 'info'
  }
]
Let the XML parser do the work. ;)
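
A rough sketch to try both variants side by side. The <site> wrapper is a hypothetical root (XMLin discards the root element by default), and the options ForceArray => ['extract'] and KeyAttr => [] are my assumptions, chosen so the dumps come out close to the shapes shown above.

```perl
use strict;
use warnings;
use XML::Simple;    # CPAN module; provides XMLin()
use Data::Dumper;   # core module

$Data::Dumper::Sortkeys = 1;

# Only <extract> is forced into an array; KeyAttr => [] disables folding.
my %opts = (ForceArray => ['extract'], KeyAttr => []);

# Flat version: the tag description is one opaque string.
my $flat = XMLin(
    '<site><extract type="info">TD class=kain width=500</extract></site>',
    %opts,
);

# Nested version: the parser splits the atoms into hash keys.
my $nested = XMLin(
    '<site><extract type="info"><TD class="kain" width="500"/></extract></site>',
    %opts,
);

print Dumper($flat, $nested);

# With the flat version you have to pick the string apart yourself ...
my %attr = $flat->{extract}[0]{content} =~ /(\w+)=(\w+)/g;
print "flat:   class=$attr{class}\n";

# ... while the nested version is already a data structure.
print "nested: class=$nested->{extract}[0]{TD}{class}\n";
```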

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)