Hello dear anonymous_monk! I am trying to understand your posting!
You refer to the page that explains XPaths and helps with finding them. That is very, very interesting! I am trying to learn something here.
You use this link: http://www.perlmonks.org/?node_id=865792
It leads to this code!
This is a great tutorial and a super tool! Let me ask you if I got this right: with it I can determine the paths, in other words, I can find out all the paths in an HTML file!?
$ perl htmltreexpather.pl select.html _tag option
HTML::Element=HASH(0xb139ec) 0.1.1.0.0
Chose Some aaa
/html/body/form/select/option
/html/body/form/select/option
/html/body[@bgcolor='red']/form[@action='/foo.cgi' and @name='queryfoo']/select[@name='singlelist']/option[@value='aaa']
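To check my understanding of "find all the paths", here is a minimal sketch of how every element path of a document could be listed. It assumes HTML::TreeBuilder (from CPAN, the base class of HTML::TreeBuilder::XPath) is installed. It is not htmltreexpather.pl itself; `element_xpath` and `all_paths` are hypothetical helper names, and the sample markup only imitates what select.html might contain.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;             # CPAN module, assumed installed
use Scalar::Util qw(refaddr);

# Hypothetical helper: build a positional XPath such as
# /html/body/form/select/option[2] by walking up the parent chain.
sub element_xpath {
    my ($el) = @_;
    my @steps;
    for (my $node = $el; ref $node; $node = $node->parent) {
        my $tag  = $node->tag;
        my $step = $tag;
        if (my $parent = $node->parent) {
            # element siblings sharing this tag, in document order
            my @same = grep { ref $_ && $_->tag eq $tag } $parent->content_list;
            if (@same > 1) {
                my ($pos) = grep { refaddr($same[$_]) == refaddr($node) } 0 .. $#same;
                $step .= '[' . ($pos + 1) . ']';
            }
        }
        unshift @steps, $step;
    }
    return '/' . join '/', @steps;
}

# Hypothetical helper: parse an HTML string, return the path of every element.
sub all_paths {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_content($html);
    # look_down starts at the root and visits all descendant elements
    my @paths = map { element_xpath($_) } $tree->look_down(_tag => qr/^\w/);
    $tree->delete;                 # break the tree's circular references
    return @paths;
}

my $sample = <<'__HTML__';
<html><body><form name="queryfoo"><select name="singlelist">
<option value="aaa">aaa</option><option value="bbb">bbb</option>
</select></form></body></html>
__HTML__

print "$_\n" for all_paths($sample);
```

An index like `[2]` is only added when several siblings share the same tag, which is the same `/html/body/div[2]` style used in the script quoted in this thread.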
------------------------------------------------------------------
Question: does the code mentioned above print out the paths of a (general) HTML document!?
At least, that is how you use it here:
#!/usr/bin/perl --
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
#~ $XML::XPathEngine::DEBUG = 1;
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content(<<'__HTML__');
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><head><meta name="generator" content="DigiOnline GmbH - WebWeaver 3.4 CMS - http://www.webweaver.de">
<title>educa.ch</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="101.htm"><script src="102.htm"></script><script language="JavaScript"><!--
var did='d79376';
var root=new Array('d200','d205','d73137','d1566','d79376','d');
var usefocus = 1;
function check() {
if ((self.focus) && (usefocus)) {
self.focus();
}
}
// --></script></head><body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0" onload="check();">
<table cellspacing="0" cellpadding="0" border="0" width="100%"><tr><td width="15" class="popuphead"><img src="/0.gif" alt="" width="15" height="16"></td>
<td width="99%" class="popuphead">Adresse - Schulen in der Schweiz</td>
<td width="20" class="popuphead" valign="middle"><a href="#" title="Print" onclick="window.print(); return false;"><img src="../pics/print16x13.gif" alt="Drucken" width="16" height="13"></a></td>
<td width="20" class="popuphead" valign="middle"><a href="#" title="close" onclick="window.close(); return false;"><img src="../pics/close21x13.gif" alt="Schliessen" width="21" height="13"></a></td></tr>
<tr bgcolor="#B2B2B2"><td colspan="4"><img src="/0.gif" alt="" width="1" height="1"></td></tr></table>
<div class="leerzeile"> </div>
<div class="leerzeile"><img src="/0.gif" alt="" width="15"height="8">Altes Schulhaus Ossingen </div>
<div class="leerzeile"> </div>
<div><img src="/0.gif" alt="" width="15" height="8">Guntibachstrasse 10 </div>
<div><img src="/0.gif" alt="" width="15" height="8"></div>
<div><img src="/0.gif" alt="" width="15" height="8">8475 Ossingen</div>
<div class="leerzeile"> </div>
<div><img src="/0.gif" alt="" width="15" height="8"><a href="" target="_blank"></a></div>
<div><img src="/0.gif" alt="" width="15" height="8"><a href="mailto: sekretariat.psossingen@bluewin.ch">sekretariat.psossingen@bluewin.ch</a></div>
<div class="leerzeile"> </div>
<div><img src="/0.gif" alt="" width="15" height="8">Tel:<img src="/0.gif" alt="" width="6" height="8">052 317 15 45 </div>
<div><img src="/0.gif" alt="" width="15" height="8">Fax:<img src="/0.gif" alt="" width="4" height="8">052 317 04 42 </div>
<div> </div></body></html>
__HTML__
# you can delete html/body
for my $query (
qw!
/html/body/div[2]
/html/body/div[4]
/html/body/div[6]
/html/body/div[9]
/html/body/div[11]
/html/body/div[12]
!
)
{
print $query,"\n",$tree->findvalue($query),"\n\n";
}
__END__
/html/body/div[2]
Altes Schulhaus Ossingen
/html/body/div[4]
Guntibachstrasse 10
/html/body/div[6]
8475 Ossingen
/html/body/div[9]
sekretariat.psossingen@bluewin.ch
/html/body/div[11]
Tel:052 317 15 45
/html/body/div[12]
Fax:052 317 04 42
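If I understand the loop correctly, the positions div[2], div[4], ... were picked by hand. Here is a sketch that instead walks every div directly under body and prints its position and text, skipping the empty spacer divs. It assumes HTML::TreeBuilder::XPath's `findnodes` (which returns the nodes in document order in list context) and HTML::Element's `as_trimmed_text`; `div_texts` is a hypothetical helper, and the markup is a shortened stand-in for the educa.ch page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;      # CPAN module, assumed installed

# Hypothetical helper: return [xpath, text] pairs for every div that is a
# direct child of body and contains some visible text.
sub div_texts {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse_content($html);
    my @divs = $tree->findnodes('/html/body/div');
    my @rows;
    for my $i (0 .. $#divs) {
        my $text = $divs[$i]->as_trimmed_text;
        next unless length $text;  # skip empty spacer divs
        push @rows, [ '/html/body/div[' . ($i + 1) . ']', $text ];
    }
    $tree->delete;
    return @rows;
}

my $sample = <<'__HTML__';
<html><body>
<div class="leerzeile"> </div>
<div class="leerzeile">Altes Schulhaus Ossingen </div>
<div><img src="/0.gif" alt="" width="15" height="8"></div>
<div>8475 Ossingen</div>
</body></html>
__HTML__

for my $row (div_texts($sample)) {
    print "$row->[0]\n$row->[1]\n\n";
}
```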
That is very impressive. I am trying to understand this code, and how you used the example that you were referring to!
If I understand you correctly, I can use this script in many cases to extract the XPaths!? Is that right?
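One thing I wonder about for "many cases": a positional path like /html/body/div[9] only works while every page has exactly the same layout. A sketch of an alternative, assuming (not confirmed in this thread) that other pages also use a mailto: link and "Tel:"/"Fax:" labels: select nodes by attribute and text with the standard XPath 1.0 functions `starts-with` and `normalize-space`, which XML::XPathEngine (used by HTML::TreeBuilder::XPath) implements. `first_text` is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;      # CPAN module, assumed installed

# Hypothetical helper: trimmed text of the first node matching $xpath, or undef.
sub first_text {
    my ($tree, $xpath) = @_;
    my ($node) = $tree->findnodes($xpath);
    return $node ? $node->as_trimmed_text : undef;
}

my $html = <<'__HTML__';
<html><body>
<div><a href="mailto: sekretariat.psossingen@bluewin.ch">sekretariat.psossingen@bluewin.ch</a></div>
<div>Tel:052 317 15 45 </div>
<div>Fax:052 317 04 42 </div>
</body></html>
__HTML__

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

# Queries keyed on content rather than on the position of the div
my %query = (
    email => q{//a[starts-with(@href, 'mailto:')]},
    tel   => q{//div[starts-with(normalize-space(.), 'Tel:')]},
    fax   => q{//div[starts-with(normalize-space(.), 'Fax:')]},
);
for my $field (sort keys %query) {
    my $value = first_text($tree, $query{$field});
    printf "%-5s %s\n", $field, defined $value ? $value : '(not found)';
}
$tree->delete;
```

The trade-off: these queries survive extra or missing spacer divs, but they depend on the labels and link format staying the same.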
I look forward to hearing from you! I think I can learn a lot here. Please help me!