Hello dear anonymous_monk! I am trying to understand your posting!


You refer to the page that explains and helps with finding XPaths. That is very interesting! I am trying to learn something here.

You use this link: http://www.perlmonks.org/?node_id=865792

It leads to this code!

This is a great tutorial and a great tool. Let me ask you if I got this right: with that script I can determine the paths. In other words, I can find out all the paths in an HTML file?

$ perl htmltreexpather.pl select.html _tag option
HTML::Element=HASH(0xb139ec)
0.1.1.0.0
Chose Some aaa
/html/body/form/select/option
/html/body/form/select/option
/html/body[@bgcolor='red']/form[@action='/foo.cgi' and @name='queryfoo']/select[@name='singlelist']/option[@value='aaa']
------------------------------------------------------------------



Question: does the code mentioned above print out the paths of a (general) HTML document?

At least, that is how you use it here:

#!/usr/bin/perl --
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
#~ $XML::XPathEngine::DEBUG = 1;
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content(<<'__HTML__');
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><head><meta name="generator" content="DigiOnline GmbH - WebWeaver 3.4 CMS - http://www.webweaver.de">
<title>educa.ch</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" href="101.htm"><script src="102.htm"></script><script language="JavaScript"><!--
var did='d79376';
var root=new Array('d200','d205','d73137','d1566','d79376','d');
var usefocus = 1;
function check() {
if ((self.focus) && (usefocus)) {
self.focus();
}
}
// --></script></head>
<body bgcolor="#FFFFFF" leftmargin="0" topmargin="0" marginwidth="0" marginheight="0" onload="check();">
<table cellspacing="0" cellpadding="0" border="0" width="100%">
<tr><td width="15" class="popuphead"><img src="/0.gif" alt="" width="15" height="16"></td>
<td width="99%" class="popuphead">Adresse - Schulen in der Schweiz</td>
<td width="20" class="popuphead" valign="middle"><a href="#" title="Print" onclick="window.print(); return false;"><img src="../pics/print16x13.gif" alt="Drucken" width="16" height="13"></a></td>
<td width="20" class="popuphead" valign="middle"><a href="#" title="close" onclick="window.close(); return false;"><img src="../pics/close21x13.gif" alt="Schliessen" width="21" height="13"></a></td></tr>
<tr bgcolor="#B2B2B2"><td colspan="4"><img src="/0.gif" alt="" width="1" height="1"></td></tr></table>
<div class="leerzeile"> </div>
<div class="leerzeile"><img src="/0.gif" alt="" width="15"height="8">Altes Schulhaus Ossingen </div>
<div class="leerzeile"> </div>
<div><img src="/0.gif" alt="" width="15" height="8">Guntibachstrasse 10</div>
<div><img src="/0.gif" alt="" width="15" height="8"></div>
<div><img src="/0.gif" alt="" width="15" height="8">8475 Ossingen</div>
<div class="leerzeile"> </div>
<div><img src="/0.gif" alt="" width="15" height="8"><a href="" target="_blank"></a></div>
<div><img src="/0.gif" alt="" width="15" height="8"><a href="mailto: sekretariat.psossingen@bluewin.ch">sekretariat.psossingen@bluewin.ch</a></div>
<div class="leerzeile"> </div>
<div><img src="/0.gif" alt="" width="15" height="8">Tel:<img src="/0.gif" alt="" width="6" height="8">052 317 15 45 </div>
<div><img src="/0.gif" alt="" width="15" height="8">Fax:<img src="/0.gif" alt="" width="4" height="8">052 317 04 42 </div>
<div> </div></body></html>
__HTML__

# you can delete html/body
for my $query ( qw!
    /html/body/div[2]
    /html/body/div[4]
    /html/body/div[6]
    /html/body/div[9]
    /html/body/div[11]
    /html/body/div[12]
! ) {
    print $query,"\n",$tree->findvalue($query),"\n\n";
}
__END__
/html/body/div[2]
Altes Schulhaus Ossingen

/html/body/div[4]
Guntibachstrasse 10

/html/body/div[6]
8475 Ossingen

/html/body/div[9]
sekretariat.psossingen@bluewin.ch

/html/body/div[11]
Tel:052 317 15 45

/html/body/div[12]
Fax:052 317 04 42



That is very impressive. I am trying to understand this code, and the usage example that you were referring to:


$ perl htmltreexpather.pl select.html _tag option
HTML::Element=HASH(0xb139ec)
0.1.1.0.0
Chose Some aaa
/html/body/form/select/option
/html/body/form/select/option
/html/body[@bgcolor='red']/form[@action='/foo.cgi' and @name='queryfoo']/select[@name='singlelist']/option[@value='aaa']


If I understand you correctly, I can use this script in many cases to get the XPaths out of a document. Is this right?

I look forward to hearing from you! I guess that I can learn a lot. Please help me here!


In reply to Re^3: HTML::TreeBuilder:: identifing xpath-expression - first attempt by Perlbeginner1
in thread HTML::TreeBuilder:: identifing xpath-expression - first attempt by Perlbeginner1
