Re^7: Extracting span and meta content with HTML::TreeBuilder

See all the links in Re: Retrieve select information from HTML. They are examples (for tree-xpath and others), walkthroughs, and tutorials. Tools like xpather.pl/htmltreexpather.pl can give you paths to start with.

findnodes gives you nodes. In the case of TreeBuilder, each node is an HTML::Element object you can call methods on. The other player, XML::LibXML, gives you XML::LibXML::Node objects, be they XML::LibXML::Element or something else (libxml follows the DOM closely).
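A minimal sketch of what "nodes you can call methods on" looks like with HTML::TreeBuilder::XPath. The HTML snippet is a made-up stand-in for the real page: only the class names are taken from this thread, and the itemprop attribute on the meta tag is purely an assumption for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup, NOT the original file from the thread.
my $html = <<'HTML';
<html><head><title>sample</title></head><body>
<div class="review-content">
  <div class="biz-rating">
    <meta itemprop="ratingValue" content="4.0">
    <span class="rating-qualifier"> 1/13/2011 </span>
  </div>
</div>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# findnodes returns HTML::Element objects (a list in list context)
my ($span) = $tree->findnodes('//span[@class="rating-qualifier"]');
my ($meta) = $tree->findnodes('//meta[@itemprop="ratingValue"]');

# ordinary HTML::Element methods work on the results
my $date = $span->as_text;
$date =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
my $rating = $meta->attr('content');

print "date:   $date\n";
print "rating: $rating\n";

$tree->delete;                        # free the tree when done
```

If you only want the string and not the node, findvalue('//meta[@itemprop="ratingValue"]/@content') should save you the attr call.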

This tutorial needs JavaScript: http://zvon.org/comp/r/tut-XPath_1.html

On the file you provided, xpather spits out stuff like this:

/html/body/div/div/span

# posy
/html[1]/body[1]/div[1]/div[1]/span[1]

# star
/*[ local-name() = "html"
    and position() = 1
  ]

/*[ local-name() = "body"
    and position() = 1
  ]

/*[ local-name() = "div"
    and position() = 1
    and @class = "review-content"
  ]

/*[ local-name() = "div"
    and position() = 1
    and @class = "biz-rating biz-rating-very-large clearfix"
  ]

/*[ local-name() = "span"
    and @class = "rating-qualifier"
    and contains(string(), "                1/13/2011     ")
  ]


# rats
/html[1]
 /body[1]
 /*[ name() = "div" and position() = 1 and @class = "review-content" ]
 /*[ name() = "div" and position() = 1 and @class = "biz-rating biz-rating-very-large clearfix" ]
 /*[ name() = "span" and position() = 1 and @class = "rating-qualifier" ]

It's a tree :) so //meta means find a <meta> anywhere in the document, whereas /foo/meta means find every <meta> that is a direct child of the root element foo, as in <foo><meta></meta>....</foo>
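A small sketch of that difference, using XML::LibXML on a made-up document (the foo/bar elements and the name attributes are hypothetical, chosen only to tell the nodes apart):

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document: two <meta> directly under <foo>, one nested deeper.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<foo>
  <meta name="a"/>
  <bar><meta name="b"/></bar>
  <meta name="c"/>
</foo>
XML

my @anywhere = $doc->findnodes('//meta');     # every <meta> in the tree
my @children = $doc->findnodes('/foo/meta');  # only children of root <foo>

my $anywhere_names = join ',', map { $_->getAttribute('name') } @anywhere;
my $children_names = join ',', map { $_->getAttribute('name') } @children;

print "//meta    : $anywhere_names\n";   # a,b,c
print "/foo/meta : $children_names\n";   # a,c
```

The nested <meta name="b"/> is matched by //meta but not by /foo/meta, because its parent is bar and not the root.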

The examples/tutorials linked above give better examples and explanations.
