Reading up on HTML::Selector::XPath, I understand that its sole purpose is to translate a CSS selector into an XPath expression. Correct me if I'm wrong.

However, it doesn't seem capable of solving the problem mentioned in Re^5: extracting data from HTML, where the parser seemed to have given each and every node a default namespace.

Wouldn't it be great if HTML::Selector::XPath could give each and every element a user-definable 'default' namespace prefix? Only the elements that do not already have a namespace of their own, of course.

If you ask me, that can't be too difficult to implement, can it?

Re^3: extracting data from HTML
by Corion (Patriarch) on Jun 06, 2012 at 21:04 UTC
    If I understand you right, the (undocumented) "prefix" option already does that.
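    A minimal sketch of what the prefix option buys you, assuming current CPAN versions of HTML::Selector::XPath and XML::LibXML. The XHTML sample, the prefix name `x` and the variable names are illustrative choices, not anything from the thread. The prefix only works because it is bound to the XHTML namespace URI through an XML::LibXML::XPathContext.

```perl
use strict;
use warnings;
use HTML::Selector::XPath qw(selector_to_xpath);
use XML::LibXML;

# XHTML puts every element in a default namespace, so an unprefixed
# XPath such as //body//p would match nothing here.
my $xhtml = <<'XHTML';
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p class="important">Read me</p>
    <p>Skip me</p>
  </body>
</html>
XHTML

my $doc = XML::LibXML->load_xml(string => $xhtml);

# Bind an arbitrary prefix ("x" is our own choice) to the XHTML namespace URI.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');

# The prefix option makes the generated XPath use x:body, x:p, and so on.
my $xpath = selector_to_xpath('body p.important', prefix => 'x');
my @found = map { $_->textContent } $xpc->findnodes($xpath);

print "$xpath\n";
print "$_\n" for @found;
```

    Attributes such as @class carry no namespace in XHTML, so only the element names need the prefix.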

      Indeed; it was I who requested this feature and supplied the patch, mostly to support XML::LibXML::QuerySelector, which extends XML::LibXML with support for CSS selectors:

      my @important_paragraphs = $xmlnode->querySelectorAll('body p.important');

      XML::LibXML::QuerySelector passes the selector to Corion's module, which returns an XPath expression. It then queries XML::LibXML with that XPath and passes the resulting list through a "descendant of" filter, to make sure that every returned element is a descendant of the original $xmlnode.
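      The three steps described above can be sketched roughly as follows. `query_selector_all` and `is_descendant_of` are hypothetical helpers written for illustration; this is not XML::LibXML::QuerySelector's actual source. The sketch assumes an un-namespaced document, so no prefix is needed.

```perl
use strict;
use warnings;
use HTML::Selector::XPath qw(selector_to_xpath);
use XML::LibXML;

# Hypothetical helper: true if $node lies somewhere below $ancestor.
sub is_descendant_of {
    my ($node, $ancestor) = @_;
    for (my $p = $node->parentNode; $p; $p = $p->parentNode) {
        return 1 if $p->isSameNode($ancestor);
    }
    return 0;
}

# Hypothetical helper mirroring the steps: CSS -> XPath -> query -> filter.
sub query_selector_all {
    my ($context, $selector) = @_;
    # 1. Translate the CSS selector into an XPath expression.
    my $xpath = selector_to_xpath($selector);
    # 2. The generated XPath starts at the document root ("//..."), so it
    #    returns matches from the whole document, not just under $context.
    my @all = $context->findnodes($xpath);
    # 3. Keep only the matches that sit inside $context.
    return grep { is_descendant_of($_, $context) } @all;
}

my $doc = XML::LibXML->load_xml(string => <<'XML');
<html><body>
  <div id="a"><p class="important">inside</p></div>
  <p class="important">outside</p>
</body></html>
XML

my ($div) = $doc->findnodes('//div[@id="a"]');
my @hits = map { $_->textContent } query_selector_all($div, 'p.important');
print "$_\n" for @hits;
```

      Filtering after the fact keeps the CSS-to-XPath translation independent of the context node, at the cost of evaluating the XPath over the whole document.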

      TL;DR: XML::LibXML::QuerySelector implements W3C Selectors API Level 1 for XML::LibXML.

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'