rduke15 has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm trying to find nodes where the class name matches a regular expression, but I cannot figure out the right syntax.

I'm using HTML::TreeBuilder::XPath, and also trying to explore with the XPather Firefox extension.

The w3.org documentation mentions a matches() function, but everything I have tried has failed.

In practice, I'm looking for li nodes with class="mw-line-even" or class="mw-line-odd". (Yes, I could just use "or", but I would like to understand how to use regexes in such a case.)

Would someone know the correct syntax for something like this:

$tree->findnodes( '//li[matches(@class, "mw-line-.*")]' );

The more perlish version of what I mean could be

$tree->findnodes( '//li[ @class =~ /^mw-line-/ ]' );

Re: HTML::TreeBuilder::XPath and regular expressions
by rduke15 (Beadle) on Mar 29, 2010 at 12:17 UTC

    Shame on me! It turns out the simpler, more Perlish version actually works!

    It may not be official w3.org syntax, and it doesn't work in XPather, but it does work in HTML::TreeBuilder::XPath.

    So in case other people search for this: to find attribute values matching a regular expression using HTML::TreeBuilder::XPath:

      $tree->findnodes( '//element[ @attribute =~ /regex/ ]' );
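For anyone who wants to try this, here is a minimal sketch of the syntax above, assuming HTML::TreeBuilder::XPath is installed. The sample markup and variable names are made up for illustration; only the class names come from the original question. It also shows starts-with(), a standard XPath 1.0 function that covers a simple prefix match like this one without the non-standard =~ operator.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup, modelled on the class names in the question.
my $html = <<'HTML';
<html><body><ul>
  <li class="mw-line-even">first</li>
  <li class="mw-line-odd">second</li>
  <li class="other">third</li>
</ul></body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# Non-standard =~ operator, as described in this thread.
my @regex_hits = $tree->findnodes('//li[ @class =~ /^mw-line-/ ]');

# Standard XPath 1.0 alternative for a plain prefix match.
my @prefix_hits = $tree->findnodes('//li[ starts-with(@class, "mw-line-") ]');

# Collect the text of each match before the tree is freed.
my @regex_texts  = map { $_->as_text } @regex_hits;
my @prefix_texts = map { $_->as_text } @prefix_hits;

print "$_\n" for @regex_texts;

$tree->delete;
```

Both queries should select the same two li elements here. The =~ form is the one to reach for when the pattern is more than a simple prefix.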

      Yes, I cheated. In XML::XPathEngine, which provides the XPath engine for HTML::TreeBuilder::XPath, integrating regexps into the XPath syntax itself was easy and seemed like the Perlish thing to do.

      From looking at the code (which I inherited from XML::XPath), the official XPath way, using fn:matches(subject, pattern, flags) doesn't seem to be supported. Patches welcome.

        Well, if we can use the much simpler and better Perl way, who cares about the convoluted "official" way... :-)

        As it is, it's great. Thanks a lot. The only missing thing seems to be a sentence and/or an example in the module's documentation. In the meantime, hopefully this thread will show up for the kind of searches I made without success today.