in reply to XML::LibXML getElementsById problem

There is nothing magical about an attribute named 'id'. You have to tell the parser that it is of type 'ID', either by declaring it in a DTD (you could probably also use a RelaxNG schema), or by using 'xml:id', which IS magical, instead of plain 'id'.
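
For what it's worth, here is a minimal sketch of the DTD route. The internal DTD subset and its element declarations are assumptions made up for illustration; the ATTLIST line is the part that declares 'id' as type ID. I would expect the parser to pick up the ID type from an internal subset without validation being switched on, but check that on your version.

```perl
use strict;
use warnings;
use XML::LibXML;

# Internal DTD subset (an assumption for illustration) that declares the
# plain 'id' attribute of <aaa> as type ID.
my $xml_string = join "\n",
    '<?xml version="1.0"?>',
    '<!DOCTYPE root [',
    '  <!ELEMENT root (aaa)>',
    '  <!ELEMENT aaa (bbb)>',
    '  <!ELEMENT bbb EMPTY>',
    '  <!ATTLIST aaa id ID #REQUIRED>',
    ']>',
    '<root><aaa id="test"><bbb/></aaa></root>';

my $doc  = XML::LibXML->new->parse_string($xml_string);

# Look the element up by its DTD-declared ID.
my $elem = $doc->getElementsById('test');

print defined $elem ? $elem->nodeName : 'not found', "\n";
```

If the DTD lives in an external file instead, the parser will probably need to be told to load it before the ID type is known.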

Re^2: XML::LibXML getElementsById problem
by pmc (Initiate) on Dec 15, 2005 at 16:25 UTC
    Thanks for the tip. This snippet works! I'm using XML::LibXML to parse HTML docs. Unfortunately, it does not treat HTML ids like xml:id. I'm pretty new to XML. Thanks again.
    use strict;
    use XML::LibXML;

    # Sample document: the element carries xml:id rather than a plain id.
    my $xml_string = <<EOF;
    <?xml version="1.0"?>
    <root>
      <aaa xml:id='test'>
        <bbb/>
      </aaa>
    </root>
    EOF

    my $parser = XML::LibXML->new();
    my $doc    = $parser->parse_string($xml_string) || die;

    # xml:id is treated as an ID, so the lookup needs no DTD.
    my $elem = $doc->getElementsById('test');
    print STDERR $elem."\n";

      Why not use a regular XPath expression instead of getElementsById? my $elem = ($doc->findnodes('//*[@id="test"]'))[0]; works fine. It is probably slower than getElementsById, but that might not matter. Alternatively, you could select every element that has an id attribute and rename that attribute to xml:id, and hope (I would think it works) that getElementsById then finds it. Or you could pre-process your HTML, with tidy for example, to get XHTML, and then run XML::LibXML on the XHTML (you might need to set the option to process the DTD in order for id to be recognized as an ID).
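
      A rough sketch of the XPath route, in case it helps. The helper name find_by_plain_id is hypothetical (it is not part of XML::LibXML), and the sample document is cut down from the one earlier in the thread. It selects every element that has an id attribute and compares the value in Perl, so the id never has to be quoted inside the XPath expression.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): return the first element
# whose plain 'id' attribute equals $id, or undef if there is none.
sub find_by_plain_id {
    my ($doc, $id) = @_;
    for my $node ($doc->findnodes('//*[@id]')) {
        return $node if $node->getAttribute('id') eq $id;
    }
    return;
}

my $doc = XML::LibXML->new->parse_string(
    '<root><aaa id="test"><bbb/></aaa></root>'
);

my $elem = find_by_plain_id($doc, 'test');
print defined $elem ? $elem->nodeName : 'not found', "\n";
```

      This walks the whole tree on every call, so for many lookups in a large document it would be worth collecting the nodes into a hash once.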

      There might also be an XML::LibXML-specific trick for this, but I don't know the module that well.