LibXML - Removing Namespace?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm pretty sure this is not possible, but I thought I'd ask the experts to make sure.

I'm working with a large, very deep set of XML data. Mostly my Perl script needs to read in the XML, parse the data, do a few modifications/alterations/aggregations, and then dump it to a DB.

The source XML has an xmlns attribute (a default namespace declaration) on the root element, which is completely irrelevant to what I am doing with it. As we all know, LibXML becomes a serious pain to work with when namespaces are defined like this, and since I am traveling 8 or 9 levels down through child nodes multiple times and doing operations on each, I fear for my sanity having to redefine and declare the XPathContext.

So my question is simple: is there any way (barring a sed on the source to remove it before parsing...) to remove the namespace from the LibXML-parsed object? There seem to be plenty of ways to define new ones, and I haven't seen a definitive answer on this one anywhere yet.
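One way to sidestep the prefix bookkeeping without removing anything is XPath's `local-name()` function, which compares only the local part of an element's name. A minimal sketch, assuming a small document with a default namespace `urn:foo` (illustrative only). It also shows why plain paths fail: XPath 1.0 has no default namespace, so unprefixed names only match elements in no namespace. The trade-off is that `local-name()` would also match same-named elements from other namespaces.

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative document: every element is in the default namespace urn:foo.
my $xml = q{<root xmlns="urn:foo"><first_sub name="foo"><second_sub id="1"/><second_sub id="2"/></first_sub></root>};
my $doc = XML::LibXML->new->parse_string($xml);

# Unprefixed names in XPath 1.0 only match elements with NO namespace,
# so this finds nothing here.
my @none = $doc->findnodes('/root/first_sub');

# local-name() ignores the namespace URI and compares the bare name.
my @ids = map { $_->getAttribute('id') }
    $doc->findnodes('//*[local-name()="second_sub"]');

print scalar(@none), " @ids\n";
```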

Your thoughts are appreciated.

Re: LibXML - Removing Namespace? by derby (Abbot) on Apr 11, 2008 at 17:33 UTC
Why don't you just set the namespace:

```perl
$root->setNamespace( 'http://a9.com/-/spec/opensearch/1.1/', 'openSearch' );
```

Or use it in the XPath:

```perl
my $start = $root->findvalue( 'openSearch:startIndex/text()' );
```

-derby
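A sketch of how the setNamespace suggestion might play out for nested lookups. It assumes (as I understand XML::LibXML's behaviour) that `findnodes`/`findvalue` resolve prefixes against the namespace declarations in scope at the context node, so a prefix declared once on the root should be visible from every descendant. The prefix `x` and the `urn:foo` document are illustrative; the third argument `0` asks `setNamespace` to only declare the prefix rather than also switch the root element over to it.

```perl
use strict;
use warnings;
use XML::LibXML;

# Illustrative document with a default namespace on the root.
my $xml = q{<root xmlns="urn:foo"><first_sub name="foo"><second_sub id="1"><third_sub>Foo</third_sub></second_sub></first_sub></root>};

my $doc  = XML::LibXML->new->parse_string($xml);
my $root = $doc->documentElement;

# Declare prefix "x" for urn:foo once, on the root only.
# Third argument 0: declare it without activating it on the root element.
$root->setNamespace('urn:foo', 'x', 0);

# Query from the root, then again from a nested node; no re-declaration.
my ($second) = $root->findnodes('x:first_sub/x:second_sub');
my $text     = $second->findvalue('x:third_sub');

print "$text\n";
```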
Re^2: LibXML - Removing Namespace? by Anonymous Monk on Apr 11, 2008 at 20:43 UTC
First of all, thanks, those are great suggestions. I didn't know about the setNamespace trick for defining context. However, I still have to redefine it for every level as I progress through the XML, right? Perhaps I'm not being clear. Incoming example:

```xml
<root xmlns='urn:foo'>
  <first_sub name='foo'>
    <second_sub id='1'>
      <third_sub>Foo</third_sub>
    </second_sub>
    <second_sub id='2'>
    </second_sub>
  </first_sub>
</root>
```

Now just imagine that there are multiple first_sub, second_sub, and third_sub elements nested in the example above. In order to get at all the values I want, as far as I can tell I have to do something like this (assuming $xml is set to the above):

```perl
my $parser = XML::LibXML->new();
my $data = $parser->parse_string( $xml );
$data->setNamespace( 'urn:foo', 'x' );
for my $first_sub ( $data->findnodes('/x:root/x:first_sub') ) {
    my $name = $first_sub->getAttribute('name');
    $first_sub->setNamespace( 'urn:foo', 'x' );
    for my $second_sub ( $first_sub->findnodes('./x:second_sub') ) {
        my $id = $second_sub->getAttribute('id');
        $second_sub->setNamespace( 'urn:foo', 'x' );
        for my $third_sub ( $second_sub->findnodes('./x:third_sub') ) {
            # do something with the values
        }
    }
}
```

So I'm pretty sure it's confirmed that I can't remove the namespace to avoid all this setting/defining and the extra work in the XPath expressions, which was my original question. That's fine; I'll just stick to something like the above. Thanks again.
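A minimal sketch of the usual alternative to per-level declarations: create one `XML::LibXML::XPathContext`, register the prefix once with `registerNs`, and pass each nested node as the context-node argument of `findnodes`. It assumes the root declares `xmlns='urn:foo'`; the prefix `x` and the `@rows` collector are illustrative choices, not anything required by the library.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $xml = <<'XML';
<root xmlns="urn:foo">
  <first_sub name="foo">
    <second_sub id="1"><third_sub>Foo</third_sub></second_sub>
    <second_sub id="2"/>
  </first_sub>
</root>
XML

my $doc = XML::LibXML->new->parse_string($xml);

# One context object, one prefix registration, reused at every depth.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'urn:foo');

my @rows;    # illustrative: collect [name, id, text] triples
for my $first ($xpc->findnodes('/x:root/x:first_sub')) {
    my $name = $first->getAttribute('name');
    # Second argument = context node; paths are relative to it.
    for my $second ($xpc->findnodes('x:second_sub', $first)) {
        my $id = $second->getAttribute('id');
        for my $third ($xpc->findnodes('x:third_sub', $second)) {
            push @rows, [ $name, $id, $third->textContent ];
        }
    }
}

print join(',', @$_), "\n" for @rows;
```

The namespace mapping lives in the context object rather than in the document, so the parsed tree is never modified.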
Re^3: LibXML - Removing Namespace? by Anonymous Monk on Apr 11, 2008 at 20:58 UTC
Whoops, that code is missing a getDocumentElement... I should really register so I can edit my posts ;) Pretend there is `$data = $data->getDocumentElement;` under the parse_string.
Re: LibXML - Removing Namespace? by Your Mother (Archbishop) on Apr 11, 2008 at 18:06 UTC
Misery loves company. Found this snippet in LJ::Feed.

```perl
# Strip namespace from child tags. Set default namespace, let
# child tags inherit from it. So ghetto that we even have to do this
# and LibXML can't on its own.
# ($ns and $opts are defined in the surrounding LJ::Feed code.)
my $normalize_ns = sub {
    my $str = shift;
    $str =~ s/(<\w+)\s+xmlns="\Q$ns\E"/$1/og;
    $str =~ s/<feed\b/<feed xmlns="$ns"/;
    $str =~ s/<entry>/<entry xmlns="$ns">/ if $opts->{'single_entry'};
    return $str;
};
```

I've resorted to what derby suggests for namespaced XHTML too and it works fine (this snippet is old and un-re-tested).

```perl
my $root = $doc->documentElement;
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('x', 'http://www.w3.org/1999/xhtml');
my $htmls = $xpc->find('/x:html', $doc);
```
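Along the same lines as the LJ::Feed snippet, a minimal pre-parse sketch that simply deletes default-namespace declarations (`xmlns="..."` or `xmlns='...'`) from the raw string before it reaches the parser. `strip_default_ns` is a hypothetical helper, not part of XML::LibXML. Being a regex rather than a parser, it would also rewrite matching text inside comments, CDATA sections, or attribute values, so treat it as a quick hack for input you control.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every default namespace declaration
# (xmlns="..." or xmlns='...') from raw XML text. Prefixed declarations
# such as xmlns:foo="..." are left alone, because the pattern requires
# "=" (after optional whitespace) directly after "xmlns".
sub strip_default_ns {
    my ($xml) = @_;
    $xml =~ s/\s+xmlns\s*=\s*(?:"[^"]*"|'[^']*')//g;
    return $xml;
}

my $in  = q{<root xmlns='urn:foo'><first_sub name='foo'/></root>};
my $out = strip_default_ns($in);
print "$out\n";
```

Once the result is handed to XML::LibXML's `parse_string`, plain unprefixed paths such as `/root/first_sub` should match, because the elements no longer carry a namespace.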