I've been out of the loop for a long time, and I'm having a little trouble wrapping my head around XML parsing and namespaces. I'm trying to use XPathContext, as I've read I should, but I still have to spell out everything in findnodes(), and the context has no effect at all: the script behaves the same if I comment it out. What am I doing wrong? Is this the correct approach to parsing the document in order to extract a single column of words from the table? The code below does work; I'm just not sure it's the correct way to do it...
use strict;
use warnings;
use XML::LibXML;
use open ':std', ':encoding(UTF-16)';

use constant XML_WORD_COLUMN => 1;

my $filename = 'Concordance.xml';

open my $fh, '<', $filename
    or die "Can't open $filename: $!";
binmode $fh, ':raw';    # drop PerlIO layers on this handle
my $doc = XML::LibXML->load_xml(IO => $fh);

# ===> This doesn't matter <===
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( o    => "urn:schemas-microsoft-com:office:office" );
$xpc->registerNs( x    => "urn:schemas-microsoft-com:office:excel" );
$xpc->registerNs( ss   => "urn:schemas-microsoft-com:office:spreadsheet" );
$xpc->registerNs( html => "http://www.w3.org/TR/REC-html40" );
$xpc->registerNs( def  => "urn:schemas-microsoft-com:office:spreadsheet" );

my $table = $xpc->findnodes(q{//ss:Worksheet[@ss:Name='Sheet 1']/ss:Table/ss:Row})
    or die "Can't find table in Worksheet 'Sheet 1': $!";

foreach my $row ($table->get_nodelist) {
    my $col_index = 1;
    foreach my $cell ($row->nonBlankChildNodes) {
        if ($col_index++ == XML_WORD_COLUMN) {
            my $d = $cell->find('./ss:Data');
            print $d->to_literal, "\n";
        }
    }
}

__END__
<?xml version="1.0" encoding="utf-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns:o="urn:schemas-microsoft-com:office:office"
          xmlns:x="urn:schemas-microsoft-com:office:excel"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:html="http://www.w3.org/TR/REC-html40"
          xmlns="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Sheet 1">
    <Table>
      <Row>
        <Cell> <Data ss:Type="String">Word</Data> </Cell>
        <Cell> <Data ss:Type="String">Count</Data> </Cell>
      </Row>
      <Row>
        <Cell> <Data ss:Type="String">Aaron</Data> </Cell>
        <Cell> <Data ss:Type="String">330</Data> </Cell>
      </Row>
      <Row>
        <Cell> <Data ss:Type="String">Aaron’s</Data> </Cell>
        <Cell> <Data ss:Type="String">25</Data> </Cell>
      </Row>
      <Row>
        <Cell> <Data ss:Type="String">Abaddon</Data> </Cell>
        <Cell> <Data ss:Type="String">7</Data> </Cell>
      </Row>
      <!-- Blah Blah Blah -->
    </Table>
    <x:WorksheetOptions>
      <x:FreezePanes />
      <x:FrozenNoSplit />
      <x:SplitHorizontal>1</x:SplitHorizontal>
      <x:TopRowBottomPane>1</x:TopRowBottomPane>
      <x:ActivePane>2</x:ActivePane>
    </x:WorksheetOptions>
  </Worksheet>
</Workbook>
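
For comparison, here is a minimal sketch of a variant in which the XPathContext does matter. It assumes a trimmed, inline copy of the sample data, and it registers a prefix, `wb`, that is hypothetical and appears nowhere in the document, so the expression can only resolve through the context. As far as I understand XML::LibXML, node-level `find`/`findnodes` calls can also resolve prefixes that the document itself declares (such as `ss`), which may be why commenting out the registrations seems to change nothing. The sketch also selects the first `Cell` of each row inside the XPath expression rather than counting child nodes in Perl; treat it as one possible approach, not the definitive one.

```perl
use strict;
use warnings;
use XML::LibXML;

# Assumption: a two-row inline copy of the posted data, for illustration only.
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Sheet 1">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">Word</Data></Cell>
        <Cell><Data ss:Type="String">Count</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="String">Aaron</Data></Cell>
        <Cell><Data ss:Type="String">330</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# 'wb' is a hypothetical prefix chosen here; only the namespace URI has to
# match the one used in the document.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(wb => 'urn:schemas-microsoft-com:office:spreadsheet');

# Cell[1] picks the first Cell element of each Row, so no manual column
# counting is needed.
my @words = map { $_->textContent }
    $xpc->findnodes(
        q{//wb:Worksheet[@wb:Name='Sheet 1']/wb:Table/wb:Row/wb:Cell[1]/wb:Data}
    );

print "$_\n" for @words;
```

Because the prefix is bound to the namespace URI rather than to whatever prefix the file happens to use, the same expression should keep working if the producer of the file renames or drops its own prefixes.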

In reply to XML Namespaces by simsrw73
