JohnEl has asked for the wisdom of the Perl Monks concerning the following question:

I'm using XML::Simple to try and check the elements of an xhtml file. I eventually want a script to find elements that need an id, but don't have an id.

Here's the Perl, edited for brevity:

use XML::Simple;
use strict;

my $config = XMLin('some.xhtml');
processhash($config,'$config');

sub processhash()
{
    my ($href, $path, @junk) = @_;
    my %configr = %$href;
    foreach my $k (keys %configr)
    {
        print qq|$path :: $k = $configr{$k}\n|;
        if ($configr{$k} =~ m/HASH/i) //if it is another hash, iterate through that hash also
        {
            processhash($configr{$k},qq|$path -> {'$k'}|);
        }
    }
}

This works pretty good, it produces an output like this
$config -> {'hx:scriptCollector'} -> {'f:loadBundle'} -> {'bundle_id'} :: basename = somebasename
for this xml element
<f:loadBundle id="bundle_id" basename="somebasename" var="lbls" />
if the element has children the script tells me it's a hash
$config -> {'hx:scriptCollector'} -> {'h:form'} :: h:panelGrid = HASH(0x1e54150)
then process that hash, here's the next few lines of output
$config -> {'hx:scriptCollector'} -> {'h:form'} -> {'h:panelGrid'} -> {'hruleDots1_panel_id'} :: width = 100%
$config -> {'hx:scriptCollector'} -> {'h:form'} -> {'h:panelGrid'} -> {'hruleDots1_panel_id'} :: columnClasses = hruleDots
$config -> {'hx:scriptCollector'} -> {'h:form'} -> {'h:panelGrid'} -> {'hruleDots1_panel_id'} :: cellspacing = 0

All in all, pretty cool. BUT...the whole point is that I'm going to try to find elements that are supposed to have ids but do not.

I will add to the script to check and see if it needs an id property, but even when I do it will fail. It's the strangest thing, sometimes the value of the ID is a child of the element.

For example this element
<h:outputText styleClass="required" value="*" id="AdjustCCDetails_ReasonCode_table_required_id">
produces this from the script
{'h:outputText'} :: AdjustCCDetails_ReasonCode_table_required_id = HASH(0x1e5400c)
{'h:outputText'} -> {'AdjustCCDetails_ReasonCode_table_required_id'} :: value = *
{'h:outputText'} -> {'AdjustCCDetails_ReasonCode_table_required_id'} :: styleClass = required
The value of the element's id property has become the key of a child hash, and that child hash holds the element's other properties.

Here's an example that produces what I expected. The XML:

<h:outputText id="AdjustCCDetails_Comment_table_id" styleClass="label" value="#{somevalue}">
script output
{'h:panelGrid'} :: h:outputText = HASH(0x1e16050)
{'h:panelGrid'} -> {'h:outputText'} :: value = #{somevalue}
{'h:panelGrid'} -> {'h:outputText'} :: id = AdjustCCDetails_Comment_table_id
{'h:panelGrid'} -> {'h:outputText'} :: styleClass = label
That output has an id that I can test for, the previous output does not...even though it has a valid id.

So my questions are: Do I not understand how all this works? If this is the expected behaviour, how can I test for id properties? And is there a better way to find all the missing ids in an xhtml file?

Replies are listed 'Best First'.
Re: XML::Simple finding id property problem
by grantm (Parson) on May 28, 2008 at 02:08 UTC

    In addition to what runrig said, it would probably be even simpler to solve your problem using XML::LibXML and an appropriate XPath expression. See: Stepping up from XML::Simple to XML::LibXML.

    On a different matter, if your XML document uses namespaces, your code should be matching against the namespace URIs rather than the prefixes.
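
    As a rough illustration of both points, here is a minimal sketch using XML::LibXML with an XPath expression. The sample document, the namespace URI http://example.com/ns/html and the "html" prefix registered in the script are all invented for the example, and it assumes that every element in that namespace is supposed to carry an id; a real check would use the URIs the document actually declares and a narrower element list.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical sample document; the namespace URI is invented for illustration.
my $xml = q{<root xmlns:h="http://example.com/ns/html">
  <h:outputText id="label_id" styleClass="label" value="a"/>
  <h:outputText styleClass="required" value="*"/>
  <h:panelGrid width="100%"/>
</root>};

my $doc = XML::LibXML->load_xml(string => $xml);

# Bind a prefix of our own choosing to the namespace URI, so the match does
# not depend on whichever prefix the document happens to use.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(html => 'http://example.com/ns/html');

# Select every element in that namespace that has no id attribute.
my @missing = $xpc->findnodes('//html:*[not(@id)]');

for my $node (@missing) {
    printf "%s at %s has no id\n", $node->nodeName, $node->nodePath;
}
```

    Because the XPath test is on the attribute itself, nothing is folded or rearranged the way XML::Simple does it, so an element either has an id attribute or it is reported.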

      Thanks grantm
      If I get a chance I'll look into LibXML, the perl I have now solves my current problem, but that module looks very interesting.
Re: XML::Simple finding id property problem
by GrandFather (Saint) on May 28, 2008 at 02:02 UTC

    That can't "works pretty good" - it's not Perl (// is not a Perl comment).

    Unless you understand what they actually do and you have a real reason to use them, do not use prototypes (see perlsub). Prototypes generally do not do what you expect if you come from another language such as C++!
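
    A small sketch of what the empty prototype in "sub processhash()" declares. The names no_args and any_args are made up for the illustration. The call is compiled from a string so that the compile-time error can be captured rather than stopping the script.

```perl
use strict;
use warnings;

sub no_args() { return scalar @_ }   # empty prototype: declared to take no arguments
sub any_args  { return scalar @_ }   # no prototype: any argument list is accepted

# Compile a call with arguments after the declaration has been seen.
# The prototype makes this a compile-time error, which eval captures in $@.
my $result = eval q{ no_args(1, 2) };
my $error  = $@;

print "prototype of no_args: '", prototype(\&no_args), "'\n";
print "calling no_args(1, 2) failed: $error" if $error;
print "any_args(1, 2) received ", any_args(1, 2), " arguments\n";
```

    A prototype only affects calls that are compiled after the declaration has been seen, which is one reason a script can appear to work in spite of a misplaced "()".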

    If I run:

    use XML::Simple;
    use strict;
    use warnings;

    my $xml = <<XML;
    <root>
    <outputText styleClass="required" value="*" id="AdjustCCDetails_ReasonCode_table_required_id" />
    </root>
    XML

    my $config = XMLin ($xml);
    processhash ($config, '$config');

    sub processhash {
        my ($href, $path, @junk) = @_;
        my %configr = %$href;

        foreach my $k (keys %configr) {
            print qq|$path :: $k = $configr{$k}\n|;
            if ($configr{$k} =~ m/HASH/i) {
                processhash ($configr{$k}, qq|$path -> {'$k'}|);
            }
        }
    }

    it prints:

    $config :: outputText = HASH(0x2106718)
    $config -> {'outputText'} :: value = *
    $config -> {'outputText'} :: id = AdjustCCDetails_ReasonCode_table_required_id
    $config -> {'outputText'} :: styleClass = required

    which is what you expected (so far as I can tell). What have you done differently? Can you modify my sample to illustrate your problem?


    Perl is environmentally friendly - it saves trees
      "That can't "works pretty good" - it's not Perl (// is not a Perl comment)."
      The // was a slip of the finger in my editor, I was trying to keep my post uncomplicated, and make it obvious what I was doing in that line.
      It wasn't really a prototype, it was just a mistake made from going too fast; the network here was scheduled to go down and I was in a rush.
      I didn't use the code tags because I was in a rush and forgot about them, yes, even though the warning against them is on the page.
      I apologize for taking up your time, and to anyone else who was put out by my post
Re: XML::Simple finding id property problem
by runrig (Abbot) on May 27, 2008 at 23:19 UTC
    See KeyAttr. Especially this part:
    The default value for 'KeyAttr' is ['name', 'key', 'id']
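
    A minimal sketch of one way to act on this, assuming the goal is simply to keep id as an ordinary attribute: pass an empty KeyAttr list so nothing is folded, and ForceArray => 1 so that nested elements are always arrays. The sample XML is invented, and the loop only inspects direct children of the root element; a real script would recurse. Folding applies to lists of same-named elements, which may be why a sample containing a single element does not reproduce the problem.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical sample: two sibling elements with the same name, one without an id.
my $xml = q{<root>
  <outputText id="first_id" styleClass="label" value="a"/>
  <outputText styleClass="required" value="*"/>
</root>};

# KeyAttr => []   : do not fold element lists into hashes keyed by name/key/id.
# ForceArray => 1 : represent every nested element as an array, even a lone one.
my $data = XMLin($xml, KeyAttr => [], ForceArray => 1);

my @missing;
for my $name (sort keys %$data) {
    next unless ref $data->{$name} eq 'ARRAY';   # skip the root's own attributes
    for my $element (@{ $data->{$name} }) {
        # With folding disabled, id stays a plain key in the element's hash.
        push @missing, $name
            unless ref $element eq 'HASH' && exists $element->{id};
    }
}

print "element without id: $_\n" for @missing;
```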
      So the solution is found by reading the documentation.
      Who would have thought?
      thanks runrig