This argument replaces the corresponding XML::Parser argument. It consists
of a hash { expression => \&handler } where expression is a
generic_attribute_condition, a string_condition, a regexp_condition,
an attribute_condition, a full_path, a partial_path, a gi,
_default_ or _all_.
The idea is to support a useful but efficient (thus limited) subset of
XPath. A fuller expression set will be supported in the future, as users
ask for more and as I manage to implement it efficiently. This will never
encompass all of XPath due to the streaming nature of parsing (no lookahead
after the element end tag).
A generic_attribute_condition is a condition on an attribute, in the form
*[@att='val'] or *[@att]. Either single or double quotes can be used around
the value, and the leading '*' is optional. No matter what the gi of the
element is, the handler will be triggered if the attribute has the
specified value (first form) or if it just exists (second form).
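
As an illustration of the second form, here is a minimal sketch. It assumes the handler hash is passed to XML::Twig->new under the twig_handlers spelling and that each handler receives the twig and the element; the sample document and the @seen array are made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my @seen;    # gi of every element that carries a "lang" attribute

my $twig = XML::Twig->new(
    twig_handlers => {
        # triggered for any element with a lang attribute, whatever its gi
        '*[@lang]' => sub {
            my ($t, $elt) = @_;
            push @seen, $elt->gi;
        },
    },
);

# hypothetical input: two elements with lang, one without
$twig->parse('<doc><title lang="en">Intro</title><p>plain</p><note lang="fr">Bonjour</note></doc>');

print join(', ', @seen), "\n";
```

The handler is keyed on the attribute alone, so title and note both trigger it while p does not.
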
A string_condition is a condition on the content of an element, in the form
gi[string()='foo']. Either single or double quotes can be used. At
the moment you cannot escape the quotes (this will be added as soon as I
dig out my copy of Mastering Regular Expressions from its storage box).
The text returned is, as per what I (and Matt Sergeant!) understood from
the XPath spec, the concatenation of all the text in the element, excluding
all markup. Thus to call a handler on the element <p>text <b>bold</b></p>
the appropriate condition is p[string()='text bold']. Note that this is not
exactly conformant to the XPath spec; it just tries to mimic it while
remaining quite concise.
An extension of that notation is gi[string(child_gi)='foo'] where the
handler will be called if a child of a gi element has a text value of
foo. At the moment only direct children of the gi element are checked.
If you need to test on descendants of the element let me know. The fix is
trivial but would slow down the checks, so I'd like to keep it the way it is.
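
The <p>text <b>bold</b></p> case above can be sketched as follows. This is only an illustration: the twig_handlers spelling, the surrounding <doc> element and the @matched array are assumptions, and the string(child_gi) form is not shown.

```perl
use strict;
use warnings;
use XML::Twig;

my @matched;    # text of every p whose full text is exactly "text bold"

my $twig = XML::Twig->new(
    twig_handlers => {
        # the markup inside p is ignored when the text is compared
        'p[string()="text bold"]' => sub {
            my ($t, $p) = @_;
            push @matched, $p->text;
        },
    },
);

# hypothetical input: only the first p has the text "text bold"
$twig->parse('<doc><p>text <b>bold</b></p><p>other text</p></doc>');

print scalar(@matched), " matching p element(s)\n";
```
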
A regexp_condition is a condition on the content of an element, in the form
gi[string()=~ /foo/]. This is the same as a string_condition except that
the text of the element is matched against the regexp. The i, m, s and o
modifiers can be used on the regexp.
The gi[string(child_gi)=~ /foo/] extension is also supported.
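
A regexp_condition with the i modifier might be used like this. The sketch assumes the twig_handlers spelling of the argument; the element names, the sample document and the @chapters array are invented for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my @chapters;    # titles whose text contains "chapter", in any case

my $twig = XML::Twig->new(
    twig_handlers => {
        # the i modifier makes the match case-insensitive
        'title[string()=~ /chapter/i]' => sub {
            my ($t, $title) = @_;
            push @chapters, $title->text;
        },
    },
);

# hypothetical input: two of the three titles mention a chapter
$twig->parse('<doc><title>Chapter 1</title><title>Appendix</title><title>chapter 2</title></doc>');

print "$_\n" for @chapters;
```
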
An attribute_condition is a simple condition on an attribute of the
current element, in the form gi[@att='val'] (either single or double quotes
can be used; you cannot escape quotes here either).
If several attribute_conditions are true for the same element, all their
handlers are called in turn (in the order in which they were first defined).
If the ='val' part is omitted (the condition is then gi[@att]) then
the handler is triggered if the attribute actually exists for the element,
no matter what its value is.
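
Both forms of attribute_condition can be sketched side by side. The twig_handlers spelling is assumed, and the para and img elements, their attributes and the two arrays are hypothetical. The conditions are placed on different elements so that each element triggers at most one handler.

```perl
use strict;
use warnings;
use XML::Twig;

my @warnings;    # text of para elements with type="warning"
my @alts;        # alt values of img elements that have an alt attribute

my $twig = XML::Twig->new(
    twig_handlers => {
        # gi[@att="val"]: the attribute must have this exact value
        'para[@type="warning"]' => sub {
            my ($t, $para) = @_;
            push @warnings, $para->text;
        },
        # gi[@att]: the attribute only has to exist
        'img[@alt]' => sub {
            my ($t, $img) = @_;
            push @alts, $img->att('alt');
        },
    },
);

# hypothetical input
$twig->parse('<doc><para type="warning">Hot</para><para type="info">Cool</para><img alt="logo"/><img/></doc>');

print "warnings: @warnings\n";
print "alts: @alts\n";
```
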
A full_path looks like '/doc/section/chapter/title'. It starts with
a / and then lists all the gi's leading to the element. The handler will be
called if the path to the current element (in the input document) is exactly
as defined by the full_path.
A partial_path is like a full_path except it does not start with a /:
'chapter/title' for example. The handler will be called if the path to
the element (in the input document) ends as defined in the partial_path.
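
The difference between the two kinds of path can be sketched as follows, assuming the twig_handlers spelling of the argument. The document structure and the array names are made up for the example.

```perl
use strict;
use warnings;
use XML::Twig;

my @doc_titles;        # titles found at exactly /doc/title
my @chapter_titles;    # titles whose path ends in chapter/title

my $twig = XML::Twig->new(
    twig_handlers => {
        # full_path: the whole path from the root must match
        '/doc/title' => sub {
            my ($t, $title) = @_;
            push @doc_titles, $title->text;
        },
        # partial_path: only the end of the path must match
        'chapter/title' => sub {
            my ($t, $title) = @_;
            push @chapter_titles, $title->text;
        },
    },
);

# hypothetical input: one document title, two chapter titles
$twig->parse('<doc><title>Guide</title><chapter><title>Intro</title></chapter><chapter><title>Usage</title></chapter></doc>');

print "document: @doc_titles\n";
print "chapters: @chapter_titles\n";
```

A plain gi handler on title would have been triggered for all three elements; the paths are what tell them apart.
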
WARNING: (hopefully temporary) at the moment string_condition,
regexp_condition and attribute_condition are only supported on a
simple gi, not on a path.
A gi (generic identifier) is just a tag name.
#CDATA can be used to call a handler for a CDATA section.
A special gi _all_ is used to call a function for each element.
The special gi _default_ is used to call a handler for each element
that does NOT have a specific handler.
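
A sketch of how a specific handler, _default_ and _all_ could be combined is given below. It assumes the twig_handlers spelling and that each handler returns a true value; the sample document and the three arrays are hypothetical. The arrays simply record which gi each handler saw.

```perl
use strict;
use warnings;
use XML::Twig;

my @titles;     # handled by the specific 'title' handler
my @others;     # gi of elements that fell through to _default_
my @all_gis;    # gi of every element seen by _all_

my $twig = XML::Twig->new(
    twig_handlers => {
        title     => sub { my ($t, $elt) = @_; push @titles,  $elt->text; },
        _default_ => sub { my ($t, $elt) = @_; push @others,  $elt->gi;   },
        _all_     => sub { my ($t, $elt) = @_; push @all_gis, $elt->gi;   },
    },
);

# hypothetical input
$twig->parse('<doc><title>T</title><para>P</para></doc>');

print "titles:    @titles\n";
print "_default_: @others\n";
print "_all_:     @all_gis\n";
```

Here title has its own handler and so should not reach _default_, while para has none and does; _all_ is expected to see both.
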
The order of precedence to trigger a handler is:
generic_attribute_condition, string_condition, regexp_condition,
attribute_condition, full_path, longer partial_path, shorter
partial_path, gi, _default_.