in reply to Re^2: Comparative satisfiability of regexps.
in thread Comparative satisfiability of regexps.

Like XML, XSD appears to be a language with arbitrarily nested tagged regions. In addition, it seems to define symbols that are used in other expressions. Regular expressions cannot track arbitrarily deep nesting, so they are not a suitable tool for such a language. You need something like an LALR or recursive descent parser.

For example, this is a fragment I found:

<xsd:complexType name="Items">
  <xsd:sequence>
    <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="productName" type="xsd:string"/>
          <xsd:element name="quantity">
            <xsd:simpleType>
              <xsd:restriction base="xsd:positiveInteger">
                <xsd:maxExclusive value="100"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
          <xsd:element name="USPrice" type="xsd:decimal"/>
          <xsd:element ref="comment" minOccurs="0"/>
          <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
        </xsd:sequence>
        <xsd:attribute name="partNum" type="SKU" use="required"/>
      </xsd:complexType>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>

<!-- Stock Keeping Unit, a code for identifying products -->
<xsd:simpleType name="SKU">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>

It does embed regular expressions here and there to define new data types (the definition of "SKU" for example). If those are all you are trying to validate, our suggestions should work. Otherwise, this is a much more difficult problem.
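
To make the distinction concrete, here is a minimal sketch (in Python rather than Perl, purely for illustration) that lets a real XML parser handle the nesting and pulls out only the embedded pattern facets. The `pattern_facets` helper and the enclosing `xsd:schema` wrapper are my own additions, not part of the fragment above, and it assumes the pattern is simple enough to mean the same thing in Python's `re` as in XSD's regex dialect, which is not true in general.

```python
import re
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The SKU definition from the fragment, wrapped in a schema element
# (the wrapper is an assumption so the namespace prefix is declared).
SCHEMA = r"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="SKU">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
"""


def pattern_facets(schema_text):
    """Map each named simpleType to the list of its xsd:pattern values.

    Hypothetical helper: the XML parser deals with the nesting, so the
    regexes are only ever applied to the leaf values they describe.
    """
    root = ET.fromstring(schema_text)
    facets = {}
    for simple_type in root.iter(f"{{{XSD_NS}}}simpleType"):
        name = simple_type.get("name")
        if name is None:  # anonymous types have no name to key on
            continue
        patterns = [
            p.get("value")
            for p in simple_type.iter(f"{{{XSD_NS}}}pattern")
        ]
        if patterns:
            facets[name] = patterns
    return facets


facets = pattern_facets(SCHEMA)
sku = facets["SKU"][0]

# XSD patterns are implicitly anchored, hence fullmatch rather than search.
print(sku)
print(bool(re.fullmatch(sku, "926-AA")))
print(bool(re.fullmatch(sku, "926-aa")))
```

Comparing two such extracted patterns is the part the earlier suggestions address; everything around them (element order, occurrence counts, type references such as `type="SKU"`) still needs the parsed tree.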

Update: I'm also having trouble seeing the utility of this proposed checker. Suppose I have two arbitrary web applications A and B, and both were kind enough to supply xsd schemas. Can I determine if they can interact meaningfully? Not really. It's a matter of semantics, not just matching schemas. Say both define a "Price" field. How do I know one is not in dollars and the other in euros? I think that's why the usual practice is "pre-negotiated" schemas.

Re^4: Comparative satisfiability of regexps.
by qq (Hermit) on Jan 21, 2005 at 17:34 UTC

    Update: I'm also having trouble seeing the utility of this proposed checker. Suppose I have two arbitrary web applications A and B, and both were kind enough to supply xsd schemas. Can I determine if they can interact meaningfully? Not really. It's a matter of semantics, not just matching schemas.

    I hope the OP answers this, because it's been interesting so far. But for the calendaring application mentioned, why not create a simple schema and require conformance? The conversion would need to be done per client with XSLT (the glue language of XML!) or whatever. But without explicit agreement you are just guessing that elements from different schemas with similar names describe the same real-world objects. Sounds dangerous.