Fun with XML schema - Kore Nordmann

Fun with XML schema

For university I currently have to analyse XML Schema in quite some depth, and there are some interesting / funny things in the specification I wasn't aware of before. First: the default value of elementFormDefault, which is unqualified. Let me try to explain what this is and why it is kind of strange...

When you write an XML schema, you will most likely do that for some kind of XML documents you need or expose in one of your applications. Since you are specifying a new language using XML as its syntax, you will assign it a namespace (the target namespace), so that instances can be validated against your specification (the schema). A basic XML schema would look something like:

<?xml version="1.0"?>
<schema xmlns="" targetNamespace="">
    <element name="root"/>
</schema>

Where the targetNamespace references the namespace the XML documents / instances are using. Let's consider a slightly more complex, but still trivial, example that allows a set of child elements inside the root element <root>:

<?xml version="1.0"?>
<schema xmlns="" targetNamespace="">
    <element name="root">
        <complexType>
            <sequence>
                <element name="child" type="string" maxOccurs="unbounded"/>
            </sequence>
        </complexType>
    </element>
</schema>

We now allow any number of <child> elements (with string content) inside the root element; strictly speaking at least one, since minOccurs defaults to 1. Any sane person would now expect the schema to validate the following instance:

<?xml version="1.0"?>
<root xmlns="">
    <child>First child</child>
    <child>Second child</child>
</root>

But it does not:

Schemas validity error : Element '{}child': This element is not expected. Expected is ( child ).

Unqualified, local and global elements

The default value for the elementFormDefault attribute of the schema is "unqualified". This means that "local" elements declared in the schema do not belong to the target namespace; in an instance they must appear without any namespace.

Global elements are those defined at the schema root, as direct children of the <schema> element. In the example above this is the <root> element. All other elements are local, which means they are declared somewhere below another element declaration, like the <child> element in the example above.

This means that the <child> elements in my instance are, by default, not allowed inside the target namespace of the given schema. But the following instance document, where we explicitly exclude the <child> elements from the namespace, actually validates:

<?xml version="1.0"?>
<my:root xmlns:my="">
    <child>First child</child>
    <child>Second child</child>
</my:root>

In this example we use an explicit namespace prefix to be able to exclude some elements from that namespace. The root element must of course be inside the namespace, while the <child> elements must not be part of it.
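The same effect can be had without a prefix. The following is only a sketch: http://example.org/fun is a made-up target namespace standing in for the real one. Here the default namespace is reset with xmlns="" on every child, which puts each <child> back into no namespace:

```xml
<?xml version="1.0"?>
<!-- http://example.org/fun is a hypothetical target namespace -->
<root xmlns="http://example.org/fun">
    <!-- xmlns="" undeclares the default namespace for this element -->
    <child xmlns="">First child</child>
    <child xmlns="">Second child</child>
</root>
```

Under the unqualified default this should validate just like the prefixed variant, since the expanded names of the elements are identical in both documents.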

The better "default"

Luckily this behaviour can be changed with a different value for elementFormDefault on the schema root node:

<?xml version="1.0"?>
<schema xmlns="" targetNamespace="" elementFormDefault="qualified">
    <element name="root">
        <complexType>
            <sequence>
                <element name="child" type="string" maxOccurs="unbounded"/>
            </sequence>
        </complexType>
    </element>
</schema>

This now validates the following instance as expected:

<?xml version="1.0"?>
<root xmlns="">
    <child>First child</child>
    <child>Second child</child>
</root>

Funnily, this behaviour can be changed not only for the whole schema, but also for each local element, using the form attribute.
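A rough sketch of what such a per-element override could look like. The target namespace http://example.org/fun and the extra <note> element are invented for illustration; only <root> and <child> come from the examples above:

```xml
<?xml version="1.0"?>
<!-- http://example.org/fun is a hypothetical target namespace -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://example.org/fun">
    <element name="root">
        <complexType>
            <sequence>
                <!-- form="qualified" puts only this local element into the target namespace -->
                <element name="child" type="string" form="qualified" maxOccurs="unbounded"/>
                <!-- hypothetical element without a form attribute: falls back to the schema default, unqualified -->
                <element name="note" type="string" minOccurs="0"/>
            </sequence>
        </complexType>
    </element>
</schema>
```

An instance of this schema would have to put <root> and <child> into the namespace while keeping <note> out of it, which shows how far the instance author has to follow the internals of the schema.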

With the unqualified default, the author of an instance actually has to know whether an element has been declared locally or globally in the schema. Knowledge of the document structure alone is not sufficient. As a side effect, a pure refactoring of the schema may invalidate instances. To quote the specification on this: [1]

When local elements and attributes are not required to be qualified, an instance author may require more or less knowledge about the details of the schema to create schema valid instance documents.

XML Schema Part 0: Primer Second Edition
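To make the refactoring problem concrete, here is a sketch of the schema from above with the <child> declaration moved to the top level and referenced via ref. Again, http://example.org/fun is a hypothetical target namespace and tns is an arbitrary prefix chosen for this example:

```xml
<?xml version="1.0"?>
<!-- http://example.org/fun is a hypothetical target namespace -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:tns="http://example.org/fun"
        targetNamespace="http://example.org/fun">
    <element name="root">
        <complexType>
            <sequence>
                <!-- reference to the global declaration below -->
                <element ref="tns:child" maxOccurs="unbounded"/>
            </sequence>
        </complexType>
    </element>
    <!-- child is now global, so it always lives in the target namespace -->
    <element name="child" type="string"/>
</schema>
```

The document structure is unchanged, but global elements are always qualified. An instance with unqualified <child> elements, valid against the version with a local declaration, would be rejected by this one, while the fully namespaced instance would now be accepted.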

Dear lazyweb

We have to deal with this now. But why the hell has this been invented? And why is it even the default value?

If you don't specify any target namespace for your schema, it does not matter whether the form default is qualified or unqualified. If you specify a target namespace, I cannot really think of a reason to differentiate between global and local elements inside the _schema_, and therefore inside the instances.
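For comparison, a sketch of the same schema without any target namespace. The xs prefix is used here only to keep the schema vocabulary apart from the no-namespace elements being declared; it is a convention, not something taken from the examples above:

```xml
<?xml version="1.0"?>
<!-- no targetNamespace: root and child both end up in no namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="root">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="child" type="xs:string" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
```

A plain <root> with plain <child> elements and no xmlns declaration at all should validate here, whichever value elementFormDefault has.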

Sorry for the quick rant.