Back in late 2008, one of the original design goals for the rewrite of my retepXMPP project was to use JAXB for marshalling XMPP stanzas to and from POJOs. The main ideas behind this were to standardise against the XMPP schemas available online and to make adding further protocols easier, mainly by not breaking existing code.
Now this was fine until I started testing against a couple of client libraries and found that certain messages were being ignored. It turned out that the XML generated by JAXB does not follow the rules defined in RFC 3920bis. The problem was that JAXB was declaring all of the namespaces together on the root element.
For example, here’s one of the examples from RFC 3921bis as generated by JAXB:
<iq from='juliet@example.com/balcony' id='rg1' type='get' xmlns='jabber:client' xmlns:ns1='jabber:iq:roster'> <ns1:query/> </iq>
Here’s what it should look like:
<iq from='juliet@example.com/balcony' id='rg1' type='get'> <query xmlns='jabber:iq:roster'/> </iq>
There are two differences here, but the main issue is that ‘jabber:iq:roster’ is declared on the root iq element and not on the query element where it’s supposed to be. The other is the declaration of the ‘jabber:client’ namespace, but that one is not a problem here.
Although both are strictly correct as far as XML is concerned, the first isn’t correct for XMPP, and most XMPP parsers expect the stream to follow the rules.
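To see that the two forms really are the same thing as far as XML goes, here’s a minimal sketch using only the JDK’s namespace-aware DOM parser. The class and method names are mine, purely for illustration, and I’ve added xmlns='jabber:client' to the second stanza so it can be parsed on its own (on a real stream that declaration is inherited from the stream’s root element).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceEquivalence {

    // The stanza as JAXB writes it: every namespace declared on the root.
    static final String JAXB_STYLE =
        "<iq from='juliet@example.com/balcony' id='rg1' type='get' "
        + "xmlns='jabber:client' xmlns:ns1='jabber:iq:roster'><ns1:query/></iq>";

    // The stanza as XMPP expects it: the namespace declared on the query element.
    // Assumption: xmlns='jabber:client' is added here only so it parses standalone.
    static final String XMPP_STYLE =
        "<iq from='juliet@example.com/balcony' id='rg1' type='get' "
        + "xmlns='jabber:client'><query xmlns='jabber:iq:roster'/></iq>";

    /** Returns the first child of the root as "{namespace}localName". */
    static String describeFirstChild(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            Element child = (Element) doc.getDocumentElement().getFirstChild();
            return "{" + child.getNamespaceURI() + "}" + child.getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse stanza", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFirstChild(JAXB_STYLE)); // {jabber:iq:roster}query
        System.out.println(describeFirstChild(XMPP_STYLE)); // {jabber:iq:roster}query
    }
}
```

A generic namespace-aware parser sees the same element in both cases, which is why the JAXB output is valid XML. The trouble only starts with XMPP libraries that look at where the declaration sits.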
So why is JAXB doing this? Well, apparently it’s by design and, if you use the JAXB-RI like I do, there’s not much we can do to change it.
In JAXB 1 it actually declared the namespaces at the appropriate places so we would get the ‘jabber:iq:roster’ namespace against the query element as expected and the XMPP stanza would then conform correctly.
However, in JAXB 2.0 they changed this to the current behaviour for performance reasons. Apparently, with large documents, looking ahead to find which namespaces are present uses a lot of resources, so they declare them all first.
Now I can see their point but for one small issue: this presumes that any document being parsed and then unmarshalled by JAXB was originally produced by JAXB. What if it wasn’t? In that case JAXB would still have to do the lookup anyway.
Unfortunately, when I took a look at the source for JAXB 2.2, I found that this declaring of namespaces is done pretty close to the start of the marshalling process, so unless they add an option to disable it we have to live with this.
So how to fix this so we can still use JAXB but generate conforming XML?
Well I’ve got a solution but it’s not pretty. The solution simply involves breaking the marshalling/unmarshalling process up into separate units of work, one per namespace. We marshal the first object and then, if it has children, marshal each of them as its own unit.
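To give a rough idea of the shape of this, here’s a sketch of the “one unit of work per namespace” idea. It is not the retepXMPP code: it uses the JDK’s StAX XMLStreamWriter as a stand-in for JAXB, and the Unit class and the method names are hypothetical. With JAXB each unit would presumably be its own marshal call (for example with Marshaller.JAXB_FRAGMENT set), but that part is an assumption and not shown here. Attributes are left out to keep it short.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class PerNamespaceSketch {

    /** Hypothetical unit of work: one element, its namespace, and an optional child unit. */
    static final class Unit {
        final String namespace;
        final String localName;
        final Unit child; // null when nothing is nested

        Unit(String namespace, String localName, Unit child) {
            this.namespace = namespace;
            this.localName = localName;
            this.child = child;
        }
    }

    /** Writes one unit, then handles its child as a separate unit. */
    static void write(Unit unit, String parentNamespace, XMLStreamWriter out)
            throws XMLStreamException {
        out.writeStartElement(unit.localName);
        // Declare the namespace on this element itself, and only when it
        // differs from the namespace already in scope from the parent.
        if (!unit.namespace.equals(parentNamespace)) {
            out.writeDefaultNamespace(unit.namespace);
        }
        if (unit.child != null) {
            write(unit.child, unit.namespace, out);
        }
        out.writeEndElement();
    }

    /** Marshals a stanza, assuming streamNamespace is already declared by the stream. */
    static String marshal(Unit root, String streamNamespace) {
        try {
            StringWriter buffer = new StringWriter();
            XMLStreamWriter out =
                XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            write(root, streamNamespace, out);
            out.flush();
            out.close();
            return buffer.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write stanza", e);
        }
    }

    public static void main(String[] args) {
        Unit query = new Unit("jabber:iq:roster", "query", null);
        Unit iq = new Unit("jabber:client", "iq", query);
        // The query element carries its own xmlns; iq declares nothing.
        System.out.println(marshal(iq, "jabber:client"));
    }
}
```

The point of the sketch is only where the declaration ends up: each child decides for itself whether it needs an xmlns, so nothing gets hoisted up to the root.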
Now this involves additional work during the marshalling/unmarshalling process but it works. The downside is that we have to modify the schemas to do this.
Fortunately the XSF’s schemas are not concrete. As Peter Saint-Andre put it a few weeks ago on the MUC mailing list, those schemas are ‘descriptive, not normative’. This means they are representative of what the XEPs define, but they can be changed. In fact for some (like XEP-0045/MUC) the schemas don’t define the extension points used by other XEPs. This means we can make some simple modifications to them to get this ‘hack’ to work.
So now I have this working pretty well, and marshalling to XML runs smoothly. Unmarshalling still has a few problems, as I’m being hit by a nice issue with Java’s generics, but that’s going to be a later post.