NioSax (pronounced ‘Neo-Sax’) provides a Java NIO friendly XML push parser similar in operation to SAX. Unlike SAX, NioSax accepts an XML source that contains only partial content (i.e. only part of the XML stream has been received over the network). When this occurs, instead of failing with an error, NioSax simply stops. As soon as your application receives more data, you call the same instance of the parser again and it resumes parsing where it left off.
The public API consists of the classes within this package, although the bare minimum required is the NioSaxParser, NioSaxParserHandler and NioSaxSource classes.
To use NioSax you use NioSaxParserFactory to create a NioSaxParser, implement a SAX ContentHandler and finally create a NioSaxSource which references the content.
Then you can parse one or more ByteBuffers by updating the NioSaxSource with each buffer and passing it to the NioSaxParser.parse(NioSaxSource) method.
The only other two things you must do with the parser are to call NioSaxParser.startDocument() prior to any parsing, and to call NioSaxParser.endDocument() once you are done with the parser so that any resources used can be cleaned up.
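As an illustration of the "implement a SAX ContentHandler" step, here is a minimal sketch that uses only the JDK. RecordingHandler is a hypothetical name, and the demo method drives it with the JDK's own SAX parser purely so the sketch can run by itself; how the handler is handed to NioSax (for example via NioSaxParserHandler) is not shown and is not assumed here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records each SAX event it receives as a string.
class RecordingHandler extends DefaultHandler {

    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver text in several chunks; blank chunks are skipped.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("text:" + text);
        }
    }

    // Drives the handler with the JDK's SAX parser (not NioSax) for demonstration.
    static List<String> demo(String xml) {
        RecordingHandler handler = new RecordingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        System.out.println(demo("<message><body>hi</body></message>"));
    }
}
```

The handler only sees callbacks, so the same class should work whether the events come from a blocking SAX parser or from a push parser fed one buffer at a time.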
Example
First, in Maven, we need to add a dependency on NioSax. For details of the repository, see the ‘reteptools’ menu above. You’ll need to add the following to your pom:
<dependency>
    <groupId>uk.org.retep</groupId>
    <artifactId>niosax</artifactId>
    <version>10.6</version>
</dependency>
Now we’ll create a parser:
import java.nio.ByteBuffer;
import uk.org.retep.niosax.NioSaxParser;
import uk.org.retep.niosax.NioSaxParserFactory;
import uk.org.retep.niosax.NioSaxParserHandler;
import uk.org.retep.niosax.NioSaxSource;

public class MyParser {

    private NioSaxParser parser;
    // Assign your handler implementation before start() is called;
    // as written here the field is never initialised.
    private NioSaxParserHandler handler;
    private NioSaxSource source;

    public void start() {
        NioSaxParserFactory factory = NioSaxParserFactory.getInstance();
        parser = factory.newInstance();
        parser.setHandler( handler );
        source = new NioSaxSource();
        parser.startDocument();
    }
}
Next, when you receive data from some NIO source and have it in a ByteBuffer, you need to pass it to the parser:
public void parse( ByteBuffer buffer ) {
    // flip the buffer so the parser starts at the beginning
    buffer.flip();
    // update the source (presuming the buffer has changed)
    source.setByteBuffer( buffer );
    // parse the available content, then compact
    parser.parse( source );
    source.compact();
}
Finally we must call endDocument() to release any resources:
public void close() {
    // releases any resources and notifies the handler that the document has completed
    parser.endDocument();
}
Now all we need to do when we receive some data from an external source like a Socket is pass the ByteBuffer to the parse method. This passes it to the NioSax parser, which in turn calls the ContentHandler as the parse progresses.
When the parser gets to the end of the available content, the parse method compacts the buffer so that it can be reused.
Usually the buffer will now be empty. However, if there was partial content (for example, only part of a Unicode character was present), the parser stops prior to that character and its bytes remain in the buffer. The next packet received via NIO brings the rest of that character, and the parser continues where it left off.
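The flip, parse, compact cycle described above can be sketched without NioSax at all. In the sketch below, consume() is a hypothetical stand-in for parser.parse(source): it consumes only whole UTF-8 characters and leaves a trailing partial character in the buffer. The class name, the helper names and the simplified lead-byte check are assumptions made for illustration, not NioSax code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class CompactCycleSketch {

    // Length of a UTF-8 sequence judged from its lead byte.
    // Simplified: continuation bytes are not validated in this sketch.
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;
        if (b >= 0xF0) return 4;
        if (b >= 0xE0) return 3;
        return 2;
    }

    // Stand-in for the parser: consumes whole characters only.
    static void consume(ByteBuffer buffer, StringBuilder out) {
        while (buffer.hasRemaining()) {
            int len = sequenceLength(buffer.get(buffer.position()));
            if (buffer.remaining() < len) {
                return; // partial character: leave its bytes in the buffer
            }
            byte[] bytes = new byte[len];
            buffer.get(bytes);
            out.append(new String(bytes, StandardCharsets.UTF_8));
        }
    }

    // Feeds each "packet" through the same buffer, as a channel read would.
    static String feed(byte[]... packets) {
        ByteBuffer buffer = ByteBuffer.allocate(64);
        StringBuilder out = new StringBuilder();
        for (byte[] packet : packets) {
            buffer.put(packet);   // append after any bytes kept by compact()
            buffer.flip();        // switch to reading from the start
            consume(buffer, out);
            buffer.compact();     // keep unread bytes, switch back to writing
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "<é>" in UTF-8 is 3C C3 A9 3E; the two-byte é is split across packets.
        String result = feed(new byte[] {0x3C, (byte) 0xC3},
                             new byte[] {(byte) 0xA9, 0x3E});
        System.out.println(result.length()); // three characters were decoded
    }
}
```

Because compact() moves the unread bytes to the front of the buffer and positions it for writing, the next read appends the missing bytes directly after the partial character.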
This was originally posted early in 2009 but the post seemed to have vanished so this article is loosely based on the documentation for NioSax.
Instead of SAX, have you investigated vtd-xml?
No, because I must support NIO based XML streams, so the problem there is that I need something similar to StAX but in a push rather than a pull configuration. Also, because character sets other than ASCII must be supported, I had to handle the possibility of the stream stopping part way through a character when the data is split up as it’s sent over the network. No existing API out there supports that (without parsing from the beginning again), hence I ended up writing one from scratch.
Also, the output had to be DOM as that is then passed on to other frameworks (specifically JAXB in my case) so a non-standard framework would be out.
> No existing API.
Not right 🙂
Tatoo is a general parser generator that is able to produce NIO based push parsers.
see http://portal.acm.org/citation.cfm?id=1529707
Rémi
Didn’t know about that one 🙂 Is there a direct link to it?
I have searched for a SAX parser like this quite a few times. It looks impressive.
Is nioxml completely XML spec compliant? Are there any limitations?
And what is the license?
I will look into your project in more detail.
BTW, I also have a project which contains some Core libraries. If you are interested you can have a look at
http://code.google.com/p/jlibs/
It’s not got any validation in there: the core just parses what it receives into a DOM tree which can then be consumed either in its entirety or in fragments (it was originally written to support XML streams, specifically XMPP/Jabber).
As for licence, it’s BSD.
Thanks for sharing your code. It works like a charm 🙂 Due to my lack of knowledge about the Charset internals, I would suggest using the JDK NIO Charset API:
// Snip of NioSaxSource
public final boolean isValid(final char c) {
    return c != NOT_ENOUGH_DATA && c != INVALID_CHAR;
}

public final boolean hasCharacter() {
    return buffer != null && buffer.hasRemaining();
}

public final char decode() {
    if (!hasCharacter()) {
        return NOT_ENOUGH_DATA;
    }
    // Where decoder = charset.newDecoder() and b = CharBuffer.allocate(1)
    b.rewind();
    CoderResult result = decoder.decode(buffer, b, true);
    if (result.isError()) {
        return INVALID_CHAR;
    }
    return b.get(0);
}
Thanks again for sharing
The problem with the NIO Charset API is that it decodes an entire ByteBuffer in one go and it assumes that everything is in that buffer. If the buffer is only partial (for example due to a fragmented network packet) then it will fail.
Here’s the problem: say you receive from the network a couple of UTF-16 characters, A and B. This would be 4 bytes in total:
[A0][A1][B0][B1]
Due to the network fragmenting the packet, the second character is only partially received (its second byte is still in transit). In this case our ByteBuffer contains:
[A0][A1][B0]
Then the NIO API would fail as the second character is incomplete. What I do is decode up to the beginning of the partial character but leave it in the ByteBuffer, hence the NOT_ENOUGH_DATA state. Then, when you return that buffer to NIO, it appends the next block from the network, which happens to have [B1]; the char is then complete and it can be decoded.
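The [A0][A1][B0] case above can be tried out with the JDK alone. The sketch below is not NioSax’s decoder: it assumes the three-argument CharsetDecoder.decode is called with endOfInput set to false (the snippet earlier in this thread passes true), and it uses UTF-16BE so that no byte order mark is involved. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class PartialDecodeSketch {

    // Returns "<bytes left after the first packet>:<decoded text>".
    static String decodeInTwoParts() {
        CharsetDecoder decoder = StandardCharsets.UTF_16BE.newDecoder();
        ByteBuffer in = ByteBuffer.allocate(16);
        CharBuffer out = CharBuffer.allocate(16);

        // First packet: [A0][A1][B0], i.e. 'A' plus half of 'B'.
        in.put(new byte[] {0x00, 0x41, 0x00});
        in.flip();
        // endOfInput = false tells the decoder that more bytes may follow.
        decoder.decode(in, out, false);
        int leftover = in.remaining(); // the undecoded half character
        in.compact();                  // keep it at the front of the buffer

        // Second packet: [B1] completes the character.
        in.put((byte) 0x42);
        in.flip();
        decoder.decode(in, out, false);
        in.compact();

        out.flip();
        return leftover + ":" + out;
    }

    public static void main(String[] args) {
        System.out.println(decodeInTwoParts());
    }
}
```

With endOfInput set to true instead, the decoder treats the trailing byte as the end of the stream, which is the situation the NOT_ENOUGH_DATA state is designed to avoid.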
According to the javadoc of method decode(ByteBuffer, CharBuffer, boolean)
Hi, I’ve started a little project but I’m having trouble with NIO XML decoding.
I can’t find your source or a package to use. Can you help me?
Hi, I’m stuck on a little project involving NIO XML decoding. I can’t find any source or package; can you help me?