NioSax – Sax style xml parser for Java NIO

NioSax (pronounced ‘Neo-Sax’) provides a Java NIO friendly XML push parser similar in operation to SAX. Unlike SAX, NioSax allows the XML source to contain partial content (i.e. only part of the XML stream has been received over the network). When this occurs, instead of failing with an error, NioSax simply stops. As soon as your application receives more data, you call the same instance of the parser again and it resumes parsing where it left off.

The public API consists of the classes within this package, although the bare minimum required for use are the NioSaxParser, NioSaxParserHandler and NioSaxSource classes.

To use NioSax you simply use NioSaxParserFactory to create a NioSaxParser, implement a SAX ContentHandler and finally create a NioSaxSource which references the content.

Then you can parse one or more ByteBuffers by updating the NioSaxSource with each buffer and passing it to the NioSaxParser.parse(NioSaxSource) method.

The only other two things you must do with the parser are to call NioSaxParser.startDocument() prior to any parsing, and to call NioSaxParser.endDocument() once you are done with the parser so any resources used can be cleaned up.
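As a rough illustration of the "implement a SAX ContentHandler" step, here is a minimal handler that records element start and end events. It uses only the JDK: the class name ElementLogger and its demo() helper are made up for this sketch, and demo() drives the handler with the JDK's ordinary blocking SAX parser rather than NioSax, purely to show which callbacks fire. How the handler is wrapped as a NioSaxParserHandler is not covered here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; DefaultHandler implements ContentHandler.
class ElementLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    // Feeds a complete document through the JDK's SAX parser (not NioSax)
    // and returns the events the handler saw.
    static List<String> demo(String xml) {
        ElementLogger handler = new ElementLogger();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.events;
    }
}
```

The same callbacks would be expected from a push parser; the difference is only in when they arrive, since a push parser can deliver them across several calls as data trickles in.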

Example

First, in Maven, we need to add a dependency on NioSax. For details of the repository see the ‘reteptools’ menu above. You’ll need to add the following to your pom:

<dependency>
    <groupId>uk.org.retep</groupId>
    <artifactId>niosax</artifactId>
    <version>10.6</version>
</dependency>

Now we’ll create a parser:

import java.nio.ByteBuffer;
import uk.org.retep.niosax.NioSaxParser;
import uk.org.retep.niosax.NioSaxParserFactory;
import uk.org.retep.niosax.NioSaxParserHandler;
import uk.org.retep.niosax.NioSaxSource;

public class MyParser
{
    private NioSaxParser parser;
    // your handler implementation; it must be assigned before start() is called
    private NioSaxParserHandler handler;
    private NioSaxSource source;

    public void start()
    {
        NioSaxParserFactory factory = NioSaxParserFactory.getInstance();

        parser = factory.newInstance();
        parser.setHandler( handler );
        source = new NioSaxSource();

        parser.startDocument();
    }
}

Next, when you receive data from some NIO source and have it in a ByteBuffer, you need to pass it to the parser:

    public void parse( ByteBuffer buffer )
    {
        // flip the buffer so the parser starts at the beginning
        buffer.flip();

        // update the source (presuming the buffer has changed)
        source.setByteBuffer( buffer );

        // Parse the available content then compact
        parser.parse( source );
        source.compact();
    }

Finally we must call endDocument() to release any resources:

    public void close()
    {
        // releases any resources and notifies the handler the document has completed
        parser.endDocument();
    }

Now all we need to do, when we receive some data from an external source like a Socket, is pass the ByteBuffer to the parse method. This then passes it to the NioSax parser, which in turn calls the ContentHandler as the parse progresses.

When it gets to the end of the available content, it compacts the buffer so that it can be reused.

Usually the buffer will now be empty, however if there was partial content (like only part of a Unicode character was present) then the parser would stop prior to that character and that character would remain in the buffer. The next packet received via nio would have the rest of that character and the parser would then continue where it left off.
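The flip, consume, compact cycle and the "partial character stays in the buffer" behaviour can be tried out with nothing but the JDK. The sketch below does not use NioSax at all: it substitutes java.nio.charset.CharsetDecoder for the parser, and the class name PartialDecodeDemo and its receive() method are invented for the illustration. It only shows how a split UTF-8 character survives compact() and is completed by the next packet.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a resumable consumer; not part of NioSax.
class PartialDecodeDemo {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    private final ByteBuffer buffer = ByteBuffer.allocate(64);
    private final StringBuilder text = new StringBuilder();

    // Appends a received packet, decodes every complete character,
    // and returns how many bytes were left over for the next call.
    int receive(byte[] packet) {
        buffer.put(packet);
        buffer.flip();                       // switch from writing to reading

        CharBuffer out = CharBuffer.allocate(64);
        decoder.decode(buffer, out, false);  // false: more input may follow
        out.flip();
        text.append(out);

        buffer.compact();                    // keep undecoded bytes at the front
        return buffer.position();
    }

    String text() {
        return text.toString();
    }
}
```

Feeding it the two bytes of 'é' (0xC3 0xA9) split across two packets leaves one byte behind after the first call and none after the second.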

This was originally posted early in 2009 but the post seemed to have vanished so this article is loosely based on the documentation for NioSax.

It’s been a busy couple of months

It’s been a couple of busy months with most of my time being taken up with my day job.

Most of it has been spent either tracking down issues with our live environment or trying to finish off a couple of major projects (both related to XMPP), interspersed with the usual major partner getting in the way.

Anyhow, over the last couple of weeks I’ve been finishing off some new features which cover most of my public projects, and this post will hopefully cover some of the details.

Hopefully these will be released this weekend, time allowing.

The new features are:

RetepTools
* the builder api within retepTools has been updated
* the jaxb plugin library has been cleaned up with common generation code split out to enable reuse
* retepTools as a project is almost ready for deployment to maven central
* a new plugin has been added to jaxb which generates builders for jaxb objects

RetepMicroKernel
* spring has been updated to the latest version 3 (it was on 2.5)
* the core module has been broken up into individual & independent modules
* a new groovy module has been added which enables groovy scripts to be run from the command line
* a major bug where exceptions thrown during application startup caused the process to hang has been fixed
* web applications can now be deployed as a war with either jetty or tomcat (they are both supported with their own modules)
* you can now embed Apache Derby within the environment

I’m leaving the retepXMPP changes out of this list as they need their own article. Suffice it to say, I’ve got a lot waiting for release, I just need the time.

Finally, this post is also a test of submitting a blog post from a BlackBerry using the WordPress app so the formatting may be off a tad – won’t know how it goes until I see it in a real browser.

retepTools Concurrency Support

The retepTools library provides additional concurrency support on top of that provided by the java.util.concurrent package of JDK 1.5 or later.

The core component of the concurrency support provided by retepTools is the locking. By utilising a set of annotations, it is possible to mark a method so that the entire body of that method is bound to the scope of that Lock.

For example, we have a bean with a property. The property’s value can be read by any number of Threads, but it can be set by only one at a time. In the normal java.util.concurrent.locks way you would write something like this:

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MyBean
{
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  private int value;

  public int getValue()
  {
    lock.readLock().lock();
    try
    {
      return value;
    }
    finally
    {
      lock.readLock().unlock();
    }
  }

  public void setValue( int newValue )
  {
    lock.writeLock().lock();
    try
    {
      value = newValue;
    }
    finally
    {
      lock.writeLock().unlock();
    }
  }
}
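As a side check of the "many readers, one writer" rule that this bean relies on, the following JDK-only sketch holds the read lock on one thread and then probes both locks from a second thread with tryLock(). The class name ReadWriteLockDemo and its probe() method are invented for the illustration; nothing here is retepTools code.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; uses only java.util.concurrent.locks.
class ReadWriteLockDemo {
    // Holds the read lock on the calling thread, then reports whether a
    // second thread could take the read lock and the write lock.
    static String probe() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        boolean[] results = new boolean[2];

        lock.readLock().lock();
        try {
            Thread other = new Thread(() -> {
                // A second reader is allowed while no writer holds the lock.
                results[0] = lock.readLock().tryLock();
                if (results[0]) lock.readLock().unlock();

                // A writer is refused while another thread is still reading.
                results[1] = lock.writeLock().tryLock();
                if (results[1]) lock.writeLock().unlock();
            });
            other.start();
            other.join();   // join() makes the results visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.readLock().unlock();
        }
        return "read=" + results[0] + " write=" + results[1];
    }
}
```

Calling ReadWriteLockDemo.probe() returns "read=true write=false": the second reader gets in, the writer does not.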

Now imagine writing that lock/try/finally pattern on an object with dozens of properties… a lot of boilerplate code which is prone to typing errors. retepTools removes this boilerplate by using a set of three annotations: @Lock, @ReadLock and @WriteLock.

All three annotations follow a simple contract:

    * your object must implement a corresponding method for each annotation.

    * the associated method is named after its annotation, i.e. @ReadLock expects readLock(), etc.

    * that method must be declared either private or “protected final”.

    * the Lock object returned by those methods must also be final – specifically it must be the same object returned for every invocation of that method on that instance.
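To make the first three rules of the contract concrete, here is a small reflection-based check written against the JDK only. It is a sketch, not how retepTools enforces the contract (its checks are described as an annotation processor, which works at compile time); the class LockContractCheck, its satisfies() method and the two sample classes are all hypothetical. It also only looks at methods declared directly on the class, not inherited ones, and does not attempt the fourth rule (same Lock instance on every call).

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical runtime check, for illustration only.
class LockContractCheck {
    // True if 'type' declares a no-arg method with the given name that
    // returns a Lock and is either private or protected final.
    static boolean satisfies(Class<?> type, String methodName) {
        try {
            Method m = type.getDeclaredMethod(methodName);
            int mod = m.getModifiers();
            boolean visibilityOk = Modifier.isPrivate(mod)
                || (Modifier.isProtected(mod) && Modifier.isFinal(mod));
            return visibilityOk && Lock.class.isAssignableFrom(m.getReturnType());
        } catch (NoSuchMethodException e) {
            return false;   // no corresponding method at all
        }
    }

    // Follows the contract: protected final, always the same Lock.
    static class Good {
        private final Lock lock = new ReentrantLock();
        protected final Lock readLock() { return lock; }
    }

    // Breaks it: public, and a new Lock on every call.
    static class Bad {
        public Lock readLock() { return new ReentrantLock(); }
    }
}
```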

For convenience the class uk.org.retep.util.collections.ConcurrencySupport implements this contract for objects using @ReadLock and @WriteLock.

So here’s the above code rewritten to use the annotations:

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import net.jcip.annotations.ThreadSafe;
import uk.org.retep.annotations.ReadLock;
import uk.org.retep.annotations.WriteLock;

@ThreadSafe
public class MyBean
{
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  private int value;

  protected final Lock readLock()
  {
    return lock.readLock();
  }

  protected final Lock writeLock()
  {
    return lock.writeLock();
  }

  @ReadLock
  public int getValue()
  {
    return value;
  }

  @WriteLock
  public void setValue( int newValue )
  {
    value = newValue;
  }
}

Now that’s a lot cleaner and less error prone, and it is easier to see what the business logic is rather than having it obscured by the locking mechanism.

retepTools also provides several concrete base classes to make this even easier. The main one is uk.org.retep.util.concurrent.ReadWriteConcurrencySupport which provides concrete implementations of readLock() and writeLock(). So the above example can be made even simpler by extending ReadWriteConcurrencySupport:

import net.jcip.annotations.ThreadSafe;
import uk.org.retep.annotations.ReadLock;
import uk.org.retep.annotations.WriteLock;
import uk.org.retep.util.concurrent.ReadWriteConcurrencySupport;

@ThreadSafe
public class MyBean extends ReadWriteConcurrencySupport
{
  private int value;

  @ReadLock
  public int getValue()
  {
    return value;
  }

  @WriteLock
  public void setValue( int newValue )
  {
    value = newValue;
  }
}

Footnotes

  1. I originally started to write this article back in February; however, since retepTools 9.2 a new annotation processor has been added which performs additional sanity checks.
  2. Project Coin is currently taking proposals for small changes to the Java Language. One of those proposals was Automatic Resource Management, and during the debate about that proposal locks were brought up. Locks are out of scope for that proposal, and there is an additional proposal covering them which now looks like it’s not going to be accepted. I’ll be writing another article about that one later.