Sets and Nodes with PostgreSQL

At times you need to be able gather information into Sets where an entry can contain one or more entries or sets, or Nodes where an entry has child nodes each of whom can contain their own children. For me it’s usually a geographical tree (Continent, Country, County, Town) or permission roles (Administrator, Moderator, User). Although this is documented elsewhere on the net, here’s how I do this in PostgreSQL utilising pl/pgsql functions.

The tree of nodes consists of one entry in the table per node, with the root node at the top of the tree. Selects on the table are fast, but updates are expensive. As the data contained in the tree rarely changes, this is perfectly fine for most purposes.

For the first example, I’m going to use a geographical tree, so we will have at the end of the exercise the following tree:

First we need a table to store the data. The following SQL Schema defines the node table and inserts the root. Apart from this instance, you should only use the pl/pgsql functions to modify this table:

CREATE SEQUENCE nodeseq;

CREATE TABLE node (

nodeid BIGINT NOT NULL,

l BIGINT NOT NULL,

r BIGINT NOT NULL,

d INTEGER NOT NULL, — the depth of this node, 0 = root

n name NOT NULL,

PRIMARY KEY (nodeid)

);

CREATE INDEX node_l ON node(l);

CREATE INDEX node_r ON node(r);

CREATE INDEX node_lr ON node(l,r);

CREATE INDEX node_lrd ON node(l,r,d);

CREATE INDEX node_ln ON node(l,n);

CREATE INDEX node_rn ON node(r,n);

CREATE INDEX node_lrn ON node(l,r,n);

CREATE INDEX node_lrdn ON node(l,r,d,n);

— The root node (0) preinitialised for the structure to work.

INSERT INTO node (nodeid,l,r,d,n) VALUES (0,1,2,0,’root’);

Here we have the table initialised with the root node. At first it has the nodeid of 0, l=1, r=2 and d=0. Over time the l and r values will change as nodes are added to the tree.

Now for the first set of pl/pgsql functions. This set of functions handle retrieval of nodes from the tree based on certain criteria. I’ll only show the function definitions for now as I’ll show how to use these later in this article.

Retrieving a node by its nodeid

CREATE OR REPLACE FUNCTION node_get( parentid BIGINT )

RETURNS node

AS $$

DECLARE

n node;

BEGIN

SELECT *

INTO n

FROM node

WHERE nodeid=parentid;

IF NOT FOUND THEN

RAISE EXCEPTION ‘Node % not found’, parentid;

END IF;

RETURN n;

END

$$ LANGUAGE ‘plpgsql’;

Is the node a leaf, i.e. it contains no children

CREATE OR REPLACE FUNCTION node_isleaf( childnodeid BIGINT )

RETURNS boolean

AS $$

DECLARE

child node;

BEGIN

child := node_get( childnodeid );

RETURN (child.r-child.l)==1;

END

$$ LANGUAGE ‘plpgsql’;

Does one node contain the other

CREATE OR REPLACE FUNCTION node_contains( parentnodeid BIGINT, childnodeid BIGINT )

RETURNS boolean

AS $$

BEGIN

RETURN node_contains( node_get( parentnodeid ), node_get( childnodeid ) );

END

$$ LANGUAGE ‘plpgsql’;

CREATE OR REPLACE FUNCTION node_contains( parentnode node, childnode node )

RETURNS boolean

AS $$

BEGIN

RETURN parentnode.l<=childnode.l AND parentnode.r>=childnode.r;

END

$$ LANGUAGE ‘plpgsql’;

Retrieve the parent of a node

CREATE OR REPLACE FUNCTION node_parent( rootnodeid BIGINT, childnodeid BIGINT )

RETURNS node

AS $$

DECLARE

rootnode node;

startnode node;

parentnode node;

BEGIN

IF childnodeid = rootnodeid THEN

RAISE EXCEPTION ‘Node % has no parent’, childnodeid;

END IF;

— get the root

rootnode := node_get( rootnodeid );

— get the start

startnode := node_get( childnodeid );

— the root must contain the start

IF NOT node_contains( rootnode, startnode ) THEN

RAISE EXCEPTION ‘Node % is not contained by the root node %’, childnodeid, rootnodeid;

END IF;

— The root is the parent

IF rootnode.d = (startnode.d-1) THEN

RETURN rootnode;

END IF;

SELECT *

INTO parentnode

FROM node

WHERE l<startnode.l AND r>startnode.r AND d=startnode.d-1;

IF NOT FOUND THEN

RAISE EXCEPTION ‘Node % has no parent’, childnodeid;

END IF;

— the root must contain the parent

IF NOT node_contains( rootnode, parentnode ) THEN

RAISE EXCEPTION ‘Node % has no parent’, childnodeid;

END IF;

RETURN parentnode;

END

$$ LANGUAGE ‘plpgsql’;

Add a child node

This is probably the most important function as it adds a new node as a child to another and as such is the primary means of adding nodes to the tree.

CREATE OR REPLACE FUNCTION node_addchild( parentid BIGINT, name NAME )

RETURNS node

AS $$

DECLARE

parent node;

child node;

tmp TEXT[];

BEGIN

— Validate name ensuring it contains no index

SELECT regexp_matches( name, E'([^\\]]+)(\\[([^\\[]+)\\]){0,1}’ )

INTO tmp;

IF tmp[2] IS NOT NULL THEN

RAISE EXCEPTION ‘Name % is invalid, it may not contain an index’, name;

END IF;

parent := node_get( parentid );

/* We allow same named children, but this would disallow it

SELECT *

INTO child

FROM node

WHERE l > parent.l AND r < parent.r AND n = name;

IF FOUND THEN

RAISE EXCEPTION ‘The child % already exists under %’, name, parentid;

END IF;

*/

UPDATE node

SET r=r+2

WHERE r >= parent.r;

UPDATE node

SET l=l+2

WHERE l >= parent.r;

INSERT INTO node (nodeid,l,r,d,n)

VALUES (nextval(‘nodeseq’), parent.r, parent.r+1, parent.d+1, name);

child := node_get( currval(‘nodeseq’) );

RETURN child;

END;

$$ LANGUAGE ‘plpgsql’;

Now using our example tree, the first thing we need to do is to add Europe to the tree as a child of the root. Now the root always has nodeid 0 so we call node_addchild( 0, ‘Europe’ ) to add a new node representing Europe:

test=# select * from node;

nodeid | l | r | d | n

——–+—+—+—+——

0 | 1 | 2 | 0 | root

(1 row)

test=# select * from node_addchild(0,’Europe’);

nodeid | l | r | d | n

——–+—+—+—+——–

1 | 2 | 3 | 1 | Europe

(1 row)

test=# select * from node;

nodeid | l | r | d | n

——–+—+—+—+——–

0 | 1 | 4 | 0 | root

1 | 2 | 3 | 1 | Europe

(2 rows)

Here you’ll notice that after the call to node_addchild() it returned the new node (nodeid 1) and updated the root node so it’s l and r fields bound those of it’s children.

We’ll now add the other nodes except the towns:

test=# select * from node_addchild(1,’United Kingdom’);

nodeid | l | r | d | n

——–+—+—+—+—————-

2 | 3 | 4 | 2 | United Kingdom

(1 row)

test=# select * from node_addchild(2,’Kent’);

nodeid | l | r | d | n

——–+—+—+—+——

3 | 4 | 5 | 3 | Kent

(1 row)

test=# select * from node_addchild(2,’Sussex’);

nodeid | l | r | d | n

——–+—+—+—+——–

4 | 6 | 7 | 3 | Sussex

(1 row)

Adding a node by it’s path

Apart from node_addchild we can also add a node by it’s path. For Canterbury it’s path from Europe is “United Kingdom/Kent/Canterbury” so instead of having to find Kent, we can use the node_addchildbypath function:

test=# select * from node_addchildbypath( 0, 1, ‘United Kingdom/Kent/Canterbury’ );

nodeid | l | r | d | n

——–+—+—+—+————

5 | 5 | 6 | 4 | Canterbury

(1 row)

Here’s the node_addchildbypath function:

CREATE OR REPLACE FUNCTION node_addchildbypath( rootnodeid BIGINT, parentid BIGINT, path TEXT )

RETURNS node

AS $$

DECLARE

rootnode node;

parent node;

child node;

tmp TEXT[];

lastname TEXT;

path2 TEXT;

name2 TEXT;

BEGIN

rootnode := node_get( rootnodeid );

parent := node_get( parentid );

IF NOT node_contains( rootnode, parent ) THEN

RAISE EXCEPTION ‘Parent % is not contained by root %’, parentid, rootnodeid;

END IF;

— Extract the last path

SELECT regexp_split_to_array( path, E’\\/’ )

INTO tmp;

IF array_upper( tmp, 1 ) = 1 THEN

child := node_addchild( parentid, path );

ELSE

— remove the tail element

name2 := tmp[ array_upper(tmp, 1 ) ];

path2 := substr( path, 0, length(path) – length( name2 ) );

parent := node_findbypath( rootnodeid, parentid, path2 );

child := node_addchild( parent.nodeid, name2 );

END IF;

RETURN child;

END;

$$ LANGUAGE ‘plpgsql’;

Now we’ll add the last set of towns to the tree:

test=# select * from node_addchildbypath( 0, 1, ‘United Kingdom/Kent/Maidstone’ );

nodeid | l | r | d | n

——–+—+—-+—+———–

6 | 7 | 8 | 4 | Maidstone

(1 row)

test=# select * from node_addchildbypath( 0, 1, ‘United Kingdom/Kent/Tonbridge’ );

nodeid | l | r | d | n

——–+—-+—-+—+———–

7 | 9 | 10 | 4 | Tonbridge

(1 row)

test=# select * from node_addchildbypath( 0, 1, ‘United Kingdom/Sussex/Arundel’ );

nodeid | l | r | d | n

——–+—-+—-+—+———

8 | 13 | 14 | 4 | Arundel

(1 row)

test=# select * from node;

nodeid | l | r | d | n

——–+—-+—-+—+—————-

6 | 7 | 8 | 4 | Maidstone

3 | 4 | 11 | 3 | Kent

7 | 9 | 10 | 4 | Tonbridge

0 | 1 | 18 | 0 | root

1 | 2 | 17 | 1 | Europe

2 | 3 | 16 | 2 | United Kingdom

4 | 12 | 15 | 3 | Sussex

8 | 13 | 14 | 4 | Arundel

5 | 5 | 6 | 4 | Canterbury

(9 rows)

Retrieving the children of a node

When handling sets, you can simply use the following to return all entries contained by a node – here we will ask for all nodes within the United Kingdom:

test=# select * from node_get(2);

nodeid | l | r | d | n

——–+—+—-+—+—————-

2 | 3 | 16 | 2 | United Kingdom

(1 row)

test=# select * from node where l > 3 and r < 16;

nodeid | l | r | d | n

——–+—-+—-+—+————

6 | 7 | 8 | 4 | Maidstone

3 | 4 | 11 | 3 | Kent

7 | 9 | 10 | 4 | Tonbridge

4 | 12 | 15 | 3 | Sussex

8 | 13 | 14 | 4 | Arundel

5 | 5 | 6 | 4 | Canterbury

(6 rows)

Now this is fine for sets, but when dealing with a tree we usually need to know those nodes immediately beneath the node, in this case the counties. This is where the node_children function is handy:

test=# select * from node_children(2);

nodeid | l | r | d | n

——–+—-+—-+—+——–

3 | 4 | 11 | 3 | Kent

4 | 12 | 15 | 3 | Sussex

(2 rows)

Here’s the definition for node_children – here you’ll notice it’s similar to retrieving the contained entries but it now uses the d column:

CREATE OR REPLACE FUNCTION node_children( parentid BIGINT )

RETURNS SETOF node

AS $$

DECLARE

parent node;

r node%rowtype;

BEGIN

parent := node_get( parentid );

FOR r

IN SELECT *

FROM node

WHERE l BETWEEN parent.l AND parent.r

AND d = parent.d+1

ORDER BY n, nodeid

LOOP

RETURN NEXT r;

END LOOP;

RETURN;

END;

$$ LANGUAGE ‘plpgsql’;

Deleting nodes

Deleting a node is not as simple as simply deleting the entry from the table. What we have to do is delete the entry including all of it’s children and then close the gap in the l & r columns where the deleted entries used to be. To handle this we have the node_deleteTree function. For this example we’ll add a new County, town then delete them from the tree:

test=# select * from node_addchildbypath( 0, 1, ‘United Kingdom/Cornwall’ );NOTICE: name2 = CornwallNOTICE: path2 = United Kingdom

nodeid | l | r | d | n

——–+—-+—-+—+———-

9 | 16 | 17 | 3 | Cornwall

(1 row)

test=# select * from node_addchildbypath( 0, 1, ‘United Kingdom/Cornwall/St Austell’ );

nodeid | l | r | d | n

——–+—-+—-+—+————

10 | 17 | 18 | 4 | St Austell

(1 row)

test=# select * from node;

nodeid | l | r | d | n

——–+—-+—-+—+—————-

6 | 7 | 8 | 4 | Maidstone

3 | 4 | 11 | 3 | Kent

7 | 9 | 10 | 4 | Tonbridge

4 | 12 | 15 | 3 | Sussex

8 | 13 | 14 | 4 | Arundel

0 | 1 | 22 | 0 | root

1 | 2 | 21 | 1 | Europe

2 | 3 | 20 | 2 | United Kingdom

9 | 16 | 19 | 3 | Cornwall

10 | 17 | 18 | 4 | St Austell

5 | 5 | 6 | 4 | Canterbury

(11 rows)

test=# select node_deleteTree( 9 );

node_deletetree

—————–

2

(1 row)

test=# select * from node;

nodeid | l | r | d | n

——–+—-+—-+—+—————-

6 | 7 | 8 | 4 | Maidstone

3 | 4 | 11 | 3 | Kent

7 | 9 | 10 | 4 | Tonbridge

4 | 12 | 15 | 3 | Sussex

8 | 13 | 14 | 4 | Arundel

0 | 1 | 18 | 0 | root

1 | 2 | 17 | 1 | Europe

2 | 3 | 16 | 2 | United Kingdom

5 | 5 | 6 | 4 | Canterbury

(9 rows)

Here’s the definition for node_deleteTree:

CREATE OR REPLACE FUNCTION node_deleteTree( oldnodeid BIGINT )

RETURNS INTEGER

AS $$

DECLARE

oldnode node;

c INTEGER := 0;

d INTEGER;

BEGIN

— Cannot delete the root

IF oldnodeid = 0 THEN

RAISE EXCEPTION ‘Cannot delete the root’;

END IF;

— Select the node we are deleting

oldnode := node_get( oldnodeid );

— delete the node

DELETE FROM node

WHERE l BETWEEN oldnode.l AND oldnode.r;

GET DIAGNOSTICS c = ROW_COUNT;

— now close the gap between l and r, d is the size of the gap to remove

d := oldnode.r – oldnode.l + 1;

UPDATE node

SET l = CASE

WHEN l > oldnode.l

THEN l – d

ELSE l

END,

r = CASE

WHEN r > oldnode.l

THEN r – d

ELSE r

END

WHERE l > oldnode.l

OR r > oldnode.l;

RETURN c;

END;

$$ LANGUAGE ‘plpgsql’;

The full schema used by this article is available at http://retep.org/blog/psql/nodes.txt

Finding the process id of the process owning a port

Occasionally there’s time where you need to find out what process owns a port currently open.

On the Mac this can be done easily by using the following line – here we are looking for port 8080:

ps u –no-heading -p `netstat -nlept | awk ‘/[:]8080 / {split ($9,t,”/”); print t[1]}’` 2>/dev/null

For windows you don’t have a decent shell (and cygwin would probably not work here), so you can use the following batch script to do the same:

@echo off

for /F “usebackq tokens=5″ %%f in (`netstat -b -n ^| find “:%1″`) do call :process %%f
goto :eof

:process
tasklist /FI “PID eq %1″ /NH

If the above code was called findport.bat then running findport 8080 would then find the process owning port 8080.

Finding the process id of the process owning a port

Occasionally there’s time where you need to find out what process owns a port currently open.

On the Mac this can be done easily by using the following line – here we are looking for port 8080:

ps u –no-heading -p  `netstat -nlept | awk ‘/[:]8080 / {split ($9,t,”/”); print t[1]}’` 2>/dev/null

For windows you don’t have a decent shell (and cygwin would probably not work here), so you can use the following batch script to do the same:

@echo off

for /F “usebackq tokens=5″ %%f in (`netstat -b -n ^| find “:%1″`) do call :process %%f

goto :eof

:process

tasklist /FI “PID eq %1″ /NH

If the above code was called findport.bat then running findport 8080 would then find the process owning port 8080.

Using the retepMicroKernel – Part 1

In the first of four posts introducing the retepMicroKernel project, I will describe what the project is, it’s goals and how it can be used to implement lightweight server side application running either standalone or within a J2EE environment. Part 2 will show how to write an application for the kernel and Part 3 will show how to add a web console to your application while Part 4 will show how to integrate an application into a J2EE environment.

Overview

The retepMicroKernel comprises four distinct components:

  • Binary launcher
  • MicroKernel
  • WebConsole
  • Apache Maven plugin

overview.WV81M45VscxD.e25JeA5bsoLE.jpg

Binary launcher

The binary launcher is a platform specific binary who’s job is to configure the application, start the Java virtual machine and finally start the application by bootstrapping the microKernel.

Currently the following platforms are supported:

  • Linux (both 32 and 64 bit ). This is compiled under Ubuntu but is tested on RHEL4 so should work on most x86 and amd64 distributions
  • Mac OSx (Leopard 10.5.x and Java 6 required)
  • Windows XP (not tested on Vista, currently console only)

Other platforms will be supported as an when I get a working virtual image of those platforms.

MicroKernel

This is the core kernel comprising the microKernel, annotations and core services like the in-memory JNDI server.

WebConsole

The WebConsole is an optional component that, when installed, provides a simple but comprehensive console for your application by utilising the built in web server provided by Java 6. With it you can monitor various statistics about your application using any web browser, and add additional actions to your application for management purposes.

Apache Maven plugin

The Apache Maven plugin enables an application to be built, ensuring that all artifacts are present and installed in the correct locations required by the kernel.

Directory Layout

Each microKernel application is organised so that the final application is self contained. Here’s the directory structure – here the application is called myapp:

  • bin – The binary launchers
  • etc – The application’s configuration
  • lib – The microKernel’s jar files
  • lib/myapp – The jar files for the myapp application

The contents of these directories will be covered in Part 2, but the layout is such that you can have multiple applications deployed in the same directory structure – simply have a copy of the binaries with the correct name, configuration under etc and a directory under lib with the application specific jars. In addition, the maven plugin handles all of this for you as it will generate the entire directory structure, ensure the correct artifacts are installed and generate a zip, tar.gz or tar.bz2 artifact containing the final application.

That’s all for the introduction, next in Part 2 I’ll go into depth on how to write a simple application with beans defined either in the standard spring xml fashon, or by utilising the supplied annotations.

Well yet again I’ve succumbed

It’s happened again – I’ve succumbed to another popular technology, this time blogging. It had to happen at some point, but after some resistence, here it is.

Over time, I’ll be adding entries on how to do various things with Java programming, specifically in the areas of J2EE, Astronomy and DataBases, although I do dabble in some other areas like A.I. Chatbots e.t.c.