The GVM algorithm was initially implemented as a Java library.
The GVM library jar can be downloaded from the Google Code project for this site.
To build the current development snapshot, you need svn, Java 1.6 (or above) and Maven 2.2 (or above).
svn checkout http://tomgibara.googlecode.com/svn/trunk/cluster
cd cluster/cluster-mojo/
mvn install
cd ..
mvn install
You can also browse the source code online.
Clustering using the GVM Java library is very simple.
First you need to choose the subpackage that matches your coordinate type. The GVM library uses source code generation to provide efficient implementations for each of the numeric Java primitives; the generic root package should be avoided. The packages are:
Next you create a Clusters object, specifying the dimension and maximum number of clusters.
For example, if up to 10 clusters of 3-dimensional double coordinates were being sought:
DblClusters<Key> clusters = new DblClusters<Key>(3, 10);
Here Key is a type of object that your application associates with each point/cluster.
There is one key for each cluster (or, equivalently, each point). The Keyer controls how keys are
assigned to new clusters and how they are combined when clusters are merged. By default, the clusters use a
Keyer that picks the key from the largest cluster/point, but other implementations are
possible, and the implementation used can be set like so:
clusters.setKeyer(myKeyer);
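To make the role of a Keyer more concrete, here is a minimal, self-contained sketch of the default policy described above (keep the key of the larger cluster). The KeyPolicy interface, the LargestMassPolicy class and the mergeKeys method are hypothetical stand-ins invented for this illustration. They are not the library's Keyer interface, whose actual signatures should be checked in the Javadoc.

```java
// Hypothetical stand-in for illustration only; NOT the GVM library's Keyer API.
interface KeyPolicy<K> {
    // Chooses the key for the cluster formed by merging two clusters.
    K mergeKeys(double mass1, K key1, double mass2, K key2);
}

// Keeps the key of whichever cluster has the larger mass (ties keep the first key).
final class LargestMassPolicy<K> implements KeyPolicy<K> {
    @Override
    public K mergeKeys(double mass1, K key1, double mass2, K key2) {
        return mass1 >= mass2 ? key1 : key2;
    }
}

final class KeyPolicyDemo {
    public static void main(String[] args) {
        KeyPolicy<String> policy = new LargestMassPolicy<String>();
        // The second cluster is heavier, so its key survives the merge.
        System.out.println(policy.mergeKeys(2.0, "a", 5.0, "b")); // prints "b"
    }
}
```

An application could apply a different policy in the same shape, for example one that combines both keys into a collection rather than discarding one.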
Once the Clusters object has been created, it's simply a matter of adding points to it.
Each point has a mass (which can be used as a weighting by the application, or set as 1.0 for every point),
a coordinate vector (in the form of an array of coordinates), and (optionally) a key. Points are added using
the add method:
clusters.add(mass, pt, key);
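As an illustration of how a mass acts as a weighting, the following self-contained sketch computes the mass-weighted centroid of a few points. It does not use the GVM library and makes no claim about the library's internals; WeightedCentroid and centroid are names invented for this example.

```java
import java.util.Arrays;

// Illustration only: shows the effect of per-point masses, not GVM internals.
final class WeightedCentroid {
    // Returns sum(mass_i * pt_i) / sum(mass_i), computed per dimension.
    static double[] centroid(double[] masses, double[][] pts) {
        int dim = pts[0].length;
        double[] sum = new double[dim];
        double total = 0.0;
        for (int i = 0; i < pts.length; i++) {
            total += masses[i];
            for (int d = 0; d < dim; d++) {
                sum[d] += masses[i] * pts[i][d];
            }
        }
        for (int d = 0; d < dim; d++) {
            sum[d] /= total;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] masses = {1.0, 3.0};
        double[][] pts = {{0.0, 0.0, 0.0}, {4.0, 8.0, 0.0}};
        // The heavier point pulls the centroid towards itself.
        System.out.println(Arrays.toString(centroid(masses, pts))); // prints [3.0, 6.0, 0.0]
    }
}
```

Setting every mass to 1.0 reduces this to the ordinary unweighted mean of the points.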
At any point during the clustering process (but usually after the last point has been added, and before the results have been returned),
the number of clusters can be reduced by calling the reduce method. The first argument constrains the total variance (negative if there's no constraint);
the second argument constrains the number of clusters (zero if there's no constraint).
clusters.reduce(100.0, 2);
Finally, the computed clustering can be obtained by calling the results method (usually after all the points have been added, though it can be called at any time).
This method performs almost no computation and returns a list of Result objects,
each of which contains information about an identified cluster, e.g.
List<DblResult<Key>> results = clusters.results();
Javadoc library documentation is available from the project's Maven site.
The essential package documentation is that of the com.tomgibara.cluster.gvm subpackages.