The GVM algorithm was initially developed as a Java library.
To build the current development snapshot, you need svn, Java 1.6 (or above) and Maven 2.2 (or above).
svn checkout http://tomgibara.googlecode.com/svn/trunk/cluster
cd cluster/cluster-mojo/
mvn install
cd ..
mvn install
You can also browse the source code online.
Clustering using the GVM Java library is very simple.
First, choose the subpackage that matches your coordinate type. The GVM library uses source code generation to provide efficient implementations for each of the numeric Java primitives; the generic root package should be avoided. The packages are:
Next, create a Clusters object, specifying the dimension and the maximum number of clusters.
For example, if up to 10 clusters of 3-dimensional double coordinates were being sought:
DblClusters<Key> clusters = new DblClusters<Key>(3, 10);
Key is the type of object that your application associates with each point or cluster.
Each cluster (equivalently, each point) has one key. How keys are assigned to new clusters,
or combined when clusters are merged, is controlled by the Keyer.
By default, the clusters use a Keyer that picks the key from the largest cluster or point,
but other implementations are possible, and the Keyer used can be replaced.
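To illustrate the default policy described above (the merged cluster keeps the key of the larger cluster), here is a minimal sketch. The class and method names are hypothetical and chosen for illustration only; this is not the GVM Keyer interface, whose actual method signatures should be taken from the library source. It also assumes that "largest" means greatest mass.

```java
// Hypothetical sketch: these names are illustrative and are NOT the GVM Keyer API.
final class KeyMergeSketch {

    // Chooses the key for a cluster formed by merging two clusters.
    // Assumption: "largest" means greatest mass; on a tie the first key is kept.
    static <K> K mergeKeys(double mass1, K key1, double mass2, K key2) {
        return mass2 > mass1 ? key2 : key1;
    }

    public static void main(String[] args) {
        // The heavier cluster (mass 5.0) contributes its key to the merged cluster.
        String merged = mergeKeys(5.0, "city-centre", 1.5, "suburb");
        System.out.println(merged); // prints "city-centre"
    }
}
```

A different policy, such as collecting both keys into a list, would fit the same shape: it receives the keys (and possibly the masses) of the clusters being merged and returns the key for the result.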
Once the Clusters object has been created, it's simply a matter of adding points to it.
Each point has a mass (which the application can use as a weighting, or set to 1.0 for every point),
a coordinate vector (in the form of an array of coordinates), and (optionally) a key. Points are added using:
clusters.add(mass, pt, key);
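To make the role of mass as a weighting concrete, the sketch below combines two points into a single mass-weighted mean. It is a general illustration of weighting, written under the assumption that heavier points pull a combined position towards themselves; it is not code from the GVM library, and the class and method names are hypothetical.

```java
// Hypothetical sketch of mass weighting; NOT taken from the GVM library.
final class WeightedMergeSketch {

    // Returns the mass-weighted mean of two coordinate vectors of equal dimension.
    static double[] weightedMean(double m1, double[] p1, double m2, double[] p2) {
        if (p1.length != p2.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double total = m1 + m2;
        double[] mean = new double[p1.length];
        for (int i = 0; i < mean.length; i++) {
            mean[i] = (m1 * p1[i] + m2 * p2[i]) / total;
        }
        return mean;
    }

    public static void main(String[] args) {
        double[] origin = {0.0, 0.0, 0.0};
        double[] other = {4.0, 8.0, 12.0};
        // The point with mass 3.0 pulls the mean three times as hard as the point with mass 1.0.
        double[] mean = weightedMean(3.0, origin, 1.0, other);
        System.out.println(java.util.Arrays.toString(mean)); // prints "[1.0, 2.0, 3.0]"
    }
}
```

Setting every mass to 1.0 makes this an ordinary unweighted average.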
At any point during the clustering process (but usually after the last point has been added and before the results are returned),
the number of clusters can be reduced by calling the reduce method.
The first argument constrains the total variance (negative if there is no constraint);
the second constrains the number of clusters (zero if there is no constraint).
Finally, the computed clustering can be obtained by calling the results method
(usually after all the points have been added, though it can be called at any time).
This method performs almost no computation; it returns a list of result objects
that each contain information about an identified cluster, e.g.
List<DblResult<Key>> results = clusters.results();