As a result of investigations into various different fields of data processing, I have developed need of a spatial clustering algorithm that can cope with extremely large data sets. I wasn't able to find an existing algorithm that suited my needs so I developed a new one. It is called gvm for Greedy Variance Minimization.
This algorithm is new to me; if I've rediscovered it please let me know, email me at tomgibara at hotmail dot com.
I am still developing the algorithm in an attempt to make it more efficient. I don't know how its performance compares (in terms of speed, quality or memory) to established algorithms, but in my case the characteristics I list below were more important than raw speed.
The gvm algorithm has the following characteristics:
The applet below doubles as both a demonstration and first implementation of the gvm algorithm. It requires a browser with support for Java 1.5 (or above) to run.
Use the slider to change the number of clusters. The number of cities being clustered is necessarily constrained by a desire to keep the applet download small. The largest city in each of the largest 20 clusters is indicated with a separate marker.
If you can't view the applet for any reason there are always the screenshots, or you can download the demonstration as a Java application.
At this time, the only download available for this algorithm is the demonstration applet in the form of a Java application you can run locally:
gvm-city.jar
If you already have Java 1.5 (or above) there is nothing to install. Simply download the above jar file and then double click on it, or execute the following command line:
java -jar gvm-city.jar