GVM: Fast Spatial Clustering
I needed of a spatial clustering algorithm that could cope with extremely large data
sets. I wasn't able to find an existing algorithm that suited my needs so I developed a
new one. It is called GVM for Greedy Variance Minimization.
Characteristics
The GVM algorithm has the following characteristics:
- The algorithm places no restrictions on the number of
items (N), dimensions (D) and clusters (C).
- The algorithm scales linearly with the number items clustered. Specifically
the worst case time complexity of the algorithm is N⋅D⋅C log C;
- Memory usage is constant and independent of the number of items clustered.
Specifically, the space complexity is C2 + D⋅C.
See the navigation for information about the algorithm and its implementation.
Implementations
I'm currently using
JProfiler to investigate possible optimizations.