GVM compared to k-means clustering

Experience with the GVM algorithm so far indicates that it produces clusterings of comparable quality to the k-means algorithm. Unlike k-means, the GVM algorithm needs only a single pass through the dataset, which is a big advantage with datasets that don’t fit in memory, and it has a robust upper bound on its execution time.
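
For contrast with the single-pass property described above, here is a minimal sketch of standard k-means (Lloyd's algorithm) in Python. It is not taken from the GVM project; the function name `kmeans`, its parameters, and the fixed iteration count are assumptions for illustration. The point it illustrates is that every iteration rescans the whole dataset, so the data is read once per iteration rather than once overall.

```python
def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd's k-means on 2D points (illustrative sketch only).

    Returns the final centroids and the number of full passes made
    over the dataset.
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    passes = 0
    for _ in range(iterations):
        # Per-cluster running totals: [sum of x, sum of y, point count].
        sums = [[0.0, 0.0, 0] for _ in range(k)]
        for x, y in points:  # one full pass over the dataset
            nearest = min(
                range(k),
                key=lambda i: (x - centroids[i][0]) ** 2
                + (y - centroids[i][1]) ** 2,
            )
            sums[nearest][0] += x
            sums[nearest][1] += y
            sums[nearest][2] += 1
        passes += 1
        # Move each centroid to the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        centroids = [
            (sx / n, sy / n) if n else centroids[i]
            for i, (sx, sy, n) in enumerate(sums)
        ]
    return centroids, passes
```

With `iterations=10` the dataset is read ten times, which is where the cost arises when the points have to be streamed from disk.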

These plots (which were produced using R) provide some qualitative comparisons of the two algorithms. The first plot in each set shows the points prior to clustering. The last plot in each set is one obtained using EM.

Old Faithful Dataset

Waiting time between eruptions and the duration of the eruption for the Old Faithful Geyser in Yellowstone National Park, Wyoming, USA.

Crossed Gaussians Dataset

Two 2D normal distributions crossing at the origin.
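
A dataset of this shape can be generated with a short Python sketch. The original plots were produced in R and their parameters are not stated here, so the orientation of the two arms (plus and minus 45 degrees), the standard deviations, the point counts, and the function name `crossed_gaussians` are all assumptions.

```python
import math
import random


def crossed_gaussians(n_per_arm=500, long_sd=3.0, short_sd=0.3, seed=0):
    """Sample two elongated 2D normals, both centred on the origin.

    Each arm is an axis-aligned normal (wide along one axis, narrow
    along the other) rotated by +45 or -45 degrees. All parameter
    values are illustrative assumptions.
    """
    rng = random.Random(seed)
    points = []
    for angle in (math.pi / 4, -math.pi / 4):
        c, s = math.cos(angle), math.sin(angle)
        for _ in range(n_per_arm):
            u = rng.gauss(0.0, long_sd)   # coordinate along the arm
            v = rng.gauss(0.0, short_sd)  # coordinate across the arm
            # Rotate (u, v) by `angle` into the plotting frame.
            points.append((u * c - v * s, u * s + v * c))
    return points
```

The first `n_per_arm` points lie along the line y = x and the rest along y = -x, so the two arms overlap near the origin, which is the region where a clustering algorithm has to separate them.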

Mouse Dataset

Points uniformly distributed within three circles.
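
A comparable dataset can be sketched in Python as follows. The circle centres, radii, point counts, and the names `uniform_in_circle`, `mouse_points`, and `MOUSE_CIRCLES` are assumptions chosen to give a head with two ears; they are not the values used for the original plots. Taking the square root of the uniform draw for the radius is what makes the points uniform over the disc's area rather than bunched at the centre.

```python
import math
import random

# Hypothetical layout: (centre_x, centre_y, radius, number_of_points).
MOUSE_CIRCLES = [
    (0.0, 0.0, 1.0, 600),   # head
    (-1.0, 1.0, 0.4, 200),  # left ear
    (1.0, 1.0, 0.4, 200),   # right ear
]


def uniform_in_circle(rng, cx, cy, radius):
    """Draw one point uniformly from the disc of the given radius."""
    r = radius * math.sqrt(rng.random())  # sqrt gives uniform area density
    theta = 2.0 * math.pi * rng.random()
    return (cx + r * math.cos(theta), cy + r * math.sin(theta))


def mouse_points(circles=MOUSE_CIRCLES, seed=0):
    """Sample points uniformly within each of the given circles."""
    rng = random.Random(seed)
    points = []
    for cx, cy, radius, count in circles:
        for _ in range(count):
            points.append(uniform_in_circle(rng, cx, cy, radius))
    return points
```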