Why does KMEANS return different results when invoked on the same input?

When I run the following code multiple times, KMEANS returns different partitions (and hence a different vector s of within-cluster sums of point-to-centroid distances) although the data matrix a is the same:
a = [0 -1 0 2 0]'
[b c s] = kmeans(a,2,'distance','cityblock')
Output 1:
b =
2
2
2
1
2
c =
2
0
s =
0
1
Output 2:
b =
2
1
2
2
2
c =
-1
0
s =
0
2

Accepted Answer

MathWorks Support Team on 24 Feb 2011
This is expected behavior: by default, KMEANS selects the initial cluster centroid positions at random (albeit from among the observations). That is, the 'start' parameter defaults to 'sample', as the documentation shows. If you run your code several times, you will also see KMEANS occasionally error out because an empty cluster is created at the first iteration (i.e., b would be all 1's or all 2's). To avoid the randomness, you can pass a matrix of initial centroid positions as the value of the 'start' parameter, for example:
[b c s] = kmeans(a,2,'distance','cityblock','start',[0 1]')
This yields the same result every time, but because the partition returned by KMEANS depends strongly on the initial centroid positions, you would probably get a sub-optimal partition (unless you provide a "lucky" vector for the 'start' parameter). The typical use of KMEANS entails setting the 'Replicates' parameter to an integer n, the number of times to repeat the clustering, each time from a new set of initial centroids. KMEANS then returns the solution with the lowest total within-cluster sum of distances, that is, the lowest sum(s).
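As a rough illustration of the 'Replicates' approach, the sketch below repeats the clustering 10 times on the data from the question. The call to rng, the seed value 1, the replicate count of 10, and the 'EmptyAction','singleton' setting are assumptions added here for illustration and are not part of the original answer. Seeding the generator is only needed if you also want the replicated run itself to be repeatable from one session to the next.

```matlab
% Data from the question, as a column vector (one observation per row).
a = [0 -1 0 2 0]';

% Assumption: seeding the global random number generator makes the random
% choice of starting centroids repeatable. The seed value 1 is arbitrary.
rng(1);

% Repeat the clustering 10 times from different random starts and keep the
% run with the lowest total distance. 'EmptyAction','singleton' replaces an
% empty cluster with the single point farthest from its centroid instead of
% raising an error.
[b, c, s] = kmeans(a, 2, 'Distance', 'cityblock', ...
    'Replicates', 10, 'EmptyAction', 'singleton');

% Total within-cluster sum of point-to-centroid distances; this is the
% quantity KMEANS compares across replicates.
total = sum(s)
```

For this data set the best two-cluster partition under the city-block distance isolates the point 2 and has a total distance of 1, so a larger value of total would indicate that none of the replicates reached that partition. The cluster labels (which group is called 1 and which 2) can still differ between seeds, since the numbering itself carries no meaning.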
