Changing the indices in kmeans clustering

I am using the kmeans function to cluster a dataset. Each data point already has a predefined class index, and I need to measure the clustering accuracy by comparing those indices against the kmeans output. The problem is that kmeans assigns its own cluster indices, so a direct comparison reports errors that are not real. (For example, my data points have indices "2" and "6", but kmeans returns indices "1" and "2", which makes the comparison meaningless.)
Is it possible to instruct MATLAB to use the predefined indices for the kmeans cluster assignment?

Answers (4)

Image Analyst on 5 Mar 2020
I made a demo for this last month. In it, I reassigned the arbitrary class numbers that kmeans returns according to each cluster's distance from the origin. See attached code.
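The attached demo is not reproduced in this thread. As a rough sketch of the idea only (not Image Analyst's actual code), the relabeling could look like the following. The centroid matrix C and the index vector idx are made-up stand-ins for the outputs of kmeans, and the variable names are illustrative.

```matlab
% Hypothetical kmeans outputs: C holds one centroid per row,
% idx holds the cluster number kmeans gave to each data point.
C   = [5 5; 0.1 0; 5 0.2];
idx = [1; 2; 3; 2];

% Distance of each centroid from the origin.
d = sqrt(sum(C.^2, 2));

% order(n) is the old label of the n-th closest cluster.
[~, order] = sort(d);

% Invert that ordering: newLabel(old) gives the new label.
newLabel = zeros(size(order));
newLabel(order) = 1:numel(order);

% Relabel every point and reorder the centroids to match.
idxSorted = reshape(newLabel(idx), size(idx));
Csorted   = C(order, :);
```

After this, cluster 1 is always the one closest to the origin, cluster 2 the next closest, and so on, so the numbering no longer depends on how kmeans happened to initialize.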

the cyclist on 5 Mar 2020
I am not certain how robust this solution will be, but one possibility is to supply initial centroid guesses that are near the centers of your predefined classes, listed in the order of your predefined indices. For example,
rng default
N = 2000;
x = [randn(N/2,1); 5+randn(N/2,1)];
y = [randn(N/4,1); 5+randn(N/4,1); randn(N/4,1); 5+randn(N/4,1)];
figure
scatter(x,y)
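% Don't make an initial guess
[idx_noguess,C_noguess] = kmeans([x,y],4);
% Make an initial guess that is close.
% k is left empty because 'Start' supplies one row per cluster.
initialCentroidGuess = [5 5; 0 5; 5 0; 0 0];
[idx_guess,C_guess] = kmeans([x,y],[],'Start',initialCentroidGuess);
% Display only the centroids (the index vectors have N elements each)
C_noguess
C_guess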
resulting in
C_noguess =
    4.9852    4.9874
    5.0432    0.0325
   -0.0624    0.0335
   -0.0689    5.0692
C_guess =
    4.9852    4.9874
   -0.0689    5.0692
    5.0432    0.0325
   -0.0624    0.0335
Notice how, when I make an initial guess, the final centroids are "aligned" with the order in which I primed them.
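When the predefined indices are already known for every point, one way to obtain such a guess is to average the points of each predefined class. The following is a minimal sketch under that assumption. The data X, the labels "2" and "6" (borrowed from the question's example), and the variable names are illustrative, and kmeans requires the Statistics Toolbox.

```matlab
rng default
% Hypothetical data: two well-separated groups with predefined labels 2 and 6.
X      = [randn(50,2); randn(50,2) + 8];
labels = [2*ones(50,1); 6*ones(50,1)];

% One starting centroid per predefined label, in sorted label order.
u = unique(labels);
startC = zeros(numel(u), size(X,2));
for i = 1:numel(u)
    startC(i,:) = mean(X(labels == u(i), :), 1);
end

% k is left empty because 'Start' already has one row per cluster.
[idx, C] = kmeans(X, [], 'Start', startC);

% kmeans cluster i was seeded from label u(i), so translate back.
predicted = u(idx);
accuracy  = mean(predicted == labels);
```

This only primes the ordering. It does not guarantee that cluster i still corresponds to label u(i) after the iterations finish, so a check on the returned centroids C is still advisable.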

Walter Roberson on 5 Mar 2020
Is it possible to instruct MATLAB to use the predefined indices for the kmeans cluster assignment?
No.
Also, the solution from "the cyclist" only reduces the problem; it does not fix it. It is possible (though not common) for the identity of clusters to swap.
About the best you can do is ask kmeans to return the cluster centers, compute the distance from each returned center to each of your initial centers, use min() to find the closest initial center for each, build an index map from that, and pass the returned indices through the map.
However, it would not be uncommon for two different clusters to both be closest to the same point, so the matching has to be done carefully to keep it one-to-one.
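A minimal sketch of that matching is shown here. Instead of taking min() row by row, it tries every one-to-one assignment and keeps the one with the smallest total distance, which is one way to avoid two clusters claiming the same reference point. This brute-force search is a swapped-in simplification that is only practical for a small number of clusters, and refC, C, and idx are made-up example values.

```matlab
% Hypothetical inputs. Row r of refC is the center for predefined label r.
refC = [0 0; 0 5; 5 0; 5 5];
% Centers and labels as kmeans might return them, in arbitrary order.
C    = [4.9 5.1; 5.0 0.1; -0.1 0.0; 0.1 4.9];
idx  = [1; 3; 2; 4; 3];

% D(i,j) = distance from kmeans center i to reference center j.
k = size(C, 1);
D = zeros(k);
for i = 1:k
    for j = 1:k
        D(i,j) = norm(C(i,:) - refC(j,:));
    end
end

% Try every one-to-one assignment and keep the cheapest one.
P = perms(1:k);
cost = zeros(size(P,1), 1);
for p = 1:size(P,1)
    cost(p) = sum(D(sub2ind([k k], 1:k, P(p,:))));
end
[~, best] = min(cost);

% map(i) is the predefined label for kmeans cluster i.
map = P(best, :);
relabeled = reshape(map(idx), size(idx));
```

Because each permutation uses every reference center exactly once, the resulting map is always one-to-one. For larger numbers of clusters, a dedicated assignment solver would be a better fit than enumerating permutations.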

the cyclist on 5 Mar 2020
Edited: the cyclist on 5 Mar 2020
There is a different solution, but it can only be applied after the fact, once you have seen how kmeans assigned the clusters. You could create a map from the indices output by kmeans to the predefined ones:
map = [1 3 4 2];
which is interpreted as
  • kmeans index 1 should map to your predefined index 1
  • kmeans index 2 should map to your predefined index 3
  • kmeans index 3 should map to your predefined index 4
  • kmeans index 4 should map to your predefined index 2
Then, if you simply do
redefined_index = map(original_index_from_kmeans);
you'll get the indices you need.
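Applied to the example from the question, where the predefined indices are "2" and "6", the map and the accuracy check might look like this. The label vectors are made-up illustrative values, and the map is assumed to have been chosen by inspecting the kmeans result.

```matlab
% Hypothetical predefined labels and kmeans output for six points.
trueLabels   = [2; 2; 6; 6; 6; 2];
kmeansLabels = [1; 1; 2; 2; 1; 1];

% kmeans index 1 maps to predefined index 2, kmeans index 2 maps to 6.
map = [2 6];

% Translate the kmeans labels, keeping the column orientation.
redefined = reshape(map(kmeansLabels), size(kmeansLabels));

% Fraction of points whose translated label matches the predefined one.
accuracy = mean(redefined == trueLabels);
```

The reshape is there because indexing a row vector with a column vector returns a row, which would otherwise not line up with trueLabels in the comparison.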


Release

R2012b