Collect the ID numbers of the data points in each cluster

Hello everyone.
I am running a PCA on some data, and I have plotted the observations against the 1st and 2nd principal components. Using the k-means algorithm, I displayed the 3 clusters on the same figure. My goal now is to extract the ID of every observation, according to which cluster it falls in.
Here are the 3 clusters and the data:
And here are the IDs of the data points I'd like to extract, along with the cluster each one comes from (I can display their IDs with the gname function):
To sum up: ideally, the result would be a table with the number of the observation (what I call its ID) in the first column and the cluster it belongs to in the second column.
Good luck, and thank you for reading (and for trying to help).
Salim El Houat on 10 May 2017
The ID number could also be called the index, if that helps. To make this clearer: I want to link two things. The first is the matrix called 'test', which contains the coordinates of the 35000 observations on the first 2 components found with the PCA (so it is a 35000x2 double); the row index of this matrix is the ID number of each observation. The second is the set of clusters I found with the k-means approach, which are held in 2 objects:
- idx2Region, which is a 1362110x1 double, and
- XGrid, which is a 1362110x2 double.
Here are the two main pieces of code that I used:
For the density plot:
figure()
plot(test(:,1),test(:,2),'+','MarkerSize',0.5)
xlabel('1st Principal Component')
ylabel('2nd Principal Component')
gname
And the k-means algorithm, plus the plot of the 3 clusters:
x1 = min(test(:,1)):0.01:max(test(:,1));
x2 = min(test(:,2)):0.01:max(test(:,2));
[x1G,x2G] = meshgrid(x1,x2);
XGrid = [x1G(:),x2G(:)]; % Defines a fine grid on the plot
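% C is not defined in this snippet; it is presumably the centroid matrix
% returned by an earlier kmeans call on the data
idx2Region = kmeans(XGrid,3,'MaxIter',1,'Start',C);
% Assigns each node in the grid to the closest centroid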
figure;
gscatter(XGrid(:,1),XGrid(:,2),idx2Region,...
[0,0.75,0.75;0.75,0,0.75;0.75,0.75,0],'..');
hold on;
plot(test(:,1),test(:,2),'k*','MarkerSize',5);
title 'K-means regions and data';
xlabel '1st Principal Component';
ylabel '2nd Principal Component';
legend('Region 1','Region 2','Region 3','Data','Location','Best');
hold off;
I can't seem to find a solution, but I'm sure it isn't that hard, since all the information is here. I could do it manually with gname, but with 35000 observations that would take forever. Good luck and thank you all. Best regards.
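One way to build the requested two-column table is to label each row of test with its nearest centroid, working from the observations rather than from the grid. The sketch below is only an illustration under stated assumptions: the small test and C matrices are made-up stand-ins for the 35000x2 data and for the centroid matrix passed as 'Start' above, and the names d2, cluster and result are hypothetical. It uses base MATLAB only.

```matlab
% Made-up stand-ins (assumptions, not the poster's data)
test = [0 0; 0.1 0.2; 5 5; 5.2 4.9; -4 6; -4.1 6.3];   % observations, n-by-2
C    = [0 0; 5 5; -4 6];                                % centroids, k-by-2

n = size(test, 1);
k = size(C, 1);

% Squared Euclidean distance from every observation to every centroid
d2 = zeros(n, k);
for j = 1:k
    d2(:, j) = sum((test - repmat(C(j, :), n, 1)).^2, 2);
end

% The cluster of each observation is the column with the smallest distance
[~, cluster] = min(d2, [], 2);

% Column 1: ID (row index in test), column 2: cluster number
result = [(1:n)', cluster];
disp(result)
```

Note also that the first output of kmeans, when called on the observations themselves (for example idx = kmeans(test,3)), is already an n-by-1 vector of cluster indices with one entry per row, so [(1:size(test,1))', idx] has the same two-column layout.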


Accepted Answer

Salim El Houat on 12 May 2017
Hello everyone,
I've found a solution, if anyone's interested.
So basically, the k-means code left me with two matrices:
- XGrid, an n-by-2 matrix, which simply divides the plot into a grid and contains the coordinates of all the grid points.
- idx2Region, an n-by-1 matrix, which says, for each point of XGrid, whether it belongs to the 1st group, the 2nd group, and so on, depending on how many groups you defined in the kmeans call. I have 3 groups.
So to identify which grid points are in the first group, take the indices of all the "group 1" values in idx2Region and pull the points at those indices out of XGrid. Say you store those points in a new matrix, XGrid1, and that this group is the violet one in the picture. Then all I need to do is find the indices of all the points in my matrix 'test' that satisfy the position condition:
'for each y, find all the points (a,y) of my matrix test satisfying: a < max(XGrid(:,y))'
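The link between test and the grid labels can also be made by looking up, for each observation, the label of the nearest grid node. The sketch below is a hedged illustration rather than the exact procedure described above: the data, the coarser step of 0.1, and the rule used to fill idx2Region are all made up so the example runs without kmeans, and col, row, lin and result are hypothetical names. It relies on XGrid having been built with meshgrid as in the question, so that node (row, col) sits at linear index sub2ind([numel(x2), numel(x1)], row, col).

```matlab
% Made-up stand-ins (assumptions, not the poster's data)
test = [0.2 0.4; 1.6 0.0; 3.9 1.1; 0.0 2.0; 4.0 0.3];   % observations, n-by-2
step = 0.1;                                              % the question uses 0.01

% Same grid construction as in the question
x1 = min(test(:,1)):step:max(test(:,1));
x2 = min(test(:,2)):step:max(test(:,2));
[x1G, x2G] = meshgrid(x1, x2);
XGrid = [x1G(:), x2G(:)];

% Hypothetical region labels standing in for the kmeans output:
% region 1 for x < 1, region 2 for 1 <= x < 3, region 3 otherwise
idx2Region = 1 + (XGrid(:,1) >= 1) + (XGrid(:,1) >= 3);

% Nearest grid node for every observation (column from x, row from y)
col = round((test(:,1) - x1(1)) / step) + 1;
row = round((test(:,2) - x2(1)) / step) + 1;
col = min(max(col, 1), numel(x1));   % clamp against rounding at the edges
row = min(max(row, 1), numel(x2));

% meshgrid output is numel(x2)-by-numel(x1), flattened column by column
lin = sub2ind([numel(x2), numel(x1)], row, col);

% Column 1: ID (row index in test), column 2: region of the nearest node
result = [(1:size(test,1))', idx2Region(lin)];
disp(result)
```

Because the lookup is a direct index computation, it avoids searching the 1362110 grid rows for each of the 35000 observations.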
Hope that helps. Feel free to rename the question or add tags if that would help others find this topic.
Have a good day.
