how to reverse normalized data after kmeans clustering in matlab


Answers (2)

Image Analyst on 21 Oct 2022
I don't believe that's true. The centroids are the actual centroids, unless you've normalized your data beforehand. Here's proof:
x = [60*rand(100, 1), 90 + 100*rand(100, 1)];
y = [30*rand(100, 1), 60+50*rand(100, 1)];
plot(x, y, '.', 'MarkerSize', 15);
grid on;
[classIndexes, classCentroids] = kmeans([x(:), y(:)], 2)
classIndexes = 200×1
1 1 1 1 1 1 1 1 1 1
classCentroids = 2×2
   30.6948   14.8815
  139.6902   84.3660
% Plot centroids as magenta crosshairs
hold on
plot(classCentroids(1,1), classCentroids(1,2), 'm+', 'MarkerSize', 150, 'LineWidth', 3)
plot(classCentroids(2,1), classCentroids(2,2), 'm+', 'MarkerSize', 100, 'LineWidth', 3)
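
As a numeric check of the same point, here is a minimal sketch that compares the centroids returned by kmeans with the plain mean of the raw points in each class. The fixed seed and the names xy, classMeans, and maxDifference are assumptions added for illustration; kmeans requires the Statistics and Machine Learning Toolbox.

```matlab
rng(0);  % Fixed seed so the sketch is repeatable (an assumption, not in the code above).
x = [60*rand(100, 1); 90 + 100*rand(100, 1)];
y = [30*rand(100, 1); 60 + 50*rand(100, 1)];
xy = [x, y];  % 200-by-2: one row per point, never normalized.
[classIndexes, classCentroids] = kmeans(xy, 2);
% Mean of the raw points that kmeans assigned to each class.
classMeans = [mean(xy(classIndexes == 1, :), 1); ...
              mean(xy(classIndexes == 2, :), 1)];
% Largest absolute difference between the two 2-by-2 matrices.
maxDifference = max(abs(classCentroids(:) - classMeans(:)))
```

If maxDifference is zero to within rounding error, the centroids are already in the units of the input data and there is nothing to reverse.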

Image Analyst on 28 Oct 2022
Try this:
% Initialization steps.
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 20;
markerSize = 30;
data =[...
22704 94
63575 81
25026 72
31510 88
21864 90
32162 95
31585 95
20126 92
39525 97
58691 87
34870 91
28052 89
15122 94
10185 80
30220 95
9066 69
36450 93
8704 67
15140 78
38380 87
15470 85
27553 90
13349 92
11857 71
43514 96];
% Plot original data.
x = data(:, 1);
y = data(:, 2);
subplot(2, 2, 1);
plot(x, y, '.', 'MarkerSize', 15);
grid on;
title('Original, unscaled and unclassified data', 'FontSize',fontSize)
% First do kmeans with original data.
[classIndexes, classCentroids] = kmeans([x(:), y(:)], 2)
subplot(2, 2, 2);
plot(x(classIndexes == 1), y(classIndexes == 1), 'r.', 'MarkerSize', markerSize);
hold on;
plot(x(classIndexes == 2), y(classIndexes == 2), 'b.', 'MarkerSize', markerSize);
grid on;
% Plot centroids as magenta crosshairs
hold on
plot(classCentroids(1,1), classCentroids(1,2), 'm+', 'MarkerSize', 150, 'LineWidth', 3)
plot(classCentroids(2,1), classCentroids(2,2), 'm+', 'MarkerSize', 100, 'LineWidth', 3)
caption = sprintf('2 clusters and their centroids\nas determined from original (NOT normalized) data')
title(caption, 'FontSize',fontSize)
% Now kmeans with normalized data
[Nx, Cx, Sx] = normalize(x)
[Ny, Cy, Sy] = normalize(y)
[classIndexesN, classCentroidsN] = kmeans([Nx(:), Ny(:)], 2);
% Plot normalized data in lower left plot.
subplot(2, 2, 3);
plot(Nx(classIndexesN == 1), Ny(classIndexesN == 1), 'r.', 'MarkerSize', markerSize);
hold on;
plot(Nx(classIndexesN == 2), Ny(classIndexesN == 2), 'b.', 'MarkerSize', markerSize);
grid on;
% Plot centroids as magenta crosshairs over normalized data.
hold on
x1N = classCentroidsN(1,1);
y1N = classCentroidsN(1,2);
x2N = classCentroidsN(2,1);
y2N = classCentroidsN(2,2);
plot(x1N, y1N, 'm+', 'MarkerSize', 150, 'LineWidth', 3)
plot(x2N, y2N, 'm+', 'MarkerSize', 100, 'LineWidth', 3)
caption = sprintf('2 clusters and their centroids\nas determined from normalized data')
title(caption, 'FontSize',fontSize)
% Now unnormalize the location of the classCentroids
x1 = x1N * Sx + Cx
y1 = y1N * Sy + Cy
x2 = x2N * Sx + Cx
y2 = y2N * Sy + Cy
% Plot original data with class colors.
subplot(2, 2, 4);
plot(x(classIndexesN == 1), y(classIndexesN == 1), 'r.', 'MarkerSize', markerSize); % Plot class 1 in red
grid on;
hold on;
plot(x(classIndexesN == 2), y(classIndexesN == 2), 'b.', 'MarkerSize', markerSize); % Plot class 2 in blue
plot(x1, y1, 'm+', 'MarkerSize', 100, 'LineWidth', 3)
plot(x2, y2, 'm+', 'MarkerSize', 100, 'LineWidth', 3)
caption = sprintf('2 clusters and their centroids\nas determined from normalized data, shown at original scale')
title(caption, 'FontSize',fontSize)
Here I show you how to normalize the data, run kmeans, and then unnormalize the centroids back to their original, unnormalized locations. But you need to realize that the class kmeans assigns to a point depends on whether the data was normalized or not. Compare the upper right plot with the two lower plots: which dots are in each class changes. The classes assigned are different in the two cases, and thus the centroids are in different locations. In a case like this, where the x values are so huge compared to the y values, it's probably best to normalize the data first, and then unnormalize if you want the centroids back in the original scale.
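
The unnormalizing step can also be written for the whole matrix at once. This is a minimal sketch, not the only way to do it. It assumes z-score scaling computed by hand with mean and std (which is what normalize does by default), uses only the first 8 rows of the data above for brevity, and relies on implicit expansion (R2016b or later). The names mu, sigma, dataN, centroidsN, centroids, and roundTrip are made up for illustration.

```matlab
rng(0);  % Fixed seed so the sketch is repeatable (an assumption).
data = [22704 94; 63575 81; 25026 72; 31510 88; ...
        21864 90; 32162 95; 31585 95; 20126 92];  % First 8 rows of the data above.
mu = mean(data, 1);        % Per-column center, plays the role of Cx and Cy.
sigma = std(data, 0, 1);   % Per-column scale, plays the role of Sx and Sy.
dataN = (data - mu) ./ sigma;  % z-score each column.
[classIndexesN, centroidsN] = kmeans(dataN, 2, 'Replicates', 5);
% Undo the scaling, one column at a time: original = normalized * scale + center.
centroids = centroidsN .* sigma + mu
% Sanity check: normalizing the recovered centroids again returns centroidsN.
roundTrip = (centroids - mu) ./ sigma;
```

The class indexes need no conversion: row k of classIndexesN labels row k of data, whether or not the data was scaled.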
