How to use testing data to validate k-means?

Hello there,
I have data in 8 text files, and I would like to classify similar ones into the same classes. I am using k-means for now. I would like to use 5 of the files for training and 3 for testing. I have used the kmeans command to get k classes; however, I do not know how to validate my results. In other words, how do I use my testing data to calculate the error? I would appreciate it if somebody could help me. Thanks in advance.
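Below is a minimal sketch of the setup the question describes. It assumes each file has already been reduced to numeric feature vectors, and all data values are hypothetical. It is plain Python implementing Lloyd's k-means, not MATLAB's kmeans command. The point is that the centroids are fitted on the training samples only, and the test samples are held back for later.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's k-means on a list of tuples. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as starting centres
    labels = None
    for _ in range(max_iters):
        # Assignment step: label each point with the index of its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical 1-D features: one value per training sample.
train = [(25,), (28,), (30,), (32,), (35,), (80,), (95,), (100,), (105,), (120,)]
test = [(70,), (28,), (110,)]  # held out; deliberately not used for fitting

centroids, train_labels = kmeans(train, k=2)
print(sorted(c[0] for c in centroids))  # -> [30.0, 100.0]
```

The result of k-means can depend on the starting centres, which is why the seed is exposed as a parameter. On these well-separated values, every start converges to the same two means.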

Accepted Answer

Image Analyst on 23 Mar 2014
If you do not know the "ground truth" of your data, then there's no way to tell whether a result is "wrong". The only thing you can do (I think) is to classify your "unknown" data and measure how far each point is from the class means. For example, say you had one cluster, "class #1", around 30 +/- 5, and a second cluster, "class #2", around 100 +/- 20. You run kmeans with 2 classes and it reports those two classes, with means at 30 and 100. Now suppose a data point in the non-training set has a value of 70. It is 40 from class #1 and 30 from class #2, so you can say that the 70 belongs to class #2, the nearer mean. You can do the same for all the other data in your test sets.
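The nearest-mean check described above can be sketched in a few lines. This assumes the class means have already come out of kmeans: here they are the 30 and 100 from the example, while the extra test values 28 and 110 are made up. The sketch uses absolute distance because the example is one-dimensional; for feature vectors you would use Euclidean distance instead. The average distance it prints shows how tightly the test data sit around the training means. It is not an error rate.

```python
def assign_to_nearest(value, centroids):
    """Return (index of the nearest centroid, distances to every centroid)."""
    distances = [abs(value - c) for c in centroids]
    nearest = min(range(len(centroids)), key=lambda j: distances[j])
    return nearest, distances


# Class means from the example: class #1 near 30, class #2 near 100.
centroids = [30.0, 100.0]
test_values = [70.0, 28.0, 110.0]  # 70 is from the example; 28 and 110 are hypothetical

nearest_dists = []
for x in test_values:
    idx, d = assign_to_nearest(x, centroids)
    nearest_dists.append(d[idx])
    print(f"{x}: class #{idx + 1}, distances {d}")

# Average distance from each test point to the mean of its assigned class.
print("mean distance to assigned mean:", sum(nearest_dists) / len(nearest_dists))
```

For the value 70, this prints "70.0: class #2, distances [40.0, 30.0]", which matches the 40 and 30 in the example.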
Image Analyst on 23 Mar 2014
To get the error accurately, you have to know the true values, don't you? And you don't know those, so all you have is a guess.
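If you do have true class labels for some of the files, an error can be computed. Below is a minimal sketch that assumes labelled training and test samples; all cluster numbers and labels are hypothetical. K-means cluster numbers are arbitrary, so the sketch first names each cluster after the most common true label among its training members. The test error is then the fraction of test samples whose cluster name differs from their true label. The test cluster numbers are assumed to come from nearest-centroid assignment against the training centroids.

```python
from collections import Counter


def majority_mapping(train_clusters, train_labels):
    """Map each cluster number to the most common true label among its training members."""
    by_cluster = {}
    for c, t in zip(train_clusters, train_labels):
        by_cluster.setdefault(c, []).append(t)
    return {c: Counter(ts).most_common(1)[0][0] for c, ts in by_cluster.items()}


def error_rate(mapping, test_clusters, test_labels):
    """Fraction of test samples whose cluster's label differs from the true label."""
    # A test cluster never seen in training maps to None and counts as an error.
    wrong = sum(1 for c, t in zip(test_clusters, test_labels) if mapping.get(c) != t)
    return wrong / len(test_labels)


# Hypothetical cluster numbers and true labels.
train_clusters = [0, 0, 0, 1, 1]
train_labels = ["A", "A", "B", "B", "B"]
test_clusters = [0, 1, 1]
test_labels = ["A", "B", "A"]

mapping = majority_mapping(train_clusters, train_labels)
print(mapping)
print(error_rate(mapping, test_clusters, test_labels))
```

Majority-vote naming is only one simple way to match clusters to labels. Without true labels, none of this can be computed.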



