Hello matlab users,
I'm trying to solve the following problem. Suppose we have a number of point on a 2d plot with X&Y coordinates. Each point has it's own label (e.g. 1 or 2). I want to draw a line, which will separate a region of plot with maximum number of points of certain label (e.g. 2). Here is a sample code for the first part of the problem, which plots points and assigns labels:
X = [0.5588;0.3603;0.3309;0.3603;0.1985]; Y = [1.9234;0.3406;1.1295;0.8748;0.2416]; scatter(X,Y); a = [1;2;2;2;1]; b = num2str(a); c = cellstr(b); %displacement so the text does not overlay the data points dx = 0.01; dy = 0.01; text(X+dx,Y+dy,c);
As I see it the brut force approach would be to write a cycle which would iterate coefficients k and b of linear line equation (y=kx+b) and calculate number of points with different labels to each side of the line trying to maximize number of points with given label (note: the optimum kx+b line should separate points in such a manner so that there are no points with label 1 in region of points with label 2). But I am afraid that this approach will take lots of machine time, since I will be trying different combinations of X&Y and for each X&Y combination I'll have to find it's optimal separating line.
May be someone cane suggest more elegant and less time consuming approach? May be there is an analytical approach to solve this problem (instead of graphical).
I'll appreciate your help.
No products are associated with this question.
You write that "the optimum kx+b line should separate points in such a manner so that there are no points with label 1 in region of points with label 2". Can you guarantee that such a perfect discrimination exists? It seems highly likely that the points will be distributed such that a single straight line is insufficient to completely isolate the label 1 points from the label 2 points.
That said, if you have the Optimization Toolbox, you should be able to code up your algorithm fairly easily. I suspect that given that you only have two variables, your algorithm will converge fairly quickly.
If you are looking at straight lines to divide clusters, then consider using svm.
The natural search space for an algorithm like this involves both slope and intercept. Both are important to get the best answer, but we found in practice that getting "close" to the best was somewhat insensitive to slope but was fairly sensitive to intercept.
Starting from "transvariation" techniques suggested by my co-workers, I found a way to optimize the search so that the calculation of best intercept for any one slope, became linear in the number of points. Latching on to the absolute best slope could still require a fairly fined-grained search, but in practice it was amenable to bracketing and refinement, that if you went more than a little way away from "close" to the peak, that the counts would fall fast enough that managing to dig out a tiny tiny angle of separation would not be enough to counter the grosser-level losses. For our purposes, worrying about angles that narrow implied that we were overtraining.
Unfortunately, the project that this was all implemented for was permanently shut down this week, so the code is no longer available.