Best way to find all possible linear combinations of n rows from large dataset that meets a target for each column

7 views (last 30 days)
I have a large dataset, A, roughly 10000 rows and 10 columns. Each row represents a different investment portfolio and each column represents the percentage of investment in that sector, such that each row sums up to 1. I have a target percentage for each sector and want to find the all possible linear combinations of n different investment portfolios (rows) that meet it. My knowledge of Matlab is pretty basic so I had thought of using the built-in function combnk to find all the combinations of the rows without repeating, then using it and a loop to calculate one by one if the combinations meet the target, but combnk doesn't work for such large number of elements. I also figured this probably isn't the best way to do this so if anyone has any ideas, please let me know.
  9 Comments
Matt J
Matt J on 28 Sep 2018
Edited: Matt J on 28 Sep 2018
Hi Matt, depending on how you mean this, I don't believe it's true. For 10 column data and a given 10 linearly independent row portfolios, you have a nonsingular 10x10 matrix that will give unique c's.
I had an inequality there, supposing that the OP may have meant the portfolio target as a lower bound, but it looks now that he did not.

Sign in to comment.

Answers (2)

Bruno Luong
Bruno Luong on 28 Sep 2018
Your problem can be formulated as finding (all) feasible solutions of a linear constraints.
A*c >= target
c >= 0
sum(c) = 1
The feasible region is either empty (eg if target if too high) or interior of a polygonal in an afine space R^(n-1) where n is the length of target. The set has therefore continuum cardinal (too many point) to be listed on a computer.
To find ONE solution you can use LINPROG with a dummy cost objective, it will return a feasible point (or not) of the linear constraint.

Matt J
Matt J on 28 Sep 2018
Edited: Matt J on 28 Sep 2018
In your response to David, you have implied that your coefficients c are completely unconstrained, i.e., they are allowed to have negative values. I suspect you don't really mean that, since this is a financial application and I have never heard of investing a negative amount in a portfolio. If I assume that c>=0, then you are looking for all c with n non-zero coefficients satisfying the linear (in)equalities,
A.'*c = pf_target
c>=0
As David has told you, you can't get finite solutions to this unless you restrict yourself to linearly independent combinations of columns of A.' (portfolios). So, I will assume further that you wish to confine your search in this manner.
Now, it is a theorem from linear programming that this subset of solutions are precisely the vertices of the 10000-dimensional polyhedron described by the inequalities above. Therefore, your problem reduces to a vertex enumeration problem. For a 10000-dimensional space, the number of vertices may be finite, but likely still too huge to realistically store in a computer. Nevertheless, you can attempt the enumeration using LCON2VERT (download here). It would look as follows,
solutions = lcon2vert( eye(10),zeroes(10,1) , A.',pf_target);
  4 Comments
Bruno Luong
Bruno Luong on 28 Sep 2018
Edited: Bruno Luong on 28 Sep 2018
I think I miss-understood what you wrote above. You are right, the vertices are subset solution(s) of the constraints
A*c = t
c >= 0
Assume they are {c}_i, i=1,...,p (p can be up to 10^30).
But you never said how all solutions can be get from {c_i}.
The solutions (points belong to feasible set, or polytope), if bounded, are actually all convex combination of {c_i}, meaning
S = { c = sum_(w_i*c_i) : w_i>=0, sum(w_i) = 1 }.
OP wants to enumerate S (and not the relative tiny subset {c_i}), which has continuum cardinal when p>1 (recall, in his case p >= 10^30).
Inexperience people often have very off intuition about combinatorics, and how those thing can grow fast, and they believe computer is infinity fast and has infinity large in storage.
I'll try to steer them away and avoid giving them the impression that they could solve such problem by brute force.
Yan Loh
Yan Loh on 29 Sep 2018
Thanks Matt and Bruno, I now realise my approach of as Bruno puts it as using brute force is not the best way to solve this problem and will rethink it

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!