How can i equally distribute a list of values into 3 approximately equal groups.

8 views (last 30 days)
I have a list of 75 different values.
I want to divide this list into three lists with approximately equal total values.
So the values are equally distributed into three different groups.
Currently I have no idea how to code this, as I can only divide it into 3 groups without the distribution of the values and I am not the best programmer yet.
Anyone has any tips for me?

Answers (1)

John D'Errico
John D'Errico on 23 Dec 2022
DON'T WRITE YOUR OWN CODE to solve problems that are already well known and solved by others who understand the problems and how to implement the various algorithms.
Certainly don't do so, not until you have exhausted the various tools you can find. You have described a basic clustering problem. So the very first thing you should consider is if kmeans can solve your problem. (As I recall, there are a variety of clustering algorithms. But I think kmeans actually includes many of those I recall learning about many years ago.) If kmeans does not solve your problem, then do a search for other clustering algorithms. Anyway, you should do some quick reading no matter what. You might look in these places:
And of course, kmeans.
help kmeans
KMEANS K-means clustering. IDX = KMEANS(X, K) partitions the points in the N-by-P data matrix X into K clusters. This partition minimizes the sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances. Rows of X correspond to points, columns correspond to variables. Note: when X is a vector, KMEANS treats it as an N-by-1 data matrix, regardless of its orientation. KMEANS returns an N-by-1 vector IDX containing the cluster indices of each point. By default, KMEANS uses squared Euclidean distances. KMEANS treats NaNs as missing data, and ignores any rows of X that contain NaNs. [IDX, C] = KMEANS(X, K) returns the K cluster centroid locations in the K-by-P matrix C. [IDX, C, SUMD] = KMEANS(X, K) returns the within-cluster sums of point-to-centroid distances in the K-by-1 vector sumD. [IDX, C, SUMD, D] = KMEANS(X, K) returns distances from each point to every centroid in the N-by-K matrix D. [ ... ] = KMEANS(..., 'PARAM1',val1, 'PARAM2',val2, ...) specifies optional parameter name/value pairs to control the iterative algorithm used by KMEANS. Parameters are: 'Display' - Level of display output. Choices are 'off' (the default), 'iter', and 'final'. 'Distance' - Distance measure, in P-dimensional space, that KMEANS should minimize with respect to. Choices are: 'sqeuclidean' - Squared Euclidean distance (the default). 'cityblock' - Sum of absolute differences, a.k.a. L1 distance 'cosine' - One minus the cosine of the included angle between points (treated as vectors). 'correlation' - One minus the sample correlation between points (treated as sequences of values). 'hamming' - Percentage of bits that differ (only suitable for binary data). 'EmptyAction' - Action to take if a cluster loses all of its member observations. Choices are: 'singleton' - Create a new cluster consisting of the one observation furthest from its centroid (the default). 'error' - Treat an empty cluster as an error. 'drop' - Remove any clusters that become empty, and set the corresponding values in C and D to NaN. 'MaxIter' - Maximum number of iterations allowed. Default is 100. 'OnlinePhase' - Flag indicating whether KMEANS should perform an "on-line update" phase in addition to a "batch update" phase. The on-line phase can be time consuming for large data sets, but guarantees a solution that is a local minimum of the distance criterion, i.e., a partition of the data where moving any single point to a different cluster increases the total sum of distances. 'off' (the default) or 'on'. 'Options' - Options for the iterative algorithm used to minimize the fitting criterion, as created by STATSET. Choices of STATSET parameters are: 'UseParallel' - If true and if a parpool of the Parallel Computing Toolbox is open, compute in parallel. If the Parallel Computing Toolbox is not installed, or a parpool is not open, computation occurs in serial mode. Default is 'default', meaning serial computation. 'UseSubstreams' - Set to true to compute in parallel in a reproducible fashion. Default is false. To compute reproducibly, set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'. 'Streams' - These fields specify whether to perform clustering from multiple 'Start' values in parallel, and how to use random numbers when generating the starting points. For information on these fields see PARALLELSTATS. NOTE: If 'UseParallel' is TRUE and 'UseSubstreams' is FALSE, then the length of 'Streams' must equal the number of workers used by KMEANS. If a parallel pool is already open, this will be the size of the parallel pool. If a parallel pool is not already open, then MATLAB may try to open a pool for you (depending on your installation and preferences). To ensure more predictable results, it is best to use the PARPOOL command and explicitly create a parallel pool prior to invoking KMEANS with 'UseParallel' set to TRUE. 'Replicates' - Number of times to repeat the clustering, each with a new set of initial centroids. A positive integer, default is 1. 'Start' - Method used to choose initial cluster centroid positions, sometimes known as "seeds". Choices are: 'plus' - The Default. Select K observations from X according to the k-means++ algorithm: the first cluster center is chosen uniformly at random from X, after which each subsequent cluster center is chosen randomly from the remaining data points with probability proportional to its distance from the point's closest existing cluster center. 'sample' - Select K observations from X at random. 'uniform' - Select K points uniformly at random from the range of X. Not valid for Hamming distance. 'cluster' - Perform preliminary clustering phase on random 10% subsample of X. This preliminary phase is itself initialized using 'sample'. matrix - A K-by-P matrix of starting locations. In this case, you can pass in [] for K, and KMEANS infers K from the first dimension of the matrix. You can also supply a 3D array, implying a value for 'Replicates' from the array's third dimension. Example: X = [randn(20,2)+ones(20,2); randn(20,2)-ones(20,2)]; opts = statset('Display','final'); [cidx, ctrs] = kmeans(X, 2, 'Distance','city', ... 'Replicates',5, 'Options',opts); plot(X(cidx==1,1),X(cidx==1,2),'r.', ... X(cidx==2,1),X(cidx==2,2),'b.', ctrs(:,1),ctrs(:,2),'kx'); See also LINKAGE, CLUSTERDATA, SILHOUETTE. Documentation for kmeans doc kmeans Other uses of kmeans gpuArray/kmeans tall/kmeans

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!