How can i equally distribute a list of values into 3 approximately equal groups.

Question

Aaron Jen Hong Li on 23 Dec 2022

0
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/1883627-how-can-i-equally-distribute-a-list-of-values-into-3-approximately-equal-groups

Answered: John D'Errico on 23 Dec 2022

I have a list of 75 different values.

I want to divide this list into three lists with approximately equal total values.

So the values are equally distributed into three different groups.

Currently I have no idea how to code this, as I can only divide it into 3 groups without the distribution of the values and I am not the best programmer yet.

Anyone has any tips for me?

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

John D'Errico on 23 Dec 2022

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/1883627-how-can-i-equally-distribute-a-list-of-values-into-3-approximately-equal-groups#answer_1134927

Open in MATLAB Online

DON'T WRITE YOUR OWN CODE to solve problems that are already well known and solved by others who understand the problems and how to implement the various algorithms.

Certainly don't do so, not until you have exhausted the various tools you can find. You have described a basic clustering problem. So the very first thing you should consider is if kmeans can solve your problem. (As I recall, there are a variety of clustering algorithms. But I think kmeans actually includes many of those I recall learning about many years ago.) If kmeans does not solve your problem, then do a search for other clustering algorithms. Anyway, you should do some quick reading no matter what. You might look in these places:

https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68

https://www.freecodecamp.org/news/8-clustering-algorithms-in-machine-learning-that-all-data-scientists-should-know/

And of course, kmeans.

help kmeans
 KMEANS K-means clustering.
    IDX = KMEANS(X, K) partitions the points in the N-by-P data matrix X
    into K clusters.  This partition minimizes the sum, over all clusters, of
    the within-cluster sums of point-to-cluster-centroid distances.  Rows of X
    correspond to points, columns correspond to variables.  Note: when X is a
    vector, KMEANS treats it as an N-by-1 data matrix, regardless of its
    orientation.  KMEANS returns an N-by-1 vector IDX containing the cluster
    indices of each point.  By default, KMEANS uses squared Euclidean
    distances.
 
    KMEANS treats NaNs as missing data, and ignores any rows of X that
    contain NaNs.
 
    [IDX, C] = KMEANS(X, K) returns the K cluster centroid locations in
    the K-by-P matrix C.
 
    [IDX, C, SUMD] = KMEANS(X, K) returns the within-cluster sums of
    point-to-centroid distances in the K-by-1 vector sumD.
 
    [IDX, C, SUMD, D] = KMEANS(X, K) returns distances from each point
    to every centroid in the N-by-K matrix D.
 
    [ ... ] = KMEANS(..., 'PARAM1',val1, 'PARAM2',val2, ...) specifies
    optional parameter name/value pairs to control the iterative algorithm
    used by KMEANS.  Parameters are:
 
    'Display'  - Level of display output.  Choices are 'off' (the default), 
       'iter', and 'final'. 
 
    'Distance' - Distance measure, in P-dimensional space, that KMEANS
       should minimize with respect to.  Choices are:
           'sqeuclidean'  - Squared Euclidean distance (the default).
           'cityblock'    - Sum of absolute differences, a.k.a. L1 distance
           'cosine'       - One minus the cosine of the included angle
                            between points (treated as vectors).
           'correlation'  - One minus the sample correlation between points
                            (treated as sequences of values).
           'hamming'      - Percentage of bits that differ (only suitable
                            for binary data).
 
    'EmptyAction' - Action to take if a cluster loses all of its member
       observations.  Choices are:
           'singleton' - Create a new cluster consisting of the one
                         observation furthest from its centroid (the
                         default).
           'error'     - Treat an empty cluster as an error.
           'drop'      - Remove any clusters that become empty, and set
                         the corresponding values in C and D to NaN.
 
    'MaxIter'  - Maximum number of iterations allowed.  Default is 100.
 
    'OnlinePhase' - Flag indicating whether KMEANS should perform an "on-line
       update" phase in addition to a "batch update" phase.  The on-line phase
       can be time consuming for large data sets, but guarantees a solution
       that is a local minimum of the distance criterion, i.e., a partition of
       the data where moving any single point to a different cluster increases
       the total sum of distances.  'off' (the default) or 'on'.
 
    'Options' - Options for the iterative algorithm used to minimize the
        fitting criterion, as created by STATSET.  Choices of STATSET
        parameters are:
            'UseParallel' - If true and if a parpool of the Parallel Computing
                       Toolbox is open, compute in parallel. If the
                       Parallel Computing Toolbox is not installed, or a
                       parpool is not open, computation occurs in serial
                       mode. Default is 'default', meaning serial
                       computation.
            'UseSubstreams' - Set to true to compute in parallel in a reproducible 
                       fashion. Default is false. To compute reproducibly,
                       set Streams to a type allowing substreams:
                       'mlfg6331_64' or 'mrg32k3a'.
            'Streams'  - These fields specify whether to perform clustering
                       from multiple 'Start' values in parallel, and how
                       to use random numbers when generating the starting
                       points. For information on these fields see
                       PARALLELSTATS.
                       NOTE: If 'UseParallel' is TRUE and 'UseSubstreams' is FALSE,
                       then the length of 'Streams' must equal the number of workers 
                       used by KMEANS.  If a parallel pool is already open, this 
                       will be the size of the parallel pool.  If a parallel pool 
                       is not already open, then MATLAB may try to open a pool for 
                       you (depending on your installation and preferences).
                       To ensure more predictable results, it is best to use 
                       the PARPOOL command and explicitly create a parallel pool 
                       prior to invoking KMEANS with 'UseParallel' set to TRUE. 
 
    'Replicates' - Number of times to repeat the clustering, each with a
       new set of initial centroids.  A positive integer, default is 1.
 
    'Start' - Method used to choose initial cluster centroid positions,
       sometimes known as "seeds".  Choices are:
           'plus'    - The Default. Select K observations from X according
                       to the k-means++ algorithm: the first cluster center
                       is chosen uniformly at random from X, after which
                       each subsequent cluster center is chosen randomly
                       from the remaining data points with probability
                       proportional to its distance from the point's
                       closest existing cluster center.
           'sample'  - Select K observations from X at random.
           'uniform' - Select K points uniformly at random from the range
                       of X.  Not valid for Hamming distance.
           'cluster' - Perform preliminary clustering phase on random 10%
                       subsample of X.  This preliminary phase is itself
                       initialized using 'sample'.
            matrix   - A K-by-P matrix of starting locations.  In this case,
                       you can pass in [] for K, and KMEANS infers K from
                       the first dimension of the matrix.  You can also
                       supply a 3D array, implying a value for 'Replicates'
                       from the array's third dimension.
 
    Example:
 
        X = [randn(20,2)+ones(20,2); randn(20,2)-ones(20,2)];
        opts = statset('Display','final');
        [cidx, ctrs] = kmeans(X, 2, 'Distance','city', ...
                              'Replicates',5, 'Options',opts);
        plot(X(cidx==1,1),X(cidx==1,2),'r.', ...
             X(cidx==2,1),X(cidx==2,2),'b.', ctrs(:,1),ctrs(:,2),'kx');
 
    See also LINKAGE, CLUSTERDATA, SILHOUETTE.

    Documentation for kmeans
       doc kmeans

    Other uses of kmeans

       gpuArray/kmeans    tall/kmeans

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

How can i equally distribute a list of values into 3 approximately equal groups.

0 Comments
Show -2 older commentsHide -2 older comments

Answers (1)

0 Comments
Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Products

Community Treasure Hunt

How can i equally distribute a list of values into 3 approximately equal groups.

0 Comments Show -2 older commentsHide -2 older comments

Answers (1)

0 Comments Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Products

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

0 Comments
Show -2 older commentsHide -2 older comments