
Data partitioning for Machine learning

Akshita Gupta on 30 Mar 2019
Answered: Gagan Agarwal on 30 May 2024
What does the warning that the training set does not contain points from all groups mean when partitioning the data? And how can it be removed?

Answers (1)

Gagan Agarwal on 30 May 2024
Hi Akshita
This warning typically appears when you split your dataset into training and testing (or validation) sets and at least one of the splits, here the training set, does not contain data points from every group or category present in the original dataset.
This situation can lead to several issues, including:
  • Biased Model Training: The model may not learn to generalize well across all groups since it hasn't seen examples from each group during training.
  • Inaccurate Evaluation: The testing or validation set may not accurately represent the performance of the model across all groups if it lacks data from some of them.
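To find out which groups are affected, you can compare the labels that ended up in a split against the full set of labels. The snippet below is a minimal, language-agnostic sketch written in Python; the helper name `missing_groups` and the toy labels and indices are made up for illustration and are not part of any toolbox.

```python
def missing_groups(all_labels, split_labels):
    """Return the groups that occur in all_labels but not in split_labels."""
    return sorted(set(all_labels) - set(split_labels))


# Toy data (hypothetical): group "c" has a single sample.
labels = ["a", "a", "a", "b", "b", "c"]
train_idx = [0, 1, 3, 4]   # indices assigned to the training set
test_idx = [2, 5]          # indices assigned to the test set

train_labels = [labels[i] for i in train_idx]
test_labels = [labels[i] for i in test_idx]

print(missing_groups(labels, train_labels))  # ['c']
print(missing_groups(labels, test_labels))   # ['b']
```

An empty list for the training set means every group is represented there, which is the condition the warning is checking.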
The warning can be avoided by considering the following possibilities and techniques:
  1. Check for Small or Rare Groups: Look for any groups that have very few samples and consider merging them with similar groups or using oversampling techniques to increase their representation.
  2. If you're using stratified splitting, ensure that your stratification strategy accounts for the size and distribution of all groups.
  3. Implement custom logic for splitting the dataset that ensures all groups are represented in each split.
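The steps above can be sketched as follows. This is only an illustration under stated assumptions, written in plain Python rather than with any toolbox function: `rare_groups` and `stratified_holdout` are hypothetical helper names, and the 30% test fraction and the toy labels are arbitrary. The split is done group by group, and each group always keeps at least one sample in the training set.

```python
import random
from collections import Counter, defaultdict


def rare_groups(labels, min_count=2):
    """Return {group: count} for groups with fewer than min_count samples."""
    counts = Counter(labels)
    return {g: n for g, n in counts.items() if n < min_count}


def stratified_holdout(labels, test_fraction=0.3, seed=0):
    """Split indices per group so every group stays in the training set.

    A group with a single sample goes entirely to training, so it cannot
    also appear in the test set; merging or oversampling such groups
    (step 1) is the only way to have them in both splits.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for i, g in enumerate(labels):
        by_group[g].append(i)

    train, test = [], []
    for idx in by_group.values():
        rng.shuffle(idx)
        # Never move the last remaining sample of a group out of training.
        n_test = min(int(round(len(idx) * test_fraction)), len(idx) - 1)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Toy labels (hypothetical): one large, one small, and one singleton group.
labels = ["a"] * 10 + ["b"] * 3 + ["c"]

print(rare_groups(labels))                   # {'c': 1}
train, test = stratified_holdout(labels)
print(sorted({labels[i] for i in train}))    # ['a', 'b', 'c']
print(len(train), len(test))                 # 10 4
```

If `rare_groups` reports anything, it is usually worth dealing with those groups first, since no splitting strategy can place a single sample in more than one split.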
I hope it helps!
