
Questions about performance of singular classification+regression CNN versus sequential classification+regression CNNs

Hi!
I am trying to create a CNN that predicts if there is a ball in an image, and if so it also predicts its coordinates.
For example, one possible way to create ground truth labels for a singular regression CNN is:
labels = [1,3,2;0,0,0]
where each row is one image: the first column is a binary flag for whether there is a ball, the second column is the x-coordinate, and the third column is the y-coordinate.
I have some questions regarding how optimized this CNN could be:
  • Can a CNN like this be as effective at both classification and coordinate prediction as two sequential CNNs, where the first predicts whether there is a ball and the second then predicts the coordinates? I feel like a singular CNN might have higher errors, as it has to 'perform' more, and classification is done via regression instead of a 'classification' output layer (I know classification is based on logistic regression). The singular CNN probably needs a more complicated architecture, but as long as both methods have the same error magnitude I would be fine with that.
Furthermore, I am concerned about giving coordinate labels for images without a ball:
  • The ground truth coordinates for images without a ball are [0,0]. Does this introduce a bias in the coordinate prediction for actual ball images? Say, that the predicted coordinates of an actual ball image are slightly off towards [0,0]?
  • I could set the ground truth coordinates for images without a ball to unrealistic values, let's say [-1000,-1000]. Does this introduce the same bias? Also, I feel like this might result in wrong predictions of ball coordinates between -1000 and 0.
Another possible way is to create a CNN architecture with both a classification and regression layer. Is this recommended over the singular regression CNN with the labels proposed above?
Thanks in advance!

Answers (1)

Sandeep on 27 Mar 2023
Hi Kevin Jansen,
To answer your question about how well such a CNN can perform: a single CNN can be effective at both classification and coordinate prediction, and it is possible to achieve good results with it. However, as you mentioned, two sequential CNNs may have some advantages, such as better accuracy and output that is easier to interpret.
Regarding your concern about images without a ball: using [0,0] as their ground truth coordinates may bias coordinate prediction for actual ball images, because the CNN may learn to predict coordinates closer to [0,0] for all images, whether or not a ball is present. One option is to use a label outside the range of possible coordinates, such as [-1,-1], for images without a ball. However, this is not a perfect solution, as the CNN may still predict coordinates pulled towards [-1,-1] for actual ball images.
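To make the pull towards the placeholder concrete: under a squared-error loss, the best single prediction for a group of images the network cannot tell apart is the mean of their targets. The snippet below is a minimal sketch in plain Python (not MATLAB). The coordinates are made-up numbers and `mse_optimal_prediction` is a hypothetical helper, not a toolbox function.

```python
def mse_optimal_prediction(targets):
    """Return the constant that minimizes mean squared error over `targets`.

    For squared error this is simply the arithmetic mean.
    """
    return sum(targets) / len(targets)


# Made-up x-coordinates of three ball images that look alike to the network.
ball_x = [40.0, 42.0, 44.0]

# Add one no-ball image that the network confuses with them, and try
# different placeholder coordinates for it.
for placeholder in (0.0, -1.0, -1000.0):
    best = mse_optimal_prediction(ball_x + [placeholder])
    print(f"placeholder {placeholder:8.1f} -> best single prediction {best:8.2f}")
```

Without the confusing image the best prediction is the mean of the ball coordinates. Any placeholder target shifts it, and a more extreme placeholder such as -1000 shifts it further rather than removing the effect.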
Another possible solution is to use a CNN architecture with both a classification and regression layer. This can help to reduce the bias introduced by the ground truth labels for images without a ball: the classification layer helps the CNN distinguish between images with and without a ball, and the regression layer predicts the coordinates only for the images with a ball. For this to work, the regression loss should ignore images without a ball, so that their placeholder coordinates never act as training targets. This approach can also help to improve the accuracy of both the classification and the coordinate prediction.
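One way to train a network with both outputs is to add a classification loss to a coordinate loss that is switched off for images without a ball. The following is a minimal per-image sketch in plain Python (not MATLAB). The function name `combined_loss` and the `coord_weight` parameter are assumptions made for illustration, not part of any toolbox API.

```python
import math


def combined_loss(p_ball, xy_pred, has_ball, xy_true, coord_weight=1.0):
    """Per-image loss for a network with a ball/no-ball output and an (x, y) output.

    p_ball   : predicted probability that a ball is present (0..1)
    xy_pred  : predicted (x, y)
    has_ball : ground truth flag, 1 if a ball is present, else 0
    xy_true  : ground truth (x, y); ignored when has_ball == 0
    """
    eps = 1e-12  # keeps log() away from zero
    p = min(max(p_ball, eps), 1.0 - eps)

    # Binary cross-entropy on the classification output.
    cls_loss = -(has_ball * math.log(p) + (1 - has_ball) * math.log(1.0 - p))

    # Squared error on the coordinates, masked by the ground truth flag so
    # that no-ball images contribute nothing to the regression term.
    sq_err = sum((a - b) ** 2 for a, b in zip(xy_pred, xy_true))
    return cls_loss + coord_weight * has_ball * sq_err


# For a no-ball image the coordinate output is free: both calls give the same loss.
print(combined_loss(0.1, (5.0, 5.0), 0, (0.0, 0.0)))
print(combined_loss(0.1, (-300.0, 70.0), 0, (0.0, 0.0)))
```

Because the regression term is multiplied by the ground truth flag, whatever placeholder is stored for no-ball images never influences the coordinate output. At prediction time, the coordinates would only be read when the classification output indicates a ball.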
Please refer to the following documentation pages for more information on training a CNN for regression:

Release

R2020a