How do I fit a histogram properly?

Andrea Carobbi on 12 Mar 2022
Commented: Andrea Carobbi on 14 Mar 2022
I have a vector of data and I need to build a histogram and fit a normal distribution to it (the data are supposed to be normal). The fit looks good, but the chi-square test keeps failing.
I tried the following, loading the data from DATA into the variable e:
%first fit
fit=e;
media=mean(fit)
sig=std(fit)
w=sig/3;
nbin=round((max(fit)-min(fit))/(w))
% rebin (if the fit is bad, try removing data outside 3 sigma)
% clear fit
% fit=e(e>=(media-3*sig) & e<=(media+3*sig));
% media=mean(fit)
% sig=std(fit)
% w=sig/3;
% nbin=round((max(fit)-min(fit))/(w))
figure('Name','both eyes')
histfit(fit, nbin); % make the histogram and fit the Gaussian
fitBoth=fitdist(fit,'Normal'); %make the proper fit to get the parameters
%not sure if fitdist uses the nbin provided or how to pass the value
mu=fitBoth.mu; %get the fit parameters
sigma=fitBoth.sigma;
str= ['\mu=' num2str(mu) newline '\sigma=' num2str(sigma)];
annotation('textbox', [0.785773044110552 0.757296497913367 0.108809663250367 0.141321044546851],'String',str,'FitBoxToText','on', 'FontSize', 18,'EdgeColor','red');
[h,p,st]=chi2gof(fit, 'NBins',nbin, 'CDF',fitBoth) %should use the expected value from the fitdist, right?
The resulting mu and sigma are compatible with an older work in which the data were normal. However, the chi-square test keeps rejecting the hypothesis.
The code shown is my latest attempt. I also tried doing it "manually", getting the bin counts with histcounts, but I got stuck trying to get the "expected" values from the fit.
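For the "manual" route, the expected count in each bin is the sample size times the probability that the fitted normal assigns to that bin, i.e. N*(F(right edge) - F(left edge)). Below is a minimal sketch under a few assumptions: equal-width bins between min(x) and max(x), and the two outer bins treated as open-ended so that the expected counts sum to N (the E values chi2gof prints in the answer below also add up to 3852). The vector x is a hypothetical, deterministic stand-in for e so the script runs on its own, and normCdf is a helper built on base-MATLAB erfc instead of the Statistics Toolbox normcdf.

```matlab
% Hypothetical stand-in for the data vector e: deterministic normal
% quantiles (mean 3.3, sd 0.5), so the script runs without any file.
n = 3852;
x = 3.3 + 0.5*sqrt(2)*erfinv(2*((1:n)' - 0.5)/n - 1);

mu    = mean(x);     % normal ML estimate of the mean
sigma = std(x);      % the question reports fitdist gives this same value
nbin  = 10;

% Equal-width bins; only the interior edges matter because the two
% outer bins are treated as open-ended.
edges = linspace(min(x), max(x), nbin + 1);
inner = edges(2:end-1);

% Observed counts: bin index = 1 + number of interior edges <= x
% (uses implicit expansion, R2016b or later)
idx = 1 + sum(x >= inner, 2);
O   = accumarray(idx, 1, [nbin 1]).';

% Expected counts: N times the fitted normal's probability for each bin,
% with the first bin extended to -Inf and the last one to +Inf
normCdf = @(t) 0.5*erfc(-(t - mu) ./ (sigma*sqrt(2)));
E = n * diff([0, normCdf(inner), 1]);

% Pearson statistic and p-value; mu and sigma were estimated from x,
% so two more degrees of freedom are lost
chi2stat = sum((O - E).^2 ./ E);
df       = nbin - 1 - 2;
p        = gammainc(chi2stat/2, df/2, 'upper');   % same as 1 - chi2cdf(chi2stat, df)
fprintf('chi2 = %.3f, df = %d, p = %.3g\n', chi2stat, df, p);
```

With x replaced by e and nbin by the question's value, the statistic should be comparable to what chi2gof reports for the same bins. Note that chi2gof additionally merges bins whose expected count is below 5 (its 'Emin' option), which this sketch does not do.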
Lastly, the mu and sigma from the fit are exactly the same as the ones I got from the mean and std functions, which is suspicious, and once again I don't understand how such a "good" fit can make the test fail.
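The exact match is expected rather than suspicious. For the normal distribution the maximum-likelihood estimates have a closed form and are computed from the raw data vector, so the number of bins plays no role in them. For a sample x_1, ..., x_N:

```latex
\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\hat{\sigma}^2_{\mathrm{ML}} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \hat{\mu}\right)^2, \qquad
s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \hat{\mu}\right)^2
```

mean(x) computes $\hat{\mu}$ and std(x) computes $s$ (it normalizes by N-1). Since the question reports that fitdist(x,'Normal') returns the same sigma as std, it appears to report $s$ rather than $\hat{\sigma}_{\mathrm{ML}}$; with N = 3852 the two differ only by the factor $\sqrt{(N-1)/N} \approx 0.99987$.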
Thank you in advance

Answers (1)

Star Strider on 12 Mar 2022
Using chi2gof to assess curve fitting of a regression may not be appropriate.
T1 = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/924499/DATA.txt', 'VariableNamingRule','preserve')
T1 = 3852×2 table
     Var1     Var2 
    ______    _____
    2.9383    {' '}
    2.835     {' '}
    2.8468    {' '}
    2.8405    {' '}
    2.8718    {' '}
    2.844     {' '}
    2.8777    {' '}
    2.9787    {' '}
    3.0433    {' '}
    3.107     {' '}
    3.1335    {' '}
    3.1597    {' '}
    3.236     {' '}
    3.3902    {' '}
    3.5122    {' '}
    3.6265    {' '}
Var2_NotEmpty = nnz(~ismember(T1{:,2},{' '}))
Var2_NotEmpty = 0
[h,p,stats] = chi2gof(T1{:,1})
h = 1
p = 8.8933e-25
stats = struct with fields:
    chi2stat: 129.2784
          df: 7
       edges: [1.5065 1.8651 2.2238 2.5824 2.9410 3.2996 3.6583 4.0169 4.3755 4.7341 5.0928]
           O: [26 39 192 626 986 948 673 234 73 55]
           E: [12.1793 62.8378 236.6779 580.9630 929.9377 970.9949 661.3803 293.7915 85.0633 18.1742]
This appears to me to confirm that the data are normally distributed.
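To see where the rejection comes from, the statistic can be rebuilt from the O and E vectors that chi2gof printed above and split into per-bin contributions. A minimal base-MATLAB sketch (gammainc stands in for chi2cdf so no toolbox is needed; contrib and worst are just illustrative names):

```matlab
% Observed and expected counts as printed by chi2gof above
O = [26 39 192 626 986 948 673 234 73 55];
E = [12.1793 62.8378 236.6779 580.9630 929.9377 970.9949 661.3803 293.7915 85.0633 18.1742];

contrib  = (O - E).^2 ./ E;          % each bin's share of the Pearson statistic
chi2stat = sum(contrib);             % about 129.28, matching stats.chi2stat
df       = numel(O) - 1 - 2;         % 10 bins, mu and sigma estimated: 7
p        = gammainc(chi2stat/2, df/2, 'upper');   % upper tail of chi2(df)

[~, worst] = max(contrib);           % bin with the largest contribution
fprintf('chi2 = %.4f, df = %d, p = %.3g\n', chi2stat, df, p);
fprintf('bin %d contributes %.1f (%.0f%% of the total)\n', ...
        worst, contrib(worst), 100*contrib(worst)/chi2stat);
```

The last bin alone (55 observed against about 18.2 expected) contributes about 74.6, more than half of the total, and the first bin (26 against about 12.2) adds about 15.7, so the two outermost bins account for roughly 70% of the statistic. The test is reacting mainly to the excess of points in both tails.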
Star Strider on 12 Mar 2022
I do not understand rejecting the hypothesis that the data are normally distributed. Every other analysis I can think of indicates that assuming the data are normally distributed is appropriate.
T1 = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/924499/DATA.txt', 'VariableNamingRule','preserve')
T1 = 3852×2 table
     Var1     Var2 
    ______    _____
    2.9383    {' '}
    2.835     {' '}
    2.8468    {' '}
    2.8405    {' '}
    2.8718    {' '}
    2.844     {' '}
    2.8777    {' '}
    2.9787    {' '}
    3.0433    {' '}
    3.107     {' '}
    3.1335    {' '}
    3.1597    {' '}
    3.236     {' '}
    3.3902    {' '}
    3.5122    {' '}
    3.6265    {' '}
Var2_NotEmpty = nnz(~ismember(T1{:,2},{' '}))
Var2_NotEmpty = 0
figure
histfit(T1{:,1})
[h,p,stats] = chi2gof(T1{:,1})
h = 1
p = 8.8933e-25
stats = struct with fields:
    chi2stat: 129.2784
          df: 7
       edges: [1.5065 1.8651 2.2238 2.5824 2.9410 3.2996 3.6583 4.0169 4.3755 4.7341 5.0928]
           O: [26 39 192 626 986 948 673 234 73 55]
           E: [12.1793 62.8378 236.6779 580.9630 929.9377 970.9949 661.3803 293.7915 85.0633 18.1742]
pd = fitdist(T1{:,1},'Normal')
pd = 
  NormalDistribution

  Normal distribution
       mu =   3.3359   [3.31888, 3.35291]
    sigma = 0.538644   [0.526879, 0.550949]
figure
probplot(T1{:,1});
Andrea Carobbi on 14 Mar 2022
Well, I reject the hypothesis because the test tells me to. I agree with you, the data do look normal, but if the test says they're not, I can't present the results saying they are.
My question is primarily whether I'm doing something wrong, because I can't tell how these scripts work, even when I look at their code. I make the histogram with my binning, fine, but what binning does fitdist use for its fit? (I know histfit uses fitdist, but how are the parameters I get from fitdist calculated?) From the help I read that, without censored data, the function just calculates the mean and sigma and uses them as the fitted parameters; maybe that explains why my results are so suspiciously good. Can I make fitdist use another method, like maximum likelihood or chi-square minimization? Lastly, I can specify a whole set of parameters for chi2gof, but not the rule and method used to obtain the distribution I want to test against.
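For the normal distribution, fitdist(x,'Normal') already estimates mu and sigma from the raw data by maximum likelihood (with the N-1 correction for sigma, as far as I know), so there is no separate ML option to switch to, and I am not aware of a chi-square-minimization option in fitdist. A binned minimum-chi-square fit can instead be written by hand by minimizing the Pearson statistic over mu and sigma with base-MATLAB fminsearch. As far as I know, ROOT's default histogram fit is also a binned chi-square fit, which may explain small differences from the MATLAB numbers. This is a sketch under my own assumptions, not a toolbox feature: the stand-in data x, the helpers normCdf, expCnt and pearson, and the log(sigma) parameterization are all hypothetical choices.

```matlab
% Hypothetical stand-in data (replace with the real vector, e.g. e)
n = 3852;
x = 3.3 + 0.5*sqrt(2)*erfinv(2*((1:n)' - 0.5)/n - 1);

% Observed counts in equal-width bins, outer bins open-ended
nbin  = 10;
edges = linspace(min(x), max(x), nbin + 1);
inner = edges(2:end-1);
O     = accumarray(1 + sum(x >= inner, 2), 1, [nbin 1]).';

% Pearson chi-square as a function of theta = [mu, log(sigma)];
% the log keeps sigma positive during the search
normCdf = @(t, m, s) 0.5*erfc(-(t - m) ./ (s*sqrt(2)));
expCnt  = @(th) n * diff([0, normCdf(inner, th(1), exp(th(2))), 1]);
pearson = @(th) sum((O - expCnt(th)).^2 ./ expCnt(th));

th0   = [mean(x), log(std(x))];   % start from the ML estimates
thMin = fminsearch(pearson, th0); % Nelder-Mead, base MATLAB

muChi2    = thMin(1);
sigmaChi2 = exp(thMin(2));
chi2Min   = pearson(thMin);
df        = nbin - 1 - 2;
p         = gammainc(chi2Min/2, df/2, 'upper');
fprintf('min-chi2 fit: mu = %.4f, sigma = %.4f, chi2 = %.3f, p = %.3g\n', ...
        muChi2, sigmaChi2, chi2Min, p);
```

Unlike the ML estimates, minimum-chi-square estimates depend on the chosen bins, so they will shift slightly when nbin or the edges change. Because the parameters are tuned to minimize the statistic itself, the resulting chi2 can only be lower than (or equal to) the one obtained with the ML parameters on the same bins.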
I asked a friend to do the same fit with ROOT, and the results were a little better than the ones I get from MATLAB, so I'm really starting to wonder whether I'm doing everything wrong here. If what I did is all right, I can accept the results, say the data are not normal, and move on. I don't need the data to be normal at any cost (the work that said they are had much less data to work with); I need to understand whether I'm missing something.
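The sample-size remark matters: when the shape of the discrepancy stays the same, the Pearson statistic grows linearly with the number of samples, so a deviation that is not significant with a few hundred points can be highly significant with several thousand. The sketch below rescales the counts chi2gof printed in the answer to a tenth of the sample. It only illustrates the scaling and is not a real re-test (the scaled counts are not integers, and some expected counts fall below 5); pearson and scale are illustrative names.

```matlab
% Counts printed by chi2gof in the answer (N = 3852)
O  = [26 39 192 626 986 948 673 234 73 55];
E  = [12.1793 62.8378 236.6779 580.9630 929.9377 970.9949 661.3803 293.7915 85.0633 18.1742];
df = 7;                                  % 10 bins - 1 - 2 estimated parameters

pearson = @(o, e) sum((o - e).^2 ./ e);  % Pearson chi-square statistic

% Same bin proportions with ten times fewer samples
scale     = 0.1;
chi2Full  = pearson(O, E);
chi2Small = pearson(scale*O, scale*E);   % equals scale*chi2Full up to rounding
pFull  = gammainc(chi2Full/2,  df/2, 'upper');
pSmall = gammainc(chi2Small/2, df/2, 'upper');
fprintf('N = %4d: chi2 = %7.2f, p = %.3g\n', sum(O), chi2Full, pFull);
fprintf('N = %4d: chi2 = %7.2f, p = %.3g\n', round(scale*sum(O)), chi2Small, pSmall);
```

With one tenth of the data the statistic drops to about 12.9, below the 5% critical value for 7 degrees of freedom (about 14.07), so the same shape of deviation would not be rejected at that level.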
I'm really sorry to ask so much, but I've been stuck on this part of the analysis for three weeks now.
