Normal Distributions: Standard Deviation from Statistics Toolbox vs Standard Deviation computed using std(data)?
I have a set of data that I want to fit a normal distribution to, and I calculated the mean and std using
[muhat,sigmahat]=normfit(y);
where y is my data set.
However, when I use histfit() to plot the normal distribution over my histogram of the data, the mean is consistent with the muhat value (which is simply mean(y)), but the std value reported in the statistics toolbox is different from std(y) and clearly varies with the x used in yfit=normpdf(x,muhat,sigmahat).
When I looked at what histfit does, it says
x =(-3*sigmahat+muhat:0.1*sigmahat:3*sigmahat+muhat),
so if I say change the 3 to a 4, it increases my std value in the statistics toolbox and vice versa. I have two questions:
- Why does histfit use that specific range for x to overplot the normal distribution? I mean changing the 3 to a 4 does not exactly change the appearance of the red line that much other than increasing the tail...
- What exactly is the std in the statistics toolbox spewing out? Why isn't it the same as std(y)? Which value is more reliable? I mean, std should not depend on my range of x but rather on the actual data set y.
Answers (4)
Wayne King
on 7 Aug 2012
I'm not sure exactly what you're saying here. histfit() calls fitdist to fit the normal distribution.
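mu = 10;
sigma = 2;
x = normrnd(mu,sigma,100,1);      % 100 draws from a normal with mean 10, std 2
[muhat,sigmahat]=normfit(x);      % estimates returned by normfit
PD = fitdist(x,'normal');         % fitted distribution object (PD.mu, PD.sigma)
std(x)                            % sample standard deviation, for comparison
I get very consistent results between std(x), sigmahat, and PD.sigma.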
Tom Lane
on 7 Aug 2012
The expression you show for X defines a grid of values over which the fitted pdf is to be calculated for plotting. I don't see the HISTFIT code computing std for this X vector. In fact, I don't see any output from HISTFIT claiming to be the standard deviation. The fit is carried out on the variable DATA before X is created.
As Wayne reports, the fit to DATA seems to match what std gives.
Do you see something different? (I'm not sure what release you are using.)
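The order of operations described above can be sketched as follows. This is not the HISTFIT source, only a minimal illustration under assumptions: the variable names data, pd, x and yfit are made up for the example, the sample is synthetic, and linspace is used for the grid purely for convenience.

```matlab
% Hypothetical sketch: fit first, then build a plotting grid from the fit.
rng(0);                                   % reproducible synthetic sample
data = normrnd(10, 2, 500, 1);

% Step 1: the fit uses only the data.
pd = fitdist(data, 'Normal');

% Step 2: the grid is derived from the fitted parameters, for plotting only.
x = linspace(pd.mu - 3*pd.sigma, pd.mu + 3*pd.sigma, 100);

% Step 3: evaluate the fitted pdf on the grid.
yfit = pdf(pd, x);

% The fitted sigma is compared with std(data); x plays no part in it.
fprintf('pd.sigma = %.6f, std(data) = %.6f\n', pd.sigma, std(data));
```

Widening the grid in step 2 adds points to the curve but leaves pd.mu and pd.sigma untouched, because they were computed in step 1.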
Ilya
on 7 Aug 2012
As Tom pointed out, histfit in the Statistics Toolbox does not return the value of std, nor does it produce any "figure menu" with an std value in it. If you are getting an std value from histfit, you are not using the histfit function from the Statistics Toolbox. Ask the person who coded the histfit function you are using why he decided to restrict calculations to +-3 sigma.
The histfit function from the Stats Tlbx uses the 3 sigma range to display the data and the overlaid curve. Well, you have to choose some range; plotting all the way to infinity won't do much good. The +-3 sigma restriction is for showing the plots only; it has nothing to do with the calculations.
3 Comments
Ilya
on 7 Aug 2012
That is correct. The std from the "data statistics" menu is based on what is plotted, not on the data used for fitting. Your data are fit using normfit function for the normal distribution. Run that function on your data to get the estimated muhat and sigmahat.
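A small sketch of why a statistic computed from the plotted curve moves with the range. It assumes the plotted x-values are an evenly spaced grid of the form quoted in the question; the names y, x3 and x4 are made up for the example, and the data are synthetic.

```matlab
% Hypothetical sketch: std of the plotting grid vs. std of the data.
rng(1);                                   % reproducible synthetic sample
y = normrnd(10, 2, 1000, 1);
[muhat, sigmahat] = normfit(y);           % fit uses the data only

% Plotting grids spanning +-3 sigma and +-4 sigma around muhat.
x3 = muhat - 3*sigmahat : 0.1*sigmahat : muhat + 3*sigmahat;
x4 = muhat - 4*sigmahat : 0.1*sigmahat : muhat + 4*sigmahat;

fprintf('sigmahat = %.4f, std(y) = %.4f\n', sigmahat, std(y));
fprintf('std(x3)  = %.4f, std(x4) = %.4f\n', std(x3), std(x4));
```

std(x3) and std(x4) describe the spread of the grid points, so they grow as the range widens; sigmahat stays the same because it never sees the grid.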
Ilya
on 7 Aug 2012
See 'doc text', 'doc title' and 'doc sprintf'. For example:
x = randn(1000,1);
muhat = normfit(x);
histfit(x)
title(sprintf('muhat = %g',muhat))