Normal Distributions: Standard Deviation from Statistics Toolbox vs Standard Deviation computed using std(data)?

I have a set of data that I want to fit a normal distribution to, and I calculated the mean and standard deviation using
[muhat,sigmahat]=normfit(y);
where y is my data set.
However, when I use histfit() to plot the normal distribution over a histogram of the data, the mean shown in the Statistics Toolbox matches muhat (which is simply mean(y)), but the std value it shows differs from std(y) and clearly changes with the x used in yfit=normpdf(x,muhat,sigmahat).
When I looked at what histfit does, it says
x = (-3*sigmahat:0.1*sigmahat:3*sigmahat)+muhat,
so if I change the 3 to a 4, the std value in the Statistics Toolbox increases, and vice versa. I have two questions:
  1. Why does histfit use that specific range for x when overplotting the normal distribution? Changing the 3 to a 4 does not noticeably change the red line, other than extending its tails.
  2. What exactly is the std in the Statistics Toolbox reporting? Why isn't it the same as std(y), and which value is more reliable? The std should depend on the actual data set y, not on the range I choose for plotting.
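A small sketch to make the distinction concrete. Assumptions: y is simulated here (replace it with your measurements), and the plotting grid is built as muhat plus a symmetric range of +/-k*sigmahat in steps of 0.1*sigmahat, mirroring the +/-3 sigma grid histfit uses for plotting; the exact histfit code varies by release. It compares the spread of the grid with sigmahat and std(y).

```matlab
% Stand-in for measured data (assumption: replace with your own y)
rng(0);                                   % reproducible example
y = normrnd(5, 0.14, 200, 1);

% Fit: these are the statistics of the data
[muhat, sigmahat] = normfit(y);

% Plotting grids with half-width 3 and 4 sigma, step 0.1*sigma (display only)
x3 = muhat + (-3:0.1:3) * sigmahat;
x4 = muhat + (-4:0.1:4) * sigmahat;

fprintf('std(y)   = %.4f\n', std(y));
fprintf('sigmahat = %.4f\n', sigmahat);
fprintf('std(x3)  = %.4f  (%.3f * sigmahat)\n', std(x3), std(x3) / sigmahat);
fprintf('std(x4)  = %.4f  (%.3f * sigmahat)\n', std(x4), std(x4) / sigmahat);
```

The grid's spread grows with the chosen half-width (roughly 1.78 and 2.35 times sigmahat for +/-3 and +/-4), while sigmahat and std(y) do not move, so a std computed from the grid says nothing about the data.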

Answers (4)

Wayne King on 7 Aug 2012
I'm not sure exactly what you're saying here. histfit() calls fitdist to fit the normal distribution.
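mu = 10;
sigma = 2;
x = normrnd(mu,sigma,100,1);     % 100 simulated draws from N(10, 2^2)
[muhat,sigmahat] = normfit(x);   % parameter estimates from normfit
PD = fitdist(x,'normal');        % fitted distribution object
[std(x) sigmahat PD.sigma]       % display the three standard deviations side by side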
I get very consistent results between std(x), sigmahat, and PD.sigma
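One thing that can make two "standard deviations" of the same data disagree slightly is the normalization. A sketch on data simulated like Wayne's (the rng seed is an arbitrary choice): std(x) and sigmahat from normfit both divide the sum of squares by n-1, whereas std(x,1) divides by n, which is the maximum-likelihood form.

```matlab
rng(1);                                % reproducible example
x = normrnd(10, 2, 100, 1);
n = numel(x);

[muhat, sigmahat] = normfit(x);        % sigmahat: n-1 normalization
s_nm1 = std(x);                        % n-1 normalization (default)
s_n   = std(x, 1);                     % n normalization (maximum likelihood)

fprintf('sigmahat = %.6f\nstd(x)   = %.6f\nstd(x,1) = %.6f\n', ...
        sigmahat, s_nm1, s_n);

% The two normalizations differ by a fixed factor that depends only on n
fprintf('ratio    = %.6f (sqrt((n-1)/n) = %.6f)\n', ...
        s_n / s_nm1, sqrt((n - 1) / n));
```

With 100 points the factor is about 0.995, so normalization alone cannot account for a large discrepancy between two reported values.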
  1 Comment
Marmi Afrin on 7 Aug 2012
Maybe that's because your data were generated with normrnd. My data are a set of measurements which, in theory, should be normally distributed. std(y) gives me 0.14, whereas the Statistics Toolbox seems to calculate the std of x = (-3*sigmahat:0.1*sigmahat:3*sigmahat)+muhat, where [muhat,sigmahat]=normfit(y). So sigmahat is consistent with std(y), but the std reported by the Statistics Toolbox equals std(x) as defined above. I want the standard deviation of my data, not of the x range generated for plotting. Does that mean I should not rely on the Statistics Toolbox for statistics on my measurement data?



Tom Lane on 7 Aug 2012
The expression you show for X defines the grid of values over which the fitted pdf is calculated for plotting. I don't see the HISTFIT code computing a std for this X vector. In fact, I don't see any output from HISTFIT claiming to be the standard deviation. The fit is carried out on the variable DATA before X is created.
As Wayne reports, the fit to DATA seems to match what std gives.
Do you see something different? (I'm not sure what release you are using.)
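The order Tom describes (fit first, then build a grid only for drawing) can be sketched as below. This is an illustration of the idea, not histfit's actual source, which differs between releases; it assumes simulated data, 20 bins, and histcounts (available from R2014b). The pdf is scaled by n times the bin width so the curve sits on the count scale of the bars.

```matlab
rng(2);                                  % reproducible example
data  = normrnd(10, 2, 500, 1);          % stand-in data
nbins = 20;

% 1) Fit on the data: all statistics come from this step
[muhat, sigmahat] = normfit(data);

% 2) Plotting grid, +/- 3 sigmahat around muhat (display only)
xgrid = muhat + (-3:0.1:3) * sigmahat;

% 3) Histogram counts and matching pdf scaling (total bar area = n * bin width)
[counts, edges] = histcounts(data, nbins);
binw  = edges(2) - edges(1);
ygrid = normpdf(xgrid, muhat, sigmahat) * numel(data) * binw;

% 4) Draw the bars and the fitted curve
centers = edges(1:end-1) + binw / 2;
bar(centers, counts, 1);
hold on
plot(xgrid, ygrid, 'r', 'LineWidth', 2);
hold off
```

Nothing computed in steps 2 to 4 feeds back into muhat or sigmahat; widening the grid only changes how much of the tails is drawn.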
  1 Comment
Marmi Afrin on 7 Aug 2012
My problem is not so much with histfit, but rather that the Statistics Toolbox in my R2011 build gives me a std value that depends on the x used to plot the normal fit (as defined above), rather than std(data). In other words, the std shown by the Statistics Toolbox in the figure menu equals std(x), not std(data). sigmahat does equal std(data), so my own calculated values are fine; they just don't match what the Statistics Toolbox shows.



Ilya on 7 Aug 2012
As Tom pointed out, histfit in the Statistics Toolbox does not return a std value, nor does it produce any "figure menu" with a std value in it. If you are getting a std value from histfit, you are not using the histfit function from the Statistics Toolbox. Ask the person who wrote the histfit function you are using why they decided to restrict the calculations to +/-3 sigma.
The histfit function from the Statistics Toolbox uses the +/-3 sigma range to display the data and the overlaid curve. You have to choose some range; plotting all the way to infinity won't do much good. The +/-3 sigma restriction is only for showing the plots; it has nothing to do with the calculations.
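To see why +/-3 sigma is a reasonable display range, and why moving to +/-4 mostly just lengthens the tails, one can compute the fraction of a normal distribution's area inside +/-k standard deviations. normcdf is the Statistics Toolbox normal CDF; called with a single argument it uses the standard normal (mean 0, standard deviation 1).

```matlab
% Probability mass of a normal distribution within +/- k standard deviations
k = [2 3 4];
coverage = normcdf(k) - normcdf(-k);

for i = 1:numel(k)
    fprintf('+/-%d sigma: %.5f of the area\n', k(i), coverage(i));
end
% +/-2 sigma: 0.95450 of the area
% +/-3 sigma: 0.99730 of the area
% +/-4 sigma: 0.99994 of the area
```

Going from 3 to 4 adds only about 0.3 % of the area, all of it in the far tails, which matches the small visual change described in the question.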
  3 Comments
Ilya on 7 Aug 2012
That is correct. The std from the "Data Statistics" menu is based on what is plotted, not on the data used for fitting. Your data are fit using the normfit function for the normal distribution. Run that function on your data to get the estimated muhat and sigmahat.
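On the question of which value is more reliable: normfit can also return 95% confidence intervals for both parameters, which say how precisely the data pin them down. A sketch with simulated stand-in data (the sample size and seed are arbitrary assumptions):

```matlab
rng(3);                                  % reproducible example
y = normrnd(5, 0.14, 50, 1);             % stand-in for measurements

% Third and fourth outputs are 95% confidence intervals for mu and sigma
[muhat, sigmahat, muci, sigmaci] = normfit(y);

fprintf('muhat    = %.4f, 95%% CI [%.4f, %.4f]\n', muhat, muci);
fprintf('sigmahat = %.4f, 95%% CI [%.4f, %.4f]\n', sigmahat, sigmaci);
```

These intervals come from the data alone; no plotting range enters the calculation.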
Marmi Afrin on 7 Aug 2012
Yes, that's what I decided to rely on. By the way, can you show me how to display muhat and sigmahat in a text box inside the plot? I looked at the text options, but the 'String' option lets you put in specific text, and I want to write a general function that prints mu = ___, std = ___ on each plot with the corresponding computed muhat and sigmahat values. Currently I am using the text box option from the drop-down menu, which is a hassle because every time you replot, it's gone.



Ilya on 7 Aug 2012
See 'doc text', 'doc title' and 'doc sprintf'. For example:
x = randn(1000,1);
muhat = normfit(x);
histfit(x)
title(sprintf('muhat = %g',muhat))
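Extending the same idea to both estimates in a boxed label inside the axes: a sketch assuming simulated data and an arbitrary upper-left placement. A cell array passed to text gives one line per cell, and normalized units keep the box in the same corner whatever the data range. Because the label is drawn by code, rerunning these lines after a replot restores it; wrapping them in a function of your own (for example a hypothetical histfitWithStats(y)) gives the reusable version asked about.

```matlab
rng(4);                                  % reproducible example
y = normrnd(5, 0.14, 200, 1);            % stand-in for measurements
[muhat, sigmahat] = normfit(y);

histfit(y)

% Two-line label built from the computed values ('\\' yields a TeX backslash)
label = {sprintf('\\mu = %.4g', muhat), sprintf('\\sigma = %.4g', sigmahat)};

% Normalized units: (0,0) is the lower-left and (1,1) the upper-right of the axes
ht = text(0.05, 0.95, label, 'Units', 'normalized', ...
          'VerticalAlignment', 'top', ...
          'BackgroundColor', 'w', 'EdgeColor', 'k');
```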
