Should the mean of the randn function converge to zero as the sample size increases?

Hello,
Please try the following simple code, which shows how the mean of a randn vector varies over time:
-----------------------------------------------------
x = randn(1,1e7);   % generate 1e7 samples from the N(0,1) distribution
n = 1:1e7;          % running count of samples over time
y = cumsum(x);      % cumulative sum of the samples
z = y./n;           % running mean of the vector x over time
figure, plot(z), grid on
------------------------------------------------------
Now, please take a look at the vector of means (z), and zoom in to see how it varies over time.
Repeat these steps several times, generating a new x each time.
You will notice that the mean does not always converge!
Of course, you can increase the sample size beyond 1e7 and you will still see the same behaviour!
Also, over time the mean stays on one side of zero (it does not oscillate around it), either the positive side or the negative side!

Answers (2)

Roger Stafford on 10 Sep 2013
Edited: Roger Stafford on 10 Sep 2013
The experiment you have conducted is not a statistically valid test for observing the scatter of a sum of Gaussian random variables with theoretical mean zero. The reason is that in a single cumulative sum, as given by 'cumsum', early variations in the sum are bound to be statistically correlated with later ones.
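A minimal sketch of that correlation (this code is not part of the original answer, and the run lengths are arbitrary choices): across many independent runs, an early value of the cumulative sum is clearly correlated with a later one, because the later sum contains the earlier one as a prefix.
nRuns = 1000;
early = zeros(nRuns,1);   % partial sum after 1,000 terms, one per run
late = zeros(nRuns,1);    % full sum after 10,000 terms, one per run
for r = 1:nRuns
    y = cumsum(randn(1,1e4));
    early(r) = y(1000);
    late(r) = y(end);
end
c = corrcoef(early, late);
disp(c(1,2))   % close to sqrt(1000/10000) = 0.32, the theoretical correlation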
To get a valid test of the variation of sample mean as a function of the number of samples, you need to observe only final sums of varying lengths - that is, you must not use 'cumsum'. For example you could compute a hundred different sums of a hundred independent variables each and observe the scatter of their means. Then do the same for a hundred sums of ten thousand random variables each. Then do the same for a hundred sums of a million random variables each. Using this procedure you should observe that the scatter of the sample means tends to be less for the longer sums. Theory states that this variation from the theoretical mean should decrease in inverse proportion to the square root of the number of variables added, so in the above procedure the scatter ought to shrink by a factor of roughly ten at each of the proposed three levels.
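A minimal sketch of that procedure (not Roger's code; the trial count and sample lengths are the values he proposes):
lengths = [1e2 1e4 1e6];       % the three levels proposed above
nTrials = 100;                 % independent sums per level
scatterOfMeans = zeros(size(lengths));
for k = 1:numel(lengths)
    m = zeros(nTrials,1);
    for t = 1:nTrials
        m(t) = mean(randn(1,lengths(k)));   % one independent sample mean
    end
    scatterOfMeans(k) = std(m);             % observed scatter at this length
end
disp(scatterOfMeans)           % roughly 0.1, 0.01, 0.001, i.e. 1/sqrt(n)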
As Walter has noted, in the case of MATLAB's double-precision numbers, there is a limit to how small this observed variation in sample mean can get, caused by the accumulation of round-off errors in the sums. This would probably not be noticeable for as few as a million terms added together, but for higher numbers of terms it will begin to be observable.
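One way to gauge the size of that accumulated round-off (a sketch, not from the thread) is to compare MATLAB's built-in sum with a Kahan-compensated sum of the same data; the gap between the two estimates the rounding error.
x = randn(1,1e7);
naive = sum(x);      % built-in summation
s = 0; c = 0;        % Kahan compensated summation
for k = 1:numel(x)
    yk = x(k) - c;
    tk = s + yk;
    c = (tk - s) - yk;   % recover the low-order bits lost in s + yk
    s = tk;
end
disp(naive - s)      % typically tiny for 1e7 terms, consistent with the remark above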
There is also the possibility that such tests might detect deviations from the theoretical behaviour because of subtle features of the computer's pseudo-random algorithm, as opposed to true randomness. Tests of this kind can place a considerable strain on the validity of that algorithm.

Walter Roberson on 10 Sep 2013
You have two problems:
  1. The convergence is theoretical, over infinite time. Any subset of the samples might achieve a sum whose absolute value is up to the number of samples in the subset.
  2. You are encountering round-off problems because cumsum() uses limited-precision arithmetic, with 53 bits of mantissa. Once the sum has exceeded some value L, a subsequent value whose absolute value lies between 0 and eps(L) has no effect on the sum at all. In finite precision arithmetic, -10 + sum(repmat(10, N, 1)/N) for positive integer N is not always exactly 0, even though algebraically it is; a small sketch of both effects follows.
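A small sketch of both effects in point 2 (illustrative numbers, not Walter's original code):
L = 1e16;                       % a large running sum; eps(L) is 2 here
disp(L + 0.5 == L)              % true: 0.5 < eps(L), so the addition is absorbed
x = repmat(10, 100, 1) / 100;   % 100 copies of 0.1, which is inexact in binary
disp(sum(x) - 10)               % generally not exactly 0: a tiny round-off residual remains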
Compare the two lines:
t = [cumsum(x(:)), cumsum(fftshift(x(:)))];
plot(t)
It is the same x, but the difference in the order of addition makes a difference in the result.
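To see how big that difference is, one can compare just the final entries of the two cumulative sums, which add exactly the same numbers in different orders (a sketch building on the snippet above):
x = randn(1e7,1);
t = [cumsum(x), cumsum(fftshift(x))];
disp(t(end,1) - t(end,2))   % same numbers, different order: generally a
                            % small nonzero difference caused by round-off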
