Working with the Kolmogorov test

Question

hamidreza hamidi on 13 Nov 2018

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/429553-working-with-kolmogrov-test

Commented: hamidreza hamidi on 14 Nov 2018

Hi, I am trying to use the Kolmogorov test, which I am going to use in my article. I generated a data set A, then randomly drew a sample set from A. I then wanted to compare these two sample sets with kstest, but it showed me that they don't have the same distribution.

Here is my simple code:

clc
clear all
close all
n_s = 1000;
mother_random_variable = lognrnd(0.3,0.5,[1,100000]);               %data lognormal
S = mother_random_variable(randi(numel(mother_random_variable),1,n_s))          %sample
S_y = [S]';                             %selected data 
S_mean=mean(S_y);               %mean sample
S_var=std(S_y);                 %standard deviation of the sample (not the variance)
test_cdf = [S_y,cdf('Lognormal',S_y,S_var,S_mean)];        %make cdf 
kstest(S_y,'CDF',test_cdf)                  %ktest
plot(sort(S_y),logncdf(sort(S_y)),'r--')
hold on
cdfplot(S_y)

They have the same distribution, so this is a strange result. I found an even stranger result when I compared my data set with itself: the result says they don't have the same distribution.

clc
clear all
close all
n_s = 1000;
mother_random_variable = lognrnd(0.3,0.5,[1,100000]); %data
S=mother_random_variable; % I named data with S for simpler code
S_y = [S]';     %selected data 
S_mean=mean(S_y);
S_var=std(S_y);
test_cdf = [S_y,cdf('Lognormal',S_y,S_var,S_mean)];
kstest(S_y,'CDF',test_cdf)
plot(sort(S_y),logncdf(sort(S_y)),'r--')
hold on
cdfplot(S_y)

Do you have any idea? Thanks.

Answer 1

Adam Danz on 13 Nov 2018

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/429553-working-with-kolmogrov-test#answer_346732

Edited: Adam Danz on 13 Nov 2018

Having only looked at your 2nd block of code, I have some comments and suggestions.

1) The parameters of a lognormal distribution are mu and sigma, in that order: the mean and standard deviation of the logarithm of the data. In your code, you're entering them in reverse when you call the cdf() function, and this creates a totally different distribution from the one you intend.

y = cdf('Lognormal', S_y, S_var, S_mean);    % your code, incorrect
y = cdf('Lognormal', S_y, S_mean, S_var);    % correct

2) This is just a suggestion but it's a bit cleaner to use the makedist() function rather than entering the parameters manually into cdf().

doc cdf
pd = makedist('Lognormal', 'mu', S_mean, 'sigma', S_var); 
y = cdf(pd, S_y);   % instead of cdf('Lognormal', S_y, S_mean, S_var)                  
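
As a quick check of the suggestion above, here is a small sketch comparing the distribution-object form with the name-based form. The values mu = 0.3 and sigma = 0.5 and the evaluation grid x are assumptions chosen for illustration (they mirror the lognrnd call in the question, not anything estimated from your data), and the Statistics and Machine Learning Toolbox is assumed to be installed.

```matlab
% Illustrative parameters (assumed values, mirroring lognrnd(0.3, 0.5, ...) in the question)
mu = 0.3;       % mean of log(X)
sigma = 0.5;    % standard deviation of log(X)
x = linspace(0.1, 5, 50)';   % arbitrary evaluation points

pd = makedist('Lognormal', 'mu', mu, 'sigma', sigma);
y_obj  = cdf(pd, x);                        % distribution-object form
y_name = cdf('Lognormal', x, mu, sigma);    % name-based form: mu first, then sigma

max_diff = max(abs(y_obj - y_name));        % largest disagreement between the two forms
```

Both forms should give the same values; the object form just makes it harder to swap the two parameters by accident.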

3) "when I compare my data set with itself, Its result shows me they don't have same distribution." But you aren't comparing your data with itself. You're comparing your data with the results of the cumulative distribution function of your data. The code below plots the distribution of values from your data (top) and the distribution of values from the CDF (bottom). Clearly those distributions differ, and kstest() correctly rejects the null hypothesis.

figure
subplot(2,1,1)
histogram(S_y)
title('mother random variable')
subplot(2,1,2)
histogram(cdf('Lognormal', S_y, S_mean, S_var))
title('CDF distribution')

4) This may be irrelevant given the points above, but you are using different parameters to create the "mother_random_variable" and the cdf() data. For the random variables you are using (0.3, 0.5) for mu and sigma, which describe the log of the data, but for the cdf you're using the mean and std of the data itself, which are ~(1.5, 0.8).
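
A minimal sketch that addresses points 1 and 4 together, assuming the goal is a one-sample test of the data against a lognormal distribution fitted to it. Since mu and sigma of a lognormal are the mean and standard deviation of log(X), they are estimated here from log(S_y). The names mu_hat and sigma_hat, the fixed seed, and the reduced sample size of 1000 are my own choices for illustration. kstest accepts the hypothesized CDF either as a two-column matrix [x, G(x)] or as a probability distribution object; the object form is used below.

```matlab
rng(1);                                   % fixed seed, only for reproducibility
S_y = lognrnd(0.3, 0.5, [1000, 1]);       % lognormal data as a column vector

% Estimate the lognormal parameters on the log scale
mu_hat    = mean(log(S_y));               % estimate of mu
sigma_hat = std(log(S_y));                % estimate of sigma

% Hypothesized distribution as a probability distribution object
pd = makedist('Lognormal', 'mu', mu_hat, 'sigma', sigma_hat);

% One-sample Kolmogorov-Smirnov test; h = 0 means the null is not rejected
[h, p] = kstest(S_y, 'CDF', pd);

% Overlay the empirical CDF and the fitted lognormal CDF
x = sort(S_y);
cdfplot(S_y)
hold on
plot(x, cdf(pd, x), 'r--')
legend('Empirical CDF', 'Fitted lognormal CDF', 'Location', 'best')
hold off
```

One caveat: estimating the parameters from the same data that you then test makes the test conservative, so treat the p-value as approximate.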

Adam Danz on 14 Nov 2018

"I wanted to use kstest lognormal distribution with itself. then I want to write a code to compare part of my generated data with whole data and find that they have the same distribution."

If I'm understanding you correctly, you want to create a log-normal set of data, then take a random subsample of that data set, then use a Kolmogorov-Smirnov test to determine whether these two sets of data come from the same distribution. I suppose this is a sanity check, since it's obvious that the two data sets are (literally) from the same distribution.

Here's how:

1) Create your data set.

n_s = 1000;
mother_random_variable = lognrnd(0.3,0.5,[1,n_s]);        

2) Create the sub-sampled data set.

m_s = 400;          %number of random samples from your data
child_random_variable = datasample(mother_random_variable, m_s, 'Replace', false);

3) Use kstest2 to determine if those two vectors of data come from the same distribution.

[h, p] = kstest2(mother_random_variable, child_random_variable);

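The null hypothesis is that the two inputs are from the same distribution, so h=0 means the test failed to reject the null hypothesis at the default 5% significance level. It does not prove the distributions are identical; it only says the data give no evidence of a difference.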
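
Putting the three steps together, a sketch of the full check might look like this. The fixed seed, the name ks_stat, and the printed messages are my additions for illustration; kstest2 uses a 5% significance level unless you pass 'Alpha'.

```matlab
rng(42);                                             % fixed seed, only for reproducibility
n_s = 1000;
mother_random_variable = lognrnd(0.3, 0.5, [1, n_s]);

m_s = 400;                                           % size of the subsample
child_random_variable = datasample(mother_random_variable, m_s, 'Replace', false);

% Two-sample Kolmogorov-Smirnov test; ks_stat is the largest gap between the two empirical CDFs
[h, p, ks_stat] = kstest2(mother_random_variable, child_random_variable);

if h == 0
    fprintf('No evidence of a difference (p = %.3f, D = %.3f).\n', p, ks_stat);
else
    fprintf('Null hypothesis rejected at the 5%% level (p = %.3f, D = %.3f).\n', p, ks_stat);
end
```

Because the subsample is drawn from the mother sample itself, you should see h = 0 in almost every run; an occasional rejection would not by itself indicate a bug.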

hamidreza hamidi on 14 Nov 2018

You helped me a lot. Thanks.
