Help with Tobit regression (for left-censored data)

Hi everyone,
I'm wondering whether anyone familiar with Tobit regression can help me. My question is a simple one:
I have a set of data, X, which I assume follows a normal distribution.
X ~ N(mu, sigma^2), where sigma is the standard deviation.
However, a portion of the X data is below the detection limit (DL). I hope to estimate the values of these non-detects based on:
1) the observed (detected) values, 2) the percentage of non-detects, and 3) the assumption that the complete distribution (non-detects and observed values together) is normal.
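Once mu and sigma have been estimated from a censored fit, one common way to fill in the non-detects is to use the normal distribution truncated to values below DL. Which imputation is wanted is an assumption here, since the post does not say. The sketch shows two options: the conditional mean E[X | X < DL] = mu - sigma*phi(a)/Phi(a) with a = (DL - mu)/sigma, or random draws from the truncated distribution by inverse-CDF sampling. The values of mu_hat, sigma_hat and k are hypothetical placeholders, and erfc/erfcinv stand in for normcdf/norminv so the block needs no toolbox.

```matlab
% Sketch: imputing non-detects from a fitted N(mu_hat, sigma_hat^2).
% mu_hat, sigma_hat and k are HYPOTHETICAL placeholders; in practice they
% would come from a left-censored fit of the data.
mu_hat = 0.5; sigma_hat = 0.8;     % hypothetical estimates
DL = 0.01;                         % detection limit
k = 13;                            % hypothetical number of non-detects

a = (DL - mu_hat) / sigma_hat;             % standardized detection limit
Phi_a = 0.5 * erfc(-a / sqrt(2));          % P(X < DL), same as normcdf(a)
phi_a = exp(-0.5 * a^2) / sqrt(2*pi);      % standard normal density at a

% Option 1: single value per non-detect, the conditional mean E[X | X < DL]
x_nd_mean = mu_hat - sigma_hat * phi_a / Phi_a;

% Option 2: k random draws from N(mu_hat, sigma_hat^2) truncated to (-Inf, DL)
u = rand(k, 1) * Phi_a;                                   % uniform on (0, Phi_a)
x_nd_draws = mu_hat + sigma_hat * (-sqrt(2) * erfcinv(2 * u));  % inverse normal CDF

fprintf('E[X | X < DL] = %.3f\n', x_nd_mean);
```

The conditional mean gives the same value for every non-detect; the random draws preserve the spread of the lower tail, which matters if the imputed values are later used to estimate variability.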
Ultimately, I want the probability distribution of X. I think Tobit regression is a possible solution, but I have not been able to do it with any of MATLAB's built-in functions (or at least I'm not confident in the answers I get from them).
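For the built-in route, normfit in the Statistics and Machine Learning Toolbox (the same toolbox that provides the normcdf and normpdf calls used in the code below) accepts a censoring vector, but it treats flagged values as right-censored. A sketch under that constraint: negating the data turns "below DL" into "above -DL", so the non-detects can be passed to normfit as right-censored at -DL and the sign of the estimated mean flipped back. The simulated data mirror the settings in the post.

```matlab
% Sketch: left-censored normal fit with the built-in normfit, using the
% negation trick (normfit's censoring flag means RIGHT-censored).
% Requires the Statistics and Machine Learning Toolbox.
mu = 0.5; sigma = 0.8; n = 50; DL = 0.01;
x = mu + sigma .* randn(n, 1);      % simulated data from N(mu, sigma^2)

is_cens = x < DL;                   % non-detects
y = x;
y(is_cens) = DL;                    % only the detection limit is recorded

% -y is right-censored at -DL exactly where y is left-censored at DL
[m_neg, sigma_hat] = normfit(-y, 0.05, is_cens);
mu_hat = -m_neg;                    % flip the sign of the mean back
fprintf('mu_hat = %.3f, sigma_hat = %.3f\n', mu_hat, sigma_hat);
```

The standard deviation is unchanged by negation, so only the mean needs its sign flipped.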
-----------------------------------------------------------------------------------------------
So I made up a hypothetical example that can be cross-checked against a trusted reference. However, I still can't make it work, and I would appreciate it if anyone could help me troubleshoot it. My code is below:
%%Sampling from a normal distribution
mu=0.5; sigma=0.8;
%%No.of samples to generate
n=50;
x=sigma + mu.*randn(n,1);
%%Censoring data below this point (treating them as unobserved)
DL=0.01;
%%Tagging samples which are observed (above DL) and unobserved (below DL)
cens_id=(find(x<DL));
obs_id= (find(x>DL));
%%Applying negative log-likelihood to fit the data
nloglik= @(p) -sum(log(normcdf(x(cens_id), p(1),p(2))))...
-sum(log(normpdf(x(obs_id),p(1),p(2))));
[y]=fminsearch(nloglik,[0.5,0.5],optimset('MaxFunEvals',10000,'MaxIter',1000));
% y is [mu_hat, sigma_hat]. If the Tobit regression is set up correctly, mu_hat and sigma_hat should be very close to mu=0.5 and sigma=0.8
I suspect the issue lies in the negative log-likelihood function I defined, possibly in the term for the non-detects.
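For reference, the textbook log-likelihood for a normal sample left-censored at DL (an intercept-only Tobit model) is below, with k denoting the number of non-detects (notation introduced here). Each non-detect contributes only the probability of lying below DL, so its unobserved value never appears:

```latex
\ell(\mu,\sigma) = k \,\log \Phi\!\left(\frac{DL-\mu}{\sigma}\right)
  + \sum_{i:\,x_i \ge DL} \left[ \log \phi\!\left(\frac{x_i-\mu}{\sigma}\right) - \log \sigma \right],
\qquad \sigma > 0
```

Here Φ and φ are the standard normal CDF and PDF. In MATLAB terms, the first term corresponds to k*log(normcdf(DL, mu, sigma)), whereas the code above sums log(normcdf(x(cens_id), mu, sigma)) and therefore uses the very values that are supposed to be unknown. By contrast, a truncated-normal fit of the detected values alone would replace the first term by subtracting log(1 - Φ((DL - μ)/σ)) for each detected value.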
Can anyone help, please? Sorry for the long question, but I would really like to solve this in MATLAB and am reluctant to switch to other software.
Thanks in advance!! :)
Sincerely, Keah
Keah Lim on 22 Nov 2013
I just found that I generated the random numbers incorrectly. It should be:
x=mu + sigma.*randn(n,1);
and not:
x=sigma + mu.*randn(n,1);
but I still can't obtain reasonable estimates of mu and sigma. They are closer to those of a truncated normal distribution (which considers only the observed values).
The revised code is as follows:
-----------------------------------------------------------------------------------
%%Sampling from a normal distribution
mu=0.5; sigma=0.8;
%%No.of samples to generate
n=50;
x=mu + sigma.*randn(n,1);
%%Censoring data below this point (treating them as unobserved)
DL=0.01;
%%Tagging samples which are observed (above DL) and unobserved (below DL)
cens_id=(find(x<DL));
obs_id= (find(x>DL));
%%Applying negative log-likelihood to fit the data
nloglik= @(p) -sum(log(normcdf(x(cens_id), p(1),p(2))))...
-sum(log(normpdf(x(obs_id),p(1),p(2))));
[y]=fminsearch(nloglik,[0.5,0.5],optimset('MaxFunEvals',10000,'MaxIter',1000));
% y is [mu_hat, sigma_hat]
% mu_hat and sigma_hat should be really close to mu=0.5 and sigma=0.8 using Tobit regression
Thanks, Keah
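A minimal sketch of the same simulation with the censored term evaluated at DL instead of at the hidden values, keeping the settings of the revised code. Two choices are assumptions of this sketch, not part of the original post: sigma is optimized on the log scale so that fminsearch (which is unconstrained) cannot wander to sigma <= 0, and the normal log-CDF/log-PDF are written with erfc so the block runs without a toolbox (normcdf(DL, mu, sigma) gives the same probability). No seed is set, so results vary from run to run. With n = 50, sampling noise alone can move the estimates noticeably away from 0.5 and 0.8.

```matlab
% Sketch: left-censored normal (intercept-only Tobit) fit by maximum likelihood.
mu = 0.5; sigma = 0.8;            % true parameters used to simulate
n = 50;                           % number of samples
DL = 0.01;                        % detection limit
x = mu + sigma .* randn(n, 1);    % draws from N(mu, sigma^2)

is_cens = x < DL;                 % non-detects: only "x < DL" is known
x_obs = x(~is_cens);              % detected values (x >= DL)
k = sum(is_cens);                 % number of non-detects
n_obs = numel(x_obs);

% Standard normal log-CDF and log-PDF via erfc (no toolbox needed)
logPhi = @(z) log(0.5 .* erfc(-z ./ sqrt(2)));
logphi = @(z) -0.5 .* z.^2 - 0.5 .* log(2 .* pi);

% Negative log-likelihood with p = [mu, log(sigma)]:
% each non-detect contributes log P(X < DL), evaluated at DL, not at x_i
nloglik = @(p) -( k .* logPhi((DL - p(1)) ./ exp(p(2))) ...
                 + sum(logphi((x_obs - p(1)) ./ exp(p(2)))) ...
                 - n_obs .* p(2) );

p0 = [mean(x_obs), log(std(x_obs))];   % start from the naive detected-only fit
opts = optimset('MaxFunEvals', 10000, 'MaxIter', 1000);
p_hat = fminsearch(nloglik, p0, opts);

mu_hat = p_hat(1);
sigma_hat = exp(p_hat(2));
fprintf('mu_hat = %.3f, sigma_hat = %.3f (k = %d non-detects)\n', mu_hat, sigma_hat, k);
```

Starting from the naive fit of the detected values makes the comparison easy to see: that starting point tends to overestimate mu and underestimate sigma, and the censored likelihood pulls both back.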


