How to avoid NaN when comparing data

George on 25 Sep 2014
Edited: George on 25 Sep 2014
Hello
I have a fairly general question that has been troubling me. When comparing data, for example simulated values against real ones, we sometimes have 0 or NaN values. I thought of working around this in three ways:
1. Replace the NaN with a number (e.g. 0)
2. Replace the NaN with interpolated values
3. Replace the NaN with the next or previous value
However, with a very long series of real values I noticed that interpolating or replacing the NaN changes my results.
How can we compare the two vectors, for example by computing their correlation coefficient, without considering the NaN values and their corresponding simulated values?
P.S. I am aware of the functions available in MATLAB such as nanmean, which do not take the NaN values of a vector into account. I am wondering, though, how to compare two vectors so that when a NaN is detected, both the NaN and the simulated/other value it would be compared with are skipped.
I tried to work it out on an example:
% generate a random variable and insert NaN
A=rand(500,1);
b=randi(500,20,1);
A(b)=NaN;
plot(A)
% generate a random variable of the same size to be compared with A
B=rand(500,1);
% Just visual confirmation of the location of the NaNs in A, by replacing
% them with -1
c=A;
c(isnan(c))=-1;
plot(c)
% process of comparison
corrcoef(A,B)
The objective is that the NaNs in A are not taken into account, and that the corresponding B values that were to be compared with them are also ignored.
Thank you

Accepted Answer

Adam on 25 Sep 2014
Edited: Adam on 25 Sep 2014
idx = isnan( A );
A(idx) = [];
B(idx) = [];
corrcoef(A,B);
would work. I was playing around with inline if in a function handle to return NaN (which can then be filtered out of the result) if either of the inputs to the function is NaN, but I didn't quite get my syntax right and in the end figured that pre-processing as above is simpler.
Obviously you can put the NaN-less versions in new variables if you don't want to lose the values from B that correspond to NaNs in A.
Maybe the function handle wrapper approach would work better if they are huge arrays where taking copies minus NaNs would present memory issues, but otherwise pre-processing seems easiest.
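A minimal sketch of the "new variables" variant mentioned above, assuming example data like that in the question. The names valid, Aclean and Bclean are only illustrative. It builds one logical mask that is false wherever either vector has a NaN, so A and B themselves stay untouched. Depending on your MATLAB release, corrcoef also accepts a 'Rows','complete' option that should give the same result without the manual mask.

```matlab
% Example data as in the question (assumed, for illustration only)
A = rand(500,1);
A(randi(500,20,1)) = NaN;
B = rand(500,1);

% true where both vectors hold a usable (non-NaN) value
valid = ~(isnan(A) | isnan(B));

% NaN-free copies; the originals keep their full length
Aclean = A(valid);
Bclean = B(valid);

R = corrcoef(Aclean, Bclean);
r = R(1,2);                      % correlation coefficient between A and B

% Built-in alternative: ignore any row that contains a NaN
R2 = corrcoef(A, B, 'Rows', 'complete');
```

Using the mask on both vectors at once keeps the pairs aligned, which is the point of the question: a NaN in one vector removes the matching entry in the other.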
  2 Comments
Adam on 25 Sep 2014
Edited: Adam on 25 Sep 2014
You could generalise it slightly by writing a function such as:
function result = nanfunc( func, dataWithNans, otherData )
idx = isnan( dataWithNans );
dataWithNans(idx) = [];
otherData(idx) = [];
result = func( dataWithNans, otherData );
to call as e.g.
res = nanfunc( @corrcoef, A, B );
if you want to do numerous operations on your two arrays. Such a function does embed numerous assumptions though, partially highlighted by my variable naming, that the first data input is expected to be the one with NaNs and the second NaN-free (though you could easily change the function to remove elements corresponding to NaNs in either input). Also there is an assumption that the function handle you pass in is one that takes two numeric input arguments and produces one output.
Personally I don't like the name nanmean, but I named that function to be consistent. To me a function called nanmean suggests something more the opposite of what it actually does!
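The change hinted at above (removing elements that correspond to NaNs in either input) could look roughly like the sketch below. It is written with anonymous functions so it can be pasted into a script; pairwiseValid and nanfunc2 are hypothetical names, and the sketch still assumes func takes two numeric inputs of equal size and returns one output.

```matlab
% true at positions where neither input is NaN
pairwiseValid = @(x, y) ~(isnan(x) | isnan(y));

% apply func only to the pairs where both values are present
nanfunc2 = @(func, x, y) func(x(pairwiseValid(x, y)), y(pairwiseValid(x, y)));

% small example with a NaN in each vector
x = [1 2 NaN 4 5]';
y = [2 4 6 NaN 10]';
R = nanfunc2(@corrcoef, x, y);   % uses the pairs (1,2), (2,4), (5,10) only
```

The three remaining pairs lie on the line y = 2x, so the off-diagonal entry of R comes out as 1 up to rounding.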
George on 25 Sep 2014
Edited: George on 25 Sep 2014
Yes, I meant something like that, thank you for the help. I found that interpolating the data does not affect the results (too much) if the samples include only a few NaNs, but if the number of NaNs is significant then the interpolation will "break down", so another alternative would be nice for people who use large sample series.
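One way to check how much the interpolation matters for a given series is to compute the share of missing values and compare both approaches side by side. This is only a sketch on made-up data (random vectors with 10% of A set to NaN), not a statement about any real data set. The variable names are illustrative.

```matlab
rng(0);                          % fixed seed so the run is repeatable
n = 500;
A = rand(n,1);
B = rand(n,1);
A(randperm(n, 50)) = NaN;        % assumed: 10% of the samples are missing

valid = ~isnan(A);
fracMissing = mean(~valid);      % share of NaN entries in A

% Option 1: fill the gaps by linear interpolation over the sample index
idx = (1:n)';
Afilled = A;
Afilled(~valid) = interp1(idx(valid), A(valid), idx(~valid), 'linear', 'extrap');
Rfilled = corrcoef(Afilled, B);

% Option 2: drop the NaN positions from both vectors
Rdropped = corrcoef(A(valid), B(valid));

fprintf('missing: %.1f%%, r (interpolated): %.4f, r (excluded): %.4f\n', ...
    100*fracMissing, Rfilled(1,2), Rdropped(1,2));
```

If the two coefficients drift apart as the share of NaNs grows, excluding the NaN pairs is probably the safer choice for that series.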
