Using xcorr to cross-correlate data that contains "spikes"

When I use xcorr to cross-correlate two related data sets, everything works as expected: I see a correlation peak, and the reported lag is correct. However, when I cross-correlate two unrelated data sets that each contain one cluster of "spikes", I still see a correlation peak, and the reported lag is the distance between the two spike clusters.
In the image below, x is a random data series and y is another random data series. Each has a cluster of 30 large random values (the "spikes") inserted at consecutive samples. In theory, there should be no correlation between the two data sets, since they are generated independently and are very different. However, the third plot shows a very strong correlation peak between them. The code used to generate this figure is at the bottom of this post.
I've tried to filter out the spikes with a few different mechanisms (rolling RMS power ... etc.) before running xcorr. This has worked in some cases but not all. I feel like I need a different approach to the problem, maybe an alternative to xcorr. I do understand why x and y cross-correlate using xcorr. Is there another cross-correlation tool that I can use? Note that x and y will never be exactly the same, only approximately the same; in normal operation, it is not the spikes that should make them correlate.
Any suggestions on how to tell whether x and y correlate while ignoring the "spikes"?
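% two independent zero-mean uniform noise series in (-0.5, 0.5)
x = rand(1, 3000);
x = x - 0.5;
y = rand(1, 3000);
y = y - 0.5;
% insert a cluster of 30 large random values ("spikes") into each series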
impulse_width = 30;
impulse_max_height = 6;
x_impulse_start = 460;
y_impulse_start = 120;
rand_insert_x = rand(1, impulse_width);
rand_insert_x = (rand_insert_x - 0.5) * 2 * impulse_max_height;
rand_insert_y = rand(1, impulse_width);
rand_insert_y = (rand_insert_y - 0.5) * 2 * impulse_max_height;
x(1,x_impulse_start:x_impulse_start + impulse_width - 1) = rand_insert_x;
y(1,y_impulse_start:y_impulse_start + impulse_width - 1) = rand_insert_y;
% plot both series and their raw cross-correlation
subplot(3, 1, 1);
plot(x);
ylim([-impulse_max_height impulse_max_height]);
title('random data series: x');
subplot(3, 1, 2);
plot(y);
ylim([-impulse_max_height impulse_max_height]);
title('random data series: y');
[c, l] = xcorr(x, y);   % raw cross-correlation c at lags l (Signal Processing Toolbox)
subplot(3, 1, 3);
plot(l, c);
title('correlation using xcorr');
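The question mentions trying a rolling RMS power filter before xcorr. A minimal sketch of one such gate is below, assuming a centered 31-sample window and a threshold of 3 times the median rolling RMS (both are illustrative choices, not values from the post). It uses only base MATLAB (conv, median) and zeroes every sample whose local RMS is well above the typical level.

```matlab
% Sketch: suppress a spike cluster with a rolling-RMS gate before xcorr.
% Window length w and factor k are illustrative assumptions, not tuned values.
x = rand(1, 3000) - 0.5;                    % background noise as in the question
x(460:489) = (rand(1, 30) - 0.5) * 12;      % 30-sample spike cluster in [-6, 6]

w = 31;                                     % rolling window length in samples (assumed)
k = 3;                                      % gate level as a multiple of the median RMS (assumed)

% centered moving average of x.^2, then square root -> rolling RMS
rolling_rms = sqrt(conv(x.^2, ones(1, w) / w, 'same'));

% flag samples whose local RMS is far above the typical (median) level
gate = rolling_rms > k * median(rolling_rms);

x_gated = x;
x_gated(gate) = 0;                          % zero out samples inside loud windows
```

The same gate would be applied to y before calling xcorr on the gated series. Zeroing also removes any genuine signal underneath the spikes, and the window and factor will likely need tuning for real data.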

Answers (2)

Honglei Chen on 7 Oct 2015
You may want to use
xcorr(x, y, 'coeff')
instead. You get a high peak simply because the inserted random numbers are much larger in scale than the rest of the data. Using 'coeff' normalizes the result so that the autocorrelations at zero lag equal 1, which gives you a quantitative measure of how strong that peak really is in terms of correlation. I did a quick try and the peak correlation is around 0.2, which means the two series are fairly uncorrelated.
HTH
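To illustrate what 'coeff' does, here is a toolbox-free sketch for equal-length real row vectors. It assumes that conv(x, fliplr(y)) reproduces the values and lag ordering of xcorr(x, y) (lags -(N-1) to N-1), then divides by sqrt(sum(x.^2) * sum(y.^2)), which is the 'coeff' scaling. It also reports how much of each signal's energy sits inside its spike cluster. The data construction mirrors the question's code, so printed numbers vary from run to run.

```matlab
% Toolbox-free sketch of xcorr(x, y, 'coeff') for equal-length real row vectors.
x = rand(1, 3000) - 0.5;
y = rand(1, 3000) - 0.5;
x(460:489) = (rand(1, 30) - 0.5) * 12;      % spike clusters as in the question
y(120:149) = (rand(1, 30) - 0.5) * 12;

N = numel(x);
c_raw = conv(x, fliplr(y));                 % c_raw(N + m) = sum_n x(n + m) * y(n)
lags = -(N - 1):(N - 1);                    % lag m for each element of c_raw
c_coeff = c_raw / sqrt(sum(x.^2) * sum(y.^2));   % 'coeff' normalization

[peak, idx] = max(abs(c_coeff));
fprintf('peak |coeff| = %.3f at lag %d\n', peak, lags(idx));

% fraction of each signal's energy inside its 30-sample spike cluster
spike_share_x = sum(x(460:489).^2) / sum(x.^2);
spike_share_y = sum(y(120:149).^2) / sum(y.^2);
fprintf('spike energy share: x = %.2f, y = %.2f\n', spike_share_x, spike_share_y);
```

With this construction the spikes carry, in expectation, about 360 of roughly 607 energy units (about 60%) in each signal, which is why they dominate the raw xcorr output even though the normalized peak stays well below 1.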

Image Analyst on 7 Oct 2015
If you're asking how to remove the spikes, why don't you threshold the signals at a level of 2 (in absolute value), then replace the elements that exceed it with something like the median or interpolated values?
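A minimal sketch of that suggestion, using the question's spike construction and treating any sample with |value| > 2 as a spike. Both replacement options are shown: linear interpolation from the neighbouring good samples with interp1, and the median of the good samples.

```matlab
% Sketch: threshold at |value| > 2, then replace flagged samples.
x = rand(1, 3000) - 0.5;
x(460:489) = (rand(1, 30) - 0.5) * 12;      % spike cluster as in the question

threshold = 2;                              % level suggested in the answer
bad = abs(x) > threshold;                   % logical mask of spike samples
good_idx = find(~bad);

% option 1: linear interpolation across the flagged samples
x_interp = x;
x_interp(bad) = interp1(good_idx, x(good_idx), find(bad), 'linear', 'extrap');

% option 2: replace flagged samples with the median of the good samples
x_median = x;
x_median(bad) = median(x(~bad));
```

Because the inserted spike values are uniform on [-6, 6], about a third of them fall inside ±2 and survive the threshold, so some spike energy remains; the cleaned series can then be passed to xcorr as before.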
